Your VP of Engineering asks: "What's our SLA?" Your SRE says: "Our SLO is 99.9%." Your product manager asks: "What SLI are we tracking?" Everyone nods. Nobody is talking about the same thing.
SLA, SLO, and SLI are three related but different concepts. Confusing them leads to either overpromising to customers (bad SLAs) or undermonitoring your services (no SLOs). Here's the clear breakdown.
The One-Sentence Definitions
| Term | What it is | Who cares |
|---|---|---|
| SLI (Service Level Indicator) | The metric you measure | Engineers |
| SLO (Service Level Objective) | The target you aim for | Engineering + Product |
| SLA (Service Level Agreement) | The contract you sign | Business + Legal + Customers |
They stack: you measure an SLI, set an SLO as your internal target, and promise an SLA to customers (usually lower than your SLO, as a safety margin).
SLI: The Metric You Measure
An SLI is a quantitative measurement of some aspect of your service. For APIs, common SLIs are:
- Availability: Percentage of requests that return a non-5xx response
- Latency: Percentage of requests that complete within a threshold (e.g., P95 < 500ms)
- Error rate: Percentage of requests that return errors
- Throughput: Requests per second the service handles
// SLI examples for a Next.js API:
//
// Availability SLI:
// good_requests / total_requests = 99.95%
// (where good = non-5xx responses)
//
// Latency SLI:
// requests_under_500ms / total_requests = 97.2%
// (P95 latency: 340ms)
//
// Error rate SLI:
// 5xx_responses / total_responses = 0.05%

Key point: An SLI is always a ratio or percentage. "Response time is 200ms" is a metric. "95% of requests are under 200ms" is an SLI.
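The good-over-total ratios above can be sketched in a few lines. This is a minimal illustration, not a real monitoring API; the counter names are hypothetical:

```typescript
// Hypothetical request counters collected over a measurement window.
interface RequestCounters {
  total: number      // all requests in the window
  non5xx: number     // requests that returned a non-5xx response
  under500ms: number // requests that completed within the 500ms threshold
}

// An SLI is always good events / total events.
function availabilitySli(c: RequestCounters): number {
  return c.non5xx / c.total
}

function latencySli(c: RequestCounters): number {
  return c.under500ms / c.total
}

const counters: RequestCounters = {
  total: 100_000,
  non5xx: 99_950,
  under500ms: 97_200,
}

console.log(availabilitySli(counters)) // 0.9995 → 99.95%
console.log(latencySli(counters))      // 0.972  → 97.2%
```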
SLO: The Target You Aim For
An SLO is an internal target for an SLI. It says: "We aim for our availability SLI to be at least 99.9%."
| SLI | SLO target | What it means |
|---|---|---|
| Availability | 99.9% | Max 43 min downtime/month |
| Latency (P95) | < 500ms | 95% of requests under half a second |
| Error rate | < 0.1% | Max 1 in 1,000 requests fails |
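Converting an availability target into allowed downtime is simple arithmetic. A small sketch, assuming a fixed 30-day month as in the table above:

```typescript
// Convert an availability SLO (as a percentage) into allowed
// downtime minutes per period. Assumes a 30-day month by default.
function allowedDowntimeMinutes(sloPercent: number, days = 30): number {
  const totalMinutes = days * 24 * 60 // 43,200 for a 30-day month
  return totalMinutes * (1 - sloPercent / 100)
}

console.log(allowedDowntimeMinutes(99.9))  // ≈ 43.2 minutes/month
console.log(allowedDowntimeMinutes(99.99)) // ≈ 4.32 minutes/month
```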
SLOs are internal. You don't show them to customers. They're engineering targets that drive decisions: "Should we deploy this risky migration? Check the error budget."
The Error Budget
If your SLO is 99.9% availability, your error budget is 0.1% — about 43 minutes of downtime per month.
// Error budget math:
// SLO: 99.9% availability
// Error budget: 100% - 99.9% = 0.1%
// Monthly minutes: 30 days × 24 hours × 60 min = 43,200 minutes
// Allowed downtime: 43,200 × 0.001 = 43.2 minutes/month
//
// Budget spent this month:
// - Deploy rollback (3 min): 7% of budget
// - DB migration downtime (8 min): 18% of budget
// - Unplanned outage (15 min): 35% of budget
// ─────────────────────────────────────────────
// Total spent: 60% — still 40% remaining

When your error budget is nearly spent, you slow down deployments and focus on stability. When you have budget left, you can take risks (new features, migrations, experiments).
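The running tally above can be kept programmatically. A sketch of error-budget bookkeeping, with hypothetical names:

```typescript
// One downtime-causing incident in the current month.
interface Incident {
  label: string
  downtimeMinutes: number
}

// Fraction of the monthly error budget consumed so far.
// Assumes a 30-day month, matching the math above.
function budgetSpentFraction(incidents: Incident[], sloPercent: number, days = 30): number {
  const budgetMinutes = days * 24 * 60 * (1 - sloPercent / 100) // 43.2 for 99.9%
  const spentMinutes = incidents.reduce((sum, i) => sum + i.downtimeMinutes, 0)
  return spentMinutes / budgetMinutes
}

const spent = budgetSpentFraction(
  [
    { label: 'deploy rollback', downtimeMinutes: 3 },
    { label: 'db migration downtime', downtimeMinutes: 8 },
    { label: 'unplanned outage', downtimeMinutes: 15 },
  ],
  99.9,
)

console.log(Math.round(spent * 100)) // 60 — 60% of the budget is gone
```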
SLA: The Contract You Sign
An SLA is a legal agreement between you and your customers. It specifies what happens when you fail to meet a service level — usually financial penalties (credits).
- SLA is always lower than SLO. If your SLO is 99.9%, your SLA might promise 99.5%. This gives you a safety margin.
- SLAs have consequences. "If availability drops below 99.5% in a calendar month, customer receives a 10% service credit."
- Not every service needs an SLA. Internal tools, beta products, and free tiers often don't have SLAs.
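Tiered credit clauses like the one quoted above translate directly into code. A sketch; the thresholds and credit percentages are illustrative, not from any real SLA:

```typescript
// One SLA credit tier: if measured availability falls below
// `below` percent, the customer is owed `creditPercent` of their bill.
interface CreditTier {
  below: number
  creditPercent: number
}

// Returns the credit owed: the deepest breached tier wins.
function slaCredit(measuredAvailability: number, tiers: CreditTier[]): number {
  let credit = 0
  // Check tiers from highest threshold to lowest, so the
  // lowest breached threshold's credit is the final answer.
  for (const tier of [...tiers].sort((a, b) => b.below - a.below)) {
    if (measuredAvailability < tier.below) credit = tier.creditPercent
  }
  return credit
}

const tiers: CreditTier[] = [
  { below: 99.9, creditPercent: 10 },
  { below: 99.5, creditPercent: 25 },
]

console.log(slaCredit(99.95, tiers)) // 0  — met the SLA, no credit owed
console.log(slaCredit(99.7, tiers))  // 10 — breached 99.9% tier
console.log(slaCredit(99.2, tiers))  // 25 — breached 99.5% tier
```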
// The stack in practice:
//
// SLI (measured): 99.95% availability this month
// SLO (target): 99.9% → ✅ Meeting target
// SLA (promised): 99.5% → ✅ Well above promise
//
// vs a bad month:
// SLI (measured): 99.7% availability
// SLO (target): 99.9% → ❌ Below target (burn error budget)
// SLA (promised): 99.5% → ✅ Still above SLA (no credits owed)

Real-World Examples by Company Size
Startup (5 engineers, 1 API)
- SLI: Availability (non-5xx responses), P95 latency
- SLO: 99.9% availability, P95 < 1 second
- SLA: None (beta product, free tier)
SaaS company (20 engineers, paid customers)
- SLI: Availability, P95 latency, error rate per endpoint
- SLO: 99.95% availability, P95 < 500ms, error rate < 0.05%
- SLA: 99.9% availability with 10% credit below, 25% credit below 99.5%
Infrastructure API (Stripe, AWS)
- SLI: Per-region availability, P99 latency, error rate
- SLO: 99.999% availability (internal, not published)
- SLA: 99.99% with tiered credits (10%, 25%, 100%)
How to Monitor Your SLOs
An SLO without monitoring is just a wish. You need to track your SLIs in real-time and alert when you're burning through your error budget.
Nurbak Watch tracks the SLIs that matter for API teams — availability, P50/P95/P99 latency, and error rates per endpoint — from inside your Next.js server:
// instrumentation.ts
import { initWatch } from '@nurbak/watch'
export function register() {
  initWatch({
    apiKey: process.env.NURBAK_WATCH_KEY,
  })
}

Every API route is monitored automatically. When your availability SLI drops below your SLO target, you get a Slack/WhatsApp alert in under 10 seconds. Free during beta.
Use our uptime calculator to convert your SLO percentage into real downtime numbers.

