Your SLA says 99.99% uptime. Your client nods. Everyone feels good about the number. But does anyone actually know what it means?

99.99% sounds like "basically never down." In reality, it means you can be down for 52 minutes per year. That's less than one hour — total — across 365 days. One bad deployment that takes 20 minutes to roll back uses 38% of your annual downtime budget.

The Nines Table

| Uptime | Name | Downtime/year | Downtime/month | Downtime/day |
|---|---|---|---|---|
| 99% | Two nines | 3 days, 15 hours | 7 hours, 18 min | 14 min, 24s |
| 99.5% | - | 1 day, 19 hours | 3 hours, 39 min | 7 min, 12s |
| 99.9% | Three nines | 8 hours, 45 min | 43 min, 49s | 1 min, 26s |
| 99.95% | - | 4 hours, 22 min | 21 min, 54s | 43s |
| 99.99% | Four nines | 52 min, 33s | 4 min, 23s | 8.6s |
| 99.999% | Five nines | 5 min, 15s | 26s | 0.86s |

Use our free uptime calculator to convert any percentage to real downtime numbers.
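The figures in the table come from simple arithmetic: take the fraction of time you are allowed to be down and multiply it by the length of the period. Here is a minimal sketch of that conversion. The function names are made up for illustration, and the month length (365 / 12 days) is an assumption, so results can differ from the table by a second or two.

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Period lengths in days. The average month (365 / 12) is an assumption;
// a fixed 30-day month would give slightly smaller numbers.
const PERIOD_DAYS = {
  year: 365,
  month: 365 / 12,
  day: 1,
} as const;

type Period = keyof typeof PERIOD_DAYS;

// Allowed downtime, in seconds, for an uptime target over one period.
function allowedDowntimeSeconds(uptimePercent: number, period: Period): number {
  if (uptimePercent < 0 || uptimePercent > 100) {
    throw new RangeError("uptimePercent must be between 0 and 100");
  }
  const downFraction = (100 - uptimePercent) / 100;
  return downFraction * PERIOD_DAYS[period] * SECONDS_PER_DAY;
}

// Render a number of seconds as hours, minutes, and whole seconds.
function formatDuration(totalSeconds: number): string {
  const s = Math.floor(totalSeconds);
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  return `${hours}h ${minutes}m ${seconds}s`;
}

console.log(formatDuration(allowedDowntimeSeconds(99.99, "year")));
console.log(formatDuration(allowedDowntimeSeconds(99.9, "month")));
```

Swap in any target, for example `allowedDowntimeSeconds(99.95, "day")`, to see what a number in a contract means in practice.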

What Each Level Requires

99% — Internal tools, staging environments

About 3.65 days of downtime per year. Achievable with a single server and basic monitoring. Most side projects and internal tools operate here.

99.9% — Most SaaS products

8 hours, 45 minutes per year. Requires health checks, automated restarts, and alerting. This is the minimum acceptable level for production APIs that customers depend on.

99.99% — Payment APIs, auth systems

52 minutes per year. Requires redundancy (multi-AZ or multi-region), automated failover, blue-green deployments, and sub-minute monitoring. A single 20-minute outage uses 38% of your annual budget.

99.999% — Infrastructure APIs (AWS, Stripe)

5 minutes per year. Requires active-active multi-region, automatic traffic rerouting, zero-downtime deployments, and a dedicated SRE team. Most companies don't need this and shouldn't promise it.
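To get a feel for how quickly the tiers diverge, it can help to price the same incident against each one. The sketch below assumes a 365-day year and uses illustrative function names; it computes what share of the annual downtime budget a single 20-minute outage consumes at each level.

```typescript
const MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600 minutes, assuming a 365-day year

// Annual downtime budget, in minutes, for an uptime target.
function annualBudgetMinutes(targetPercent: number): number {
  return ((100 - targetPercent) / 100) * MINUTES_PER_YEAR;
}

// Fraction of the annual budget consumed by one outage (1.0 means all of it).
function outageShare(outageMinutes: number, targetPercent: number): number {
  return outageMinutes / annualBudgetMinutes(targetPercent);
}

const tiers = [99, 99.9, 99.99, 99.999];
for (const target of tiers) {
  const sharePercent = outageShare(20, target) * 100;
  console.log(`${target}%: a 20-minute outage uses ${sharePercent.toFixed(1)}% of the annual budget`);
}
```

At 99.99% the result is the 38% quoted above. At 99.999% the same outage is several times the entire annual budget, which is why that tier is so rarely worth promising.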

How to Calculate Your Real Uptime

Most teams calculate uptime from synthetic checks: "Our health check returned 200 for 99.9% of pings." But a health check every 60 seconds can miss a 59-second outage entirely.

Real uptime should be calculated from actual request data:

    // Request-based uptime (more accurate)
    const totalRequests = 1_000_000
    const failedRequests = 500   // 5xx responses
    const uptime = ((totalRequests - failedRequests) / totalRequests) * 100
    // 99.95%, based on every request that was actually served

    // Synthetic-based uptime (less accurate)
    const totalChecks = 43_200  // 1 check/min for 30 days
    const failedChecks = 12      // 12 minutes of downtime detected
    const syntheticUptime = ((totalChecks - failedChecks) / totalChecks) * 100
    // 99.97%, but might have missed short outages between checks
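The blind spot between checks is easy to demonstrate with a small simulation. This is a sketch under simplifying assumptions: outages are non-overlapping intervals in seconds, a check is an instantaneous sample, and the type and function names are illustrative only.

```typescript
// An outage as a half-open interval [startSec, endSec) within the window.
interface Outage {
  startSec: number;
  endSec: number;
}

function isDownAt(tSec: number, outages: Outage[]): boolean {
  return outages.some((o) => tSec >= o.startSec && tSec < o.endSec);
}

// Uptime as a periodic health check would report it: sample every intervalSec.
function syntheticUptimePercent(windowSec: number, intervalSec: number, outages: Outage[]): number {
  let total = 0;
  let failed = 0;
  for (let t = 0; t < windowSec; t += intervalSec) {
    total += 1;
    if (isDownAt(t, outages)) failed += 1;
  }
  return ((total - failed) / total) * 100;
}

// Uptime measured over the whole window (assumes non-overlapping outages).
function actualUptimePercent(windowSec: number, outages: Outage[]): number {
  const downSec = outages.reduce(
    (sum, o) => sum + (Math.min(o.endSec, windowSec) - Math.max(o.startSec, 0)),
    0
  );
  return ((windowSec - downSec) / windowSec) * 100;
}

// A 59-second outage that starts one second after a check and ends before the next.
const outages: Outage[] = [{ startSec: 601, endSec: 660 }];
const oneHour = 3600;

console.log(syntheticUptimePercent(oneHour, 60, outages).toFixed(2)); // checks see no failure
console.log(actualUptimePercent(oneHour, outages).toFixed(2));        // the outage still happened
```

With checks at 600s and 660s, the outage falls entirely between two samples, so the synthetic figure stays at 100% while the hour actually lost 59 seconds.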

The Error Budget Concept

If your SLA is 99.9%, you have an error budget of 0.1% — approximately 43 minutes of downtime per month. Think of it as a budget you can "spend":

  • Deployments that cause 2 minutes of downtime? That's fine — 41 minutes left.
  • Database migration that takes the API down for 10 minutes? Budget it — 31 minutes left.
  • Unplanned outage of 30 minutes? That one incident alone uses about 70% of your monthly budget. On top of the two items above, only about 1 minute would be left.

When the error budget is almost spent, slow down deployments and focus on reliability.
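The bookkeeping above can be sketched as a small function. It assumes a 30-day month (43,200 minutes) and a hypothetical rule that deployments slow down once 80% of the budget is spent. Both the threshold and the names are illustrative, not a standard.

```typescript
interface Incident {
  label: string;
  downtimeMinutes: number;
}

interface BudgetReport {
  budgetMinutes: number;
  spentMinutes: number;
  remainingMinutes: number;
  spentFraction: number;
  shouldSlowDeploys: boolean;
}

// Summarise error budget usage for one period.
// slowDownAt is a hypothetical policy threshold (fraction of budget spent).
function errorBudgetReport(
  sloPercent: number,
  periodMinutes: number,
  incidents: Incident[],
  slowDownAt: number = 0.8
): BudgetReport {
  if (sloPercent <= 0 || sloPercent >= 100) {
    throw new RangeError("sloPercent must be strictly between 0 and 100");
  }
  const budgetMinutes = ((100 - sloPercent) / 100) * periodMinutes;
  const spentMinutes = incidents.reduce((sum, i) => sum + i.downtimeMinutes, 0);
  const spentFraction = spentMinutes / budgetMinutes;
  return {
    budgetMinutes,
    spentMinutes,
    remainingMinutes: budgetMinutes - spentMinutes,
    spentFraction,
    shouldSlowDeploys: spentFraction >= slowDownAt,
  };
}

const THIRTY_DAYS_IN_MINUTES = 30 * 24 * 60; // 43,200

const incidents: Incident[] = [
  { label: "deployment", downtimeMinutes: 2 },
  { label: "database migration", downtimeMinutes: 10 },
  { label: "unplanned outage", downtimeMinutes: 30 },
];

const report = errorBudgetReport(99.9, THIRTY_DAYS_IN_MINUTES, incidents);
console.log(report);
```

With the three incidents from the list, nearly the whole budget is gone, so `shouldSlowDeploys` comes back `true`. With only the 2-minute deployment it stays `false`.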

How to Monitor Uptime

Two approaches, ideally used together:

External checks (uptime pings)

UptimeRobot, Better Stack, or Pingdom ping your API from outside every 30-60 seconds. This catches total outages and DNS/network issues. But it misses partial failures, endpoint-specific errors, and issues between check intervals.

Internal monitoring (request-based)

Nurbak Watch runs inside your Next.js server and calculates uptime from real request data. Every request counts — not just synthetic pings. If /api/checkout returns 500 for 5% of requests while /api/health returns 200, internal monitoring catches it. External pings don't.

    // instrumentation.ts
    import { initWatch } from '@nurbak/watch'

    export function register() {
      initWatch({
        apiKey: process.env.NURBAK_WATCH_KEY,
      })
    }
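Independent of any particular tool, the per-endpoint idea can be illustrated in a few lines. This sketch is not how Nurbak Watch is implemented. It assumes you already have a list of request records with a path and a status code, and the type and function names are hypothetical.

```typescript
interface RequestRecord {
  path: string;
  status: number;
}

// Request-based uptime per endpoint: the share of requests that did not return a 5xx.
function uptimeByEndpoint(records: RequestRecord[]): Map<string, number> {
  const counts = new Map<string, { total: number; failed: number }>();
  for (const r of records) {
    const c = counts.get(r.path) ?? { total: 0, failed: 0 };
    c.total += 1;
    if (r.status >= 500) c.failed += 1;
    counts.set(r.path, c);
  }
  const result = new Map<string, number>();
  counts.forEach((c, path) => {
    result.set(path, ((c.total - c.failed) / c.total) * 100);
  });
  return result;
}

// Sample data: /api/checkout fails 5% of the time while /api/health is always fine.
const records: RequestRecord[] = [];
for (let i = 0; i < 100; i++) {
  records.push({ path: "/api/checkout", status: i < 5 ? 500 : 200 });
  records.push({ path: "/api/health", status: 200 });
}

const uptime = uptimeByEndpoint(records);
uptime.forEach((percent, path) => {
  console.log(`${path}: ${percent.toFixed(2)}%`);
});
```

A ping against `/api/health` would report this service as fully up, while the per-endpoint numbers show checkout failing for one request in twenty.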

Free during beta. Request-based uptime. Alerts in under 10 seconds. Calculate your target uptime and start monitoring it.
