Your users see this:

    HTTP/1.1 504 Gateway Timeout
    Content-Type: application/json

    {"message": "Endpoint request timed out"}

Your API gateway waited for your backend to respond. It waited. And waited. Then it gave up.

A 504 Gateway Timeout doesn't mean the gateway is broken. It means your backend is too slow — or unreachable. The gateway just gave up waiting. This guide covers why it happens, how to fix it per platform, and how to detect timeouts before your users report them.

What Causes API Gateway Timeouts

The gateway sends the request to your backend, starts a timer, and waits. If the timer runs out before the backend responds, it returns 504 to the client. Six things cause this:

1. Slow database queries

The most common cause. A query that normally takes 200ms suddenly takes 30 seconds because:

  • A missing index on a table that grew from 10K to 1M rows
  • A full table lock from a migration running during peak traffic
  • Connection pool exhaustion — all connections are busy, new queries wait in queue
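
Whichever of these you're hitting, a cheap guard is to cap how long any single query may take so it fails fast with a catchable error instead of riding out the gateway timer. A minimal sketch (the 5-second budget and the `db.query` call are illustrative, not tied to a specific driver):

```typescript
// Rejects if the wrapped promise doesn't settle within `ms` milliseconds,
// so a slow query surfaces as a fast error instead of a 504 at the gateway.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Usage: fail after 5s instead of letting the gateway hit its 29s limit
// const rows = await withTimeout(db.query('SELECT ...'), 5000, 'users query')
```

Catching this error lets you return a meaningful 503 to the client while the gateway timer is nowhere near expiring.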

2. External API calls that hang

Your endpoint calls Stripe, Auth0, or a third-party API. That API is slow or down. Your code waits for it, the gateway timer ticks, and eventually: 504.

    // This code has no timeout — it will wait forever for Stripe
    const charge = await stripe.charges.create({
      amount: 2000,
      currency: 'usd',
    })

    // This code has a timeout — fails fast instead of timing out the gateway
    const charge = await stripe.charges.create({
      amount: 2000,
      currency: 'usd',
    }, {
      timeout: 10000, // 10 second max
    })

3. Cold starts (serverless)

On Vercel or Lambda, the first request after inactivity initializes the function. If initialization takes 2 seconds and the gateway timeout is 3 seconds, you have 1 second for actual processing. Add a slow DB connection and you're past the limit.
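
One mitigation is to pay the initialization cost once per warm instance instead of once per request, by caching the in-flight initialization promise at module scope. A sketch with a simulated slow initializer (the `connectToDb` helper is hypothetical):

```typescript
// Module scope survives across requests on a warm serverless instance.
let initCount = 0

async function connectToDb(): Promise<{ query: (sql: string) => Promise<number> }> {
  initCount++ // stand-in for a slow TLS handshake + auth round trip
  return { query: async () => 1 }
}

let dbPromise: ReturnType<typeof connectToDb> | null = null

function getDb() {
  // Cache the promise, not the result, so concurrent cold-start requests
  // share one connection attempt instead of each opening their own.
  if (!dbPromise) dbPromise = connectToDb()
  return dbPromise
}
```

A route handler then calls `await getDb()` per request: slow only on the first request after a cold start, near-instant on every warm request.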

4. Backend is unreachable

The backend server is down, DNS isn't resolving, or a security group/firewall rule blocks the connection. The gateway tries to connect, can't, and eventually times out.

5. Response too large

The backend generates a response (e.g., a large JSON array) that takes too long to serialize and transmit back through the gateway.
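
The standard fix is to page the result set so no single response has to serialize everything at once. A minimal sketch of limit/offset paging (in a real route the `LIMIT`/`OFFSET` would live in the SQL, not in memory):

```typescript
interface Page<T> {
  items: T[]
  nextOffset: number | null // null when there are no more pages
}

// Returns one page of the result set instead of the whole thing.
function paginate<T>(rows: T[], offset: number, limit: number): Page<T> {
  const items = rows.slice(offset, offset + limit)
  const nextOffset = offset + limit < rows.length ? offset + limit : null
  return { items, nextOffset }
}

// 2,500 rows at 1,000 per page → three small, fast responses
// instead of one large one that risks the gateway timer.
```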

6. Infinite loops or deadlocks

A code bug causes the request to never complete. The gateway timeout is the safety net that prevents the client from waiting forever.

Timeout Limits by Platform

| Platform | Default timeout | Max timeout | Configurable? |
|---|---|---|---|
| AWS API Gateway (REST) | 29 seconds | 29 seconds (hard limit) | Can lower, cannot raise above 29s |
| AWS API Gateway (HTTP) | 30 seconds | 30 seconds (hard limit) | Can lower, cannot raise above 30s |
| Kong | 60 seconds | No limit | Yes, per service/route |
| Nginx | 60 seconds | No limit | Yes (proxy_read_timeout) |
| Cloudflare | 100 seconds | 600 seconds (Enterprise) | Yes, per zone/rule |
| Vercel (serverless) | 10 seconds (Hobby) | 300 seconds (Enterprise) | Per plan tier |

Critical: AWS API Gateway's 29-second limit is a hard platform constraint. You cannot increase it. If your endpoints need more than 29 seconds, you must switch to an asynchronous pattern.

How to Fix Timeouts

Step 1: Identify which endpoint times out

Enable access logging on your gateway to see which routes return 504:

    # AWS API Gateway — enable access logging
    aws apigateway update-stage \
      --rest-api-id YOUR_API_ID \
      --stage-name prod \
      --patch-operations \
        op=replace,path=/accessLogSettings/destinationArn,value=arn:aws:logs:us-east-1:123456789:log-group:api-access-logs

    # Kong — check error logs
    tail -f /usr/local/kong/logs/error.log | grep "upstream timed out"

    # Nginx — check for 504s in access log
    grep " 504 " /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn

Step 2: Check if the backend is reachable

    # Test connectivity from gateway to backend
    curl -w "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n" \
      -o /dev/null -s https://your-backend.com/api/health

If DNS is slow (>500ms) or connect fails, the problem is network/infrastructure, not your code.

Step 3: Profile the slow endpoint

Add timing to identify which part of your code is slow:

    // Next.js API route with timing
    export async function GET(request: Request) {
      const timings: Record<string, number> = {}

      let start = Date.now()
      const dbResult = await db.query('SELECT ...')
      timings.database = Date.now() - start

      start = Date.now()
      const stripeData = await stripe.charges.list({ limit: 10 })
      timings.stripe = Date.now() - start

      start = Date.now()
      const response = transform(dbResult, stripeData)
      timings.transform = Date.now() - start

      console.log('Timings:', timings)
      // { database: 150, stripe: 8500, transform: 5 }
      // → Stripe is the bottleneck

      return Response.json(response)
    }
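
The ad-hoc timing above can be folded into a small helper so every awaited step is measured the same way. A sketch (the `timed` helper is ours, not a framework API):

```typescript
// Runs an async step and records its duration into `timings` under `label`,
// even when the step throws.
async function timed<T>(
  timings: Record<string, number>,
  label: string,
  fn: () => Promise<T>
): Promise<T> {
  const start = Date.now()
  try {
    return await fn()
  } finally {
    timings[label] = Date.now() - start
  }
}

// Usage inside the route:
// const dbResult = await timed(timings, 'database', () => db.query('SELECT ...'))
```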

Step 4: Fix the root cause

| Root cause | Fix |
|---|---|
| Slow DB query | Add indexes, optimize query, implement caching |
| External API slow | Add timeout (10s), implement circuit breaker, cache responses |
| Cold starts | Use provisioned concurrency (Lambda), keep-warm pings, or lighter runtime |
| Connection pool exhaustion | Increase pool size, add connection timeout, fix connection leaks |
| Response too large | Paginate responses, use streaming, compress with gzip |
| Inherently slow operation | Switch to async: return 202, process in background, poll for results |
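
The circuit breaker mentioned in the table can be as small as a consecutive-failure counter: once a dependency has failed N times in a row, skip it for a cooldown window and fail fast instead of waiting on it again. A minimal sketch (the threshold and cooldown values are illustrative):

```typescript
// Opens after `threshold` consecutive failures; while open, calls fail
// immediately instead of waiting on a dependency that is known to be down.
class CircuitBreaker {
  private failures = 0
  private openedAt = 0

  constructor(private threshold = 3, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (
      this.failures >= this.threshold &&
      Date.now() - this.openedAt < this.cooldownMs
    ) {
      throw new Error('circuit open — failing fast')
    }
    try {
      const result = await fn()
      this.failures = 0 // a success closes the circuit
      return result
    } catch (err) {
      this.failures++
      if (this.failures >= this.threshold) this.openedAt = Date.now()
      throw err
    }
  }
}
```

Wrapping the slow external API call in `breaker.call(...)` means that during an outage your endpoint returns an error in milliseconds rather than holding the request until the gateway's 504.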

Step 5: Adjust gateway timeout (if appropriate)

    # Kong — increase timeout for a specific service
    curl -X PATCH http://localhost:8001/services/slow-service \
      --data "read_timeout=120000" \
      --data "write_timeout=120000" \
      --data "connect_timeout=10000"

    # Nginx — increase proxy timeout
    location /api/reports {
        proxy_pass http://backend;
        proxy_read_timeout 120s;  # 2 minutes for report generation
        proxy_connect_timeout 10s;
    }

Warning: Increasing timeouts is a band-aid. It hides the real problem (slow backend) and shifts the burden to clients who wait longer. Fix the root cause first.

The Async Pattern: When Timeouts Are Unavoidable

Some operations genuinely take minutes: PDF generation, data exports, ML inference. For these, use the async request pattern:

    // POST /api/reports — starts the job, returns immediately
    export async function POST(request: Request) {
      const jobId = crypto.randomUUID()

      // Queue the work (e.g., to a message queue or database)
      await queue.send({ jobId, params: await request.json() })

      // Return immediately — gateway timeout is irrelevant
      return Response.json(
        { jobId, status: 'processing', pollUrl: `/api/reports/${jobId}` },
        { status: 202 } // 202 Accepted
      )
    }

    // GET /api/reports/:jobId — check status
    export async function GET(
      request: Request,
      { params }: { params: { jobId: string } }
    ) {
      const job = await db.getJob(params.jobId)

      if (job.status === 'completed') {
        return Response.json({ status: 'completed', result: job.result })
      }

      return Response.json({
        status: job.status, // 'processing' | 'failed'
        retryAfter: 5, // seconds
      })
    }
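
On the client side, the pollUrl returned by the 202 response is consumed with a retry loop that honors retryAfter. A sketch (the `pollJob` and `sleep` helpers are ours; pass in any function that fetches the status JSON):

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

interface JobStatus {
  status: string // 'processing' | 'completed' | 'failed'
  result?: unknown
  retryAfter?: number // seconds
}

// Polls until the job completes or fails. Each individual request is fast,
// so no single call ever approaches the gateway timeout.
async function pollJob(
  fetchStatus: () => Promise<JobStatus>,
  maxAttempts = 60
): Promise<JobStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const job = await fetchStatus()
    if (job.status === 'completed' || job.status === 'failed') return job
    await sleep((job.retryAfter ?? 5) * 1000)
  }
  throw new Error('job did not finish within the polling budget')
}

// Usage: pollJob(() => fetch(pollUrl).then((r) => r.json()))
```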

How to Monitor for Timeouts Before Users Report Them

Gateway timeouts are a symptom of slow backends. If you monitor latency proactively, you catch the slowdown before it becomes a timeout.

Nurbak Watch monitors every API route from inside your Next.js server and alerts you when latency spikes — before it crosses the gateway timeout threshold:

    // instrumentation.ts
    import { initWatch } from '@nurbak/watch'

    export function register() {
      initWatch({
        apiKey: process.env.NURBAK_WATCH_KEY,
      })
    }

When /api/reports response time climbs from 200ms to 15 seconds, you get a Slack/WhatsApp alert in under 10 seconds — while there's still time to fix it before the gateway timeout at 29 seconds kills the request.

Free during beta. No agents, no cold start overhead. Five lines of code.
