Your users see this:
```http
HTTP/1.1 504 Gateway Timeout
Content-Type: application/json

{"message": "Endpoint request timed out"}
```

Your API gateway waited for your backend to respond. It waited. And waited. Then it gave up.
A 504 Gateway Timeout doesn't mean the gateway is broken. It means your backend is too slow — or unreachable. The gateway just gave up waiting. This guide covers why it happens, how to fix it per platform, and how to detect timeouts before your users report them.
What Causes API Gateway Timeouts
The gateway sends the request to your backend, starts a timer, and waits. If the timer runs out before the backend responds, it returns 504 to the client. Six things cause this:
1. Slow database queries
The most common cause. A query that normally takes 200ms suddenly takes 30 seconds because:
- A missing index on a table that grew from 10K to 1M rows
- A full table lock from a migration running during peak traffic
- Connection pool exhaustion — all connections are busy, new queries wait in queue
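Whatever the underlying cause, a per-query deadline turns a hung query into a fast, diagnosable error instead of letting it silently consume the gateway's entire budget. A minimal sketch (the stuck query here is simulated; in practice you would pass in your DB client's promise):

```typescript
// Deadline wrapper: reject any promise that outlives `ms` milliseconds,
// so a hung query fails fast with a clear error instead of a generic 504.
function withDeadline<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`query exceeded ${ms}ms`)), ms)
  })
  // Whichever settles first wins; always clear the deadline timer after.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer))
}

// Simulated stuck query: never resolves, like a connection waiting on a lock.
const stuckQuery = new Promise<string>(() => {})

withDeadline(stuckQuery, 100).catch((err) => {
  // Log a specific, actionable error instead of timing out the gateway
  console.error(err.message)
})
```

Most database clients also expose a native statement or query timeout; prefer that when available, since it cancels the query server-side rather than merely abandoning the promise.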
2. External API calls that hang
Your endpoint calls Stripe, Auth0, or a third-party API. That API is slow or down. Your code waits for it, the gateway timer ticks, and eventually: 504.
```javascript
// This code has no timeout — it will wait forever for Stripe
const charge = await stripe.charges.create({
  amount: 2000,
  currency: 'usd',
})

// This code has a timeout — fails fast instead of timing out the gateway
const charge = await stripe.charges.create({
  amount: 2000,
  currency: 'usd',
}, {
  timeout: 10000, // 10 second max
})
```

3. Cold starts (serverless)
On Vercel or Lambda, the first request after inactivity initializes the function. If initialization takes 2 seconds and the gateway timeout is 3 seconds, you have 1 second for actual processing. Add a slow DB connection and you're past the limit.
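A common mitigation is to hoist expensive setup out of the request handler into module scope, so it runs once per cold start instead of on every request. A sketch, with a hypothetical createClient standing in for real setup work (DB connection, SDK init, config fetch):

```typescript
// Counts how many times the expensive setup actually runs
let initCount = 0

// Hypothetical stand-in for an expensive setup step
function createClient() {
  initCount++
  return { query: async (sql: string) => `rows for: ${sql}` }
}

// Module scope: runs once when the function instance cold-starts,
// then is reused by every warm invocation of that instance.
const client = createClient()

export async function handler() {
  // Per-request work only; no re-initialization here
  return client.query('SELECT 1')
}
```

The same idea applies to connection pools and SDK clients: create them once at module load, not inside the handler.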
4. Backend is unreachable
The backend server is down, DNS isn't resolving, or a security group/firewall rule blocks the connection. The gateway tries to connect, can't, and eventually times out.
5. Response too large
The backend generates a response (e.g., a large JSON array) that takes too long to serialize and transmit back through the gateway.
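Cursor-based pagination keeps each response small enough to serialize and transmit quickly. A minimal sketch with a hypothetical in-memory dataset; in practice each page would be a LIMIT/keyset query against the database:

```typescript
// Hypothetical dataset of 10,000 rows
const rows = Array.from({ length: 10_000 }, (_, i) => ({ id: i }))

// Return one page plus a cursor to the next, instead of the whole array
function getPage(cursor = 0, limit = 100) {
  const data = rows.slice(cursor, cursor + limit)
  // nextCursor is null once the last page has been served
  const nextCursor = cursor + limit < rows.length ? cursor + limit : null
  return { data, nextCursor }
}
```

Clients follow nextCursor until it is null; every individual response stays well under the gateway's timeout.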
6. Infinite loops or deadlocks
A code bug causes the request to never complete. The gateway timeout is the safety net that prevents the client from waiting forever.
Timeout Limits by Platform
| Platform | Default timeout | Max timeout | Configurable? |
|---|---|---|---|
| AWS API Gateway (REST) | 29 seconds | 29 seconds (hard limit) | Can lower, cannot raise above 29s |
| AWS API Gateway (HTTP) | 30 seconds | 30 seconds (hard limit) | Can lower, cannot raise above 30s |
| Kong | 60 seconds | No limit | Yes, per service/route |
| Nginx | 60 seconds | No limit | Yes (proxy_read_timeout) |
| Cloudflare | 100 seconds | 600 seconds (Enterprise) | Yes, per zone/rule |
| Vercel (serverless) | 10 seconds (Hobby) | 300 seconds (Enterprise) | Per plan tier |
Critical: AWS API Gateway's 29-second limit is a hard platform constraint. You cannot increase it. If your endpoints need more than 29 seconds, you must switch to an asynchronous pattern.
How to Fix Timeouts
Step 1: Identify which endpoint times out
Enable access logging on your gateway to see which routes return 504:
```bash
# AWS API Gateway — enable access logging
aws apigateway update-stage \
  --rest-api-id YOUR_API_ID \
  --stage-name prod \
  --patch-operations \
  op=replace,path=/accessLogSettings/destinationArn,value=arn:aws:logs:us-east-1:123456789:log-group:api-access-logs

# Kong — check error logs
tail -f /usr/local/kong/logs/error.log | grep "upstream timed out"

# Nginx — check for 504s in access log
grep " 504 " /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn
```

Step 2: Check if the backend is reachable
```bash
# Test connectivity from gateway to backend
curl -w "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n" \
  -o /dev/null -s https://your-backend.com/api/health
```

If DNS is slow (>500ms) or the connection fails, the problem is network/infrastructure, not your code.
Step 3: Profile the slow endpoint
Add timing to identify which part of your code is slow:
```typescript
// Next.js API route with timing
export async function GET(request: Request) {
  const timings: Record<string, number> = {}

  let start = Date.now()
  const dbResult = await db.query('SELECT ...')
  timings.database = Date.now() - start

  start = Date.now()
  const stripeData = await stripe.charges.list({ limit: 10 })
  timings.stripe = Date.now() - start

  start = Date.now()
  const response = transform(dbResult, stripeData)
  timings.transform = Date.now() - start

  console.log('Timings:', timings)
  // { database: 150, stripe: 8500, transform: 5 }
  // → Stripe is the bottleneck

  return Response.json(response)
}
```

Step 4: Fix the root cause
| Root cause | Fix |
|---|---|
| Slow DB query | Add indexes, optimize query, implement caching |
| External API slow | Add timeout (10s), implement circuit breaker, cache responses |
| Cold starts | Use provisioned concurrency (Lambda), keep-warm pings, or lighter runtime |
| Connection pool exhaustion | Increase pool size, add connection timeout, fix connection leaks |
| Response too large | Paginate responses, use streaming, compress with gzip |
| Inherently slow operation | Switch to async: return 202, process in background, poll for results |
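The circuit breaker mentioned in the table can be small. The idea: after a run of consecutive failures, reject calls to the flaky upstream immediately for a cooldown window, rather than letting every request wait out its own timeout. A sketch with illustrative thresholds:

```typescript
// Minimal circuit breaker: opens after `threshold` consecutive failures,
// then fails fast for `cooldownMs` before allowing traffic through again.
class CircuitBreaker {
  private failures = 0
  private openUntil = 0

  constructor(private threshold = 3, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (Date.now() < this.openUntil) {
      throw new Error('circuit open: failing fast')
    }
    try {
      const result = await fn()
      this.failures = 0 // success resets the failure streak
      return result
    } catch (err) {
      if (++this.failures >= this.threshold) {
        this.openUntil = Date.now() + this.cooldownMs
      }
      throw err
    }
  }
}
```

Production implementations usually add a half-open state that lets one probe request through after the cooldown; libraries such as opossum handle this for you.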
Step 5: Adjust gateway timeout (if appropriate)
```bash
# Kong — increase timeout for a specific service
curl -X PATCH http://localhost:8001/services/slow-service \
  --data "read_timeout=120000" \
  --data "write_timeout=120000" \
  --data "connect_timeout=10000"
```

```nginx
# Nginx — increase proxy timeout
location /api/reports {
    proxy_pass http://backend;
    proxy_read_timeout 120s;  # 2 minutes for report generation
    proxy_connect_timeout 10s;
}
```

Warning: Increasing timeouts is a band-aid. It hides the real problem (slow backend) and shifts the burden to clients, who wait longer. Fix the root cause first.
The Async Pattern: When Timeouts Are Unavoidable
Some operations genuinely take minutes: PDF generation, data exports, ML inference. For these, use the async request pattern:
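The consumer's side of this pattern is a polling loop that respects the server's retryAfter hint. A sketch, with fetchStatus as a hypothetical stand-in for fetching the poll URL:

```typescript
// Shape of the status responses the polling endpoint returns
type JobStatus = {
  status: 'processing' | 'completed' | 'failed'
  result?: unknown
  retryAfter?: number // seconds, the server's polling hint
}

// Poll until the job leaves 'processing', bounded by maxAttempts
async function pollUntilDone(
  fetchStatus: () => Promise<JobStatus>,
  maxAttempts = 60,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchStatus()
    if (job.status !== 'processing') return job
    // Wait the server-suggested interval before asking again
    await new Promise((resolve) => setTimeout(resolve, (job.retryAfter ?? 5) * 1000))
  }
  throw new Error('job did not finish within the polling budget')
}
```

Each individual poll completes in milliseconds, so no single request ever approaches the gateway timeout, no matter how long the job runs.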
```typescript
// POST /api/reports — starts the job, returns immediately
export async function POST(request: Request) {
  const jobId = crypto.randomUUID()

  // Queue the work (e.g., to a message queue or database)
  await queue.send({ jobId, params: await request.json() })

  // Return immediately — gateway timeout is irrelevant
  return Response.json(
    { jobId, status: 'processing', pollUrl: `/api/reports/${jobId}` },
    { status: 202 } // 202 Accepted
  )
}

// GET /api/reports/:jobId — check status
export async function GET(request: Request, { params }) {
  const job = await db.getJob(params.jobId)

  if (job.status === 'completed') {
    return Response.json({ status: 'completed', result: job.result })
  }

  return Response.json({
    status: job.status, // 'processing' | 'failed'
    retryAfter: 5, // seconds
  })
}
```

How to Monitor for Timeouts Before Users Report Them
Gateway timeouts are a symptom of slow backends. If you monitor latency proactively, you catch the slowdown before it becomes a timeout.
Nurbak Watch monitors every API route from inside your Next.js server and alerts you when latency spikes — before it crosses the gateway timeout threshold:
```typescript
// instrumentation.ts
import { initWatch } from '@nurbak/watch'

export function register() {
  initWatch({
    apiKey: process.env.NURBAK_WATCH_KEY,
  })
}
```

When /api/reports response time climbs from 200ms to 15 seconds, you get a Slack/WhatsApp alert in under 10 seconds — while there's still time to fix it before the gateway timeout at 29 seconds kills the request.
Free during beta. No agents, no cold start overhead. Five lines of code.

