What metrics should I monitor on an API gateway?

The critical API gateway metrics are: request latency (total and per-stage), error rates (4xx and 5xx split by client vs server errors), throttling rate (requests rejected due to rate limits), cache hit rate (if caching is enabled), and request count by route and method. Monitor these at the gateway level and complement with application-level monitoring for full visibility.

How do I monitor AWS API Gateway?

AWS API Gateway publishes metrics to CloudWatch automatically. Key metrics include Count (total requests), Latency (full round-trip time), IntegrationLatency (backend processing time), 4XXError and 5XXError (error counts), and CacheHitCount/CacheMissCount. You can create CloudWatch alarms on these metrics and filter by stage, resource, and method. Enable detailed metrics in your stage settings for per-route breakdowns.

How do I monitor Kong API Gateway?

Kong exposes metrics through its Prometheus plugin, which provides request counts, latency histograms, and bandwidth metrics per service and route. Enable the plugin globally, scrape the /metrics endpoint with Prometheus, and visualize with Grafana. Kong also offers a built-in Vitals dashboard in Kong Enterprise for real-time analytics without additional infrastructure.

API Gateway Monitoring: AWS, Kong & Best Practices (2026)

Your API gateway sits in front of every request. It handles authentication, rate limiting, routing, and sometimes caching. It's the single point that every API call passes through.

Which makes it the single point where things can go wrong without anyone noticing.

A misconfigured rate limit silently throttles 15% of legitimate traffic. A cache invalidation bug serves stale data for hours. An integration timeout cascades into 504s that your backend logs don't capture because the request never reached your server.

This guide covers what to monitor on your API gateway, specific setups for AWS API Gateway and Kong, and the critical gap that gateway-level monitoring leaves at the application layer.

API Gateway Metrics That Actually Matter

Most gateways expose dozens of metrics. These are the five that prevent outages:

1. Request Latency (Total vs Integration)

Total latency is the full round-trip time from when the gateway receives the request to when it sends the response. Integration latency is just the time your backend takes to process it. The difference is gateway overhead — auth checks, request transformation, logging.

// What the numbers tell you:
// Total latency:        450ms
// Integration latency:  120ms
// Gateway overhead:     330ms  ← Something is wrong at the gateway level

// Healthy ratio: gateway overhead should be < 20% of total latency
// If overhead is > 50%, check: auth middleware, request/response transforms,
// logging plugins, WAF rules

If integration latency is low but total latency is high, the problem is in the gateway — not your backend. This distinction saves hours of debugging in the wrong place.
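
The rule of thumb above can be turned into a small check. This is a minimal sketch, not a gateway API: `gatewayOverhead` and its status labels are hypothetical names, and treating the 20-50% band as "elevated" is an assumption, since the guidance only names the under-20% and over-50% cases.

```typescript
// Hypothetical helper: thresholds mirror the rule of thumb in the text.
type OverheadStatus = "healthy" | "elevated" | "investigate";

interface OverheadReport {
  overheadMs: number;    // total latency minus integration latency
  overheadRatio: number; // fraction of total latency spent in the gateway
  status: OverheadStatus;
}

function gatewayOverhead(totalMs: number, integrationMs: number): OverheadReport {
  if (totalMs <= 0 || integrationMs < 0 || integrationMs > totalMs) {
    throw new RangeError("expected totalMs > 0 and 0 <= integrationMs <= totalMs");
  }
  const overheadMs = totalMs - integrationMs;
  const overheadRatio = overheadMs / totalMs;

  let status: OverheadStatus = "elevated"; // assumed label for the 20-50% band
  if (overheadRatio < 0.2) status = "healthy";
  else if (overheadRatio > 0.5) status = "investigate";

  return { overheadMs, overheadRatio, status };
}

// The numbers from the example above: 450ms total, 120ms integration.
const report = gatewayOverhead(450, 120);
console.log(report.overheadMs, report.status);
```

With the example numbers, the overhead is 330ms, roughly 73% of total latency, so the check lands in the "investigate" band.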

2. Error Rates — 4xx vs 5xx

Split these into client errors and server errors, because the right reaction to each is completely different:

4xx spike — Usually means a client-side change: broken frontend deployment, expired API keys, changed request format. Check if a deployment happened recently.
5xx spike — The gateway or backend is failing. Check integration timeouts, backend health, and gateway configuration changes.
429 (Too Many Requests) — Rate limiting kicked in. Could be legitimate protection or misconfigured limits throttling real users.
502/504 — Backend is unreachable or too slow. The gateway timed out waiting for your server.

A global 5xx rate of 0.5% is meaningless if all errors come from one route. Always monitor errors per route, not just globally.
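
To see how a global rate can hide a failing route, here is a rough sketch that groups request records by route. The record shape, the function name `serverErrorsByRoute`, and the sample traffic are all illustrative assumptions, not output from any gateway.

```typescript
interface RequestRecord {
  route: string;
  status: number;
}

interface RouteStats {
  total: number;
  serverErrors: number;
  serverErrorRate: number;
}

// Hypothetical aggregation: counts 5xx responses per route.
function serverErrorsByRoute(records: RequestRecord[]): Record<string, RouteStats> {
  const stats: Record<string, RouteStats> = {};
  for (const { route, status } of records) {
    let entry = stats[route];
    if (entry === undefined) {
      entry = { total: 0, serverErrors: 0, serverErrorRate: 0 };
      stats[route] = entry;
    }
    entry.total += 1;
    if (status >= 500 && status <= 599) entry.serverErrors += 1;
    entry.serverErrorRate = entry.serverErrors / entry.total;
  }
  return stats;
}

// Made-up traffic: 900 healthy product requests, 100 checkout requests with 5 failures.
const sample: RequestRecord[] = [];
for (let i = 0; i < 900; i++) sample.push({ route: "GET /api/products", status: 200 });
for (let i = 0; i < 100; i++) {
  sample.push({ route: "POST /api/checkout", status: i < 5 ? 502 : 200 });
}

const byRoute = serverErrorsByRoute(sample);
const globalRate = sample.filter((r) => r.status >= 500).length / sample.length;
console.log(globalRate, byRoute["POST /api/checkout"].serverErrorRate);
```

In this made-up sample the global 5xx rate is 0.5%, while the checkout route alone is failing 5% of the time.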

3. Throttling Rate

The throttling rate is the share of requests rejected by rate limits. Some throttling is intentional (protecting your backend from abuse). Too much throttling means legitimate users are being blocked:

// Healthy: 0.1% of requests throttled (bots, scrapers)
// Warning: 2-5% throttled — check if limits are too aggressive
// Critical: 10%+ throttled — you're losing real users

// Common mistake: setting per-IP limits that hit users behind
// corporate NATs (thousands of employees, one external IP)
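
As a sketch of how these bands might be applied to raw counts, the function below classifies a throttling rate. The name `throttleLevel` is hypothetical, and two choices are assumptions because the bands above leave gaps: rates between 0.1% and 2% are labeled "watch", and rates between 5% and 10% are still treated as "warning".

```typescript
type ThrottleLevel = "healthy" | "watch" | "warning" | "critical";

// throttled: requests rejected with 429; total: all requests in the same window.
function throttleLevel(throttled: number, total: number): ThrottleLevel {
  if (total <= 0 || throttled < 0 || throttled > total) {
    throw new RangeError("expected total > 0 and 0 <= throttled <= total");
  }
  const rate = throttled / total;
  if (rate >= 0.1) return "critical";  // 10%+ throttled
  if (rate >= 0.02) return "warning";  // 2% up to 10% (the 5-10% gap is assumed to be a warning)
  if (rate > 0.001) return "watch";    // assumed band between 0.1% and 2%
  return "healthy";                    // around 0.1% or less
}

console.log(throttleLevel(30, 1000));
```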

4. Cache Hit Rate

If your gateway caches responses, the hit rate tells you if caching is actually working:

Above 80%: Caching is effective. At an 80% hit rate, your backend handles one fifth of the raw request count.
50-80%: Decent, but check TTL settings and cache key configuration.
Below 50%: Something is wrong. Either cache keys are too specific (every request is unique) or TTLs are too short.
Sudden drop: Cache was invalidated or the cache layer went down. Backend is getting hammered with full traffic.
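
The relationship between hit rate and backend load is simple arithmetic: only misses reach the backend. A minimal sketch, with `cacheStats` as a hypothetical helper fed by whatever hit and miss counters your gateway exposes:

```typescript
interface CacheStats {
  hitRate: number;          // hits / (hits + misses)
  backendRequests: number;  // only cache misses reach the backend
  reductionFactor: number;  // raw requests divided by backend requests
}

function cacheStats(hits: number, misses: number): CacheStats {
  const total = hits + misses;
  if (hits < 0 || misses < 0 || total === 0) {
    throw new RangeError("expected non-negative counts and at least one request");
  }
  return {
    hitRate: hits / total,
    backendRequests: misses,
    reductionFactor: misses === 0 ? Infinity : total / misses,
  };
}

// 800 hits and 200 misses: an 80% hit rate.
const stats = cacheStats(800, 200);
console.log(stats.hitRate, stats.reductionFactor);
```

At 800 hits and 200 misses the backend sees 200 of 1,000 requests, a 5x reduction. At a 50% hit rate the reduction is only 2x.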

5. Request Count by Route and Method

Traffic distribution tells you where to focus optimization and where to set tighter limits:

// Typical API traffic distribution:
// GET  /api/products     → 45% of traffic  (cacheable, optimize here first)
// GET  /api/users/me     → 22% of traffic  (auth-dependent, watch latency)
// POST /api/events       → 18% of traffic  (write-heavy, watch error rate)
// POST /api/checkout     → 3% of traffic   (low volume, highest value)
// Other                  → 12% of traffic

// The endpoint with 3% traffic might generate 80% of your revenue.
// Monitor it like it's your most important route — because it is.

Monitoring AWS API Gateway

AWS API Gateway publishes metrics to CloudWatch automatically. Here's how to set up meaningful monitoring:

Enable Detailed Metrics

By default, AWS API Gateway only publishes aggregate metrics. To get per-route breakdowns, enable detailed metrics on your stage:

# AWS CLI: enable detailed metrics for every resource and method on the stage
aws apigateway update-stage \
  --rest-api-id your-api-id \
  --stage-name prod \
  --patch-operations \
    'op=replace,path=/*/*/metrics/enabled,value=true'

Or in your CloudFormation template:

# CloudFormation
Resources:
  ApiStage:
    Type: AWS::ApiGateway::Stage
    Properties:
      StageName: prod
      RestApiId: !Ref MyApi
      MethodSettings:
        - HttpMethod: "*"
          ResourcePath: "/*"
          MetricsEnabled: true
          DataTraceEnabled: false  # Don't log request/response bodies
          ThrottlingBurstLimit: 500
          ThrottlingRateLimit: 1000

Key CloudWatch Metrics

Metric	What it measures	Alert when
`Count`	Total API requests	Drop > 30% from baseline (traffic loss)
`Latency`	Full request-response time	P95 > 2x baseline
`IntegrationLatency`	Backend processing time only	P95 > 1 second
`4XXError`	Client error count	Rate > 5% sustained
`5XXError`	Server error count	Rate > 0.1% sustained
`CacheHitCount`	Responses served from cache	Hit rate drops below 50%

CloudWatch Alarm Example

# Alert when the 5xx error rate exceeds 1% over a 5-minute period.
# For 5XXError, the Average statistic is the error rate (errors / requests);
# Sum would be the raw error count.
aws cloudwatch put-metric-alarm \
  --alarm-name "api-gateway-5xx-rate" \
  --namespace "AWS/ApiGateway" \
  --metric-name "5XXError" \
  --dimensions Name=ApiName,Value=MyAPI Name=Stage,Value=prod \
  --statistic Average \
  --period 300 \
  --threshold 0.01 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
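
An alarm on an error count and an alarm on an error rate only agree at one traffic level. The sketch below illustrates the difference with plain arithmetic; `errorRate` and `breachesRate` are hypothetical helpers, and the traffic figures are made up.

```typescript
// errorSum: 5xx responses in the period; requestCount: all requests in the same period.
function errorRate(errorSum: number, requestCount: number): number {
  if (requestCount <= 0) return 0; // no traffic, nothing to alarm on
  return errorSum / requestCount;
}

function breachesRate(errorSum: number, requestCount: number, maxRate: number): boolean {
  return errorRate(errorSum, requestCount) > maxRate;
}

// 51 errors in a 5-minute period would trip a count threshold of 50 at any traffic level.
// Against a 1% rate threshold, the result depends on how many requests were served:
console.log(breachesRate(51, 5000, 0.01));  // 51 / 5,000 is just over 1%
console.log(breachesRate(51, 50000, 0.01)); // 51 / 50,000 is about 0.1%
```

A count threshold is simpler to reason about on a low-traffic API. A rate threshold keeps its meaning as traffic grows.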

Monitoring Kong Gateway

Kong exposes metrics through plugins. The Prometheus plugin is the standard approach for production monitoring:

Enable the Prometheus Plugin

# Enable globally (all services and routes)
curl -X POST http://localhost:8001/plugins \
  --data "name=prometheus" \
  --data "config.per_consumer=false" \
  --data "config.status_code_metrics=true" \
  --data "config.latency_metrics=true" \
  --data "config.bandwidth_metrics=true"

Or declaratively in kong.yml:

# kong.yml
plugins:
  - name: prometheus
    config:
      per_consumer: false
      status_code_metrics: true
      latency_metrics: true
      bandwidth_metrics: true

Key Prometheus Metrics

Once enabled, Kong exposes metrics at :8001/metrics:

# Request count per service, route, and status code
kong_http_requests_total{service="users-api",route="get-users",code="200"} 48293
kong_http_requests_total{service="users-api",route="get-users",code="500"} 12

# Latency histograms (request, kong processing, upstream)
kong_request_latency_ms_bucket{service="users-api",le="100"} 45000
kong_request_latency_ms_bucket{service="users-api",le="500"} 47800
kong_request_latency_ms_bucket{service="users-api",le="1000"} 48200

kong_upstream_latency_ms_bucket{service="users-api",le="100"} 47500
kong_upstream_latency_ms_bucket{service="users-api",le="500"} 48100

# Kong processing overhead
kong_kong_latency_ms_bucket{service="users-api",le="10"} 48000
kong_kong_latency_ms_bucket{service="users-api",le="50"} 48290

Grafana Dashboard Queries

# Error rate per service (PromQL)
sum(rate(kong_http_requests_total{code=~"5.."}[5m])) by (service)
/
sum(rate(kong_http_requests_total[5m])) by (service)

# P95 latency per route
histogram_quantile(0.95,
  sum(rate(kong_request_latency_ms_bucket[5m])) by (le, route)
)

# Throughput per route (requests per second)
sum(rate(kong_http_requests_total[5m])) by (route)
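
PromQL's histogram_quantile estimates a percentile by interpolating linearly inside the bucket that contains the target rank. The sketch below reimplements that idea in simplified form so the P95 query is easier to reason about. It uses the sample kong_request_latency_ms_bucket counts shown earlier; the +Inf bucket count of 48,305 is an assumption (the sum of the two sample request counters), and a real query works on per-second rates rather than raw counters.

```typescript
interface Bucket {
  le: number;    // upper bound in ms; Infinity for the +Inf bucket
  count: number; // cumulative count of observations <= le
}

// Simplified take on histogram_quantile: linear interpolation within the target bucket.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (q < 0 || q > 1) throw new RangeError("q must be between 0 and 1");
  const sorted = buckets.slice().sort((a, b) => a.le - b.le);
  if (sorted.length < 2 || sorted[sorted.length - 1].le !== Infinity) {
    throw new Error("need at least one finite bucket plus a +Inf bucket");
  }
  const rank = q * sorted[sorted.length - 1].count;

  // Find the first bucket whose cumulative count reaches the rank.
  let i = 0;
  while (i < sorted.length - 1 && sorted[i].count < rank) i++;

  // Rank falls in the +Inf bucket: report the highest finite bound.
  if (i === sorted.length - 1) return sorted[sorted.length - 2].le;

  const lower = i === 0 ? 0 : sorted[i - 1].le;
  const prevCount = i === 0 ? 0 : sorted[i - 1].count;
  const inBucket = sorted[i].count - prevCount;
  if (inBucket === 0) return lower;
  return lower + (sorted[i].le - lower) * ((rank - prevCount) / inBucket);
}

const latencyBuckets: Bucket[] = [
  { le: 100, count: 45000 },
  { le: 500, count: 47800 },
  { le: 1000, count: 48200 },
  { le: Infinity, count: 48305 }, // assumed total, not part of the sample output
];

console.log(histogramQuantile(0.95, latencyBuckets).toFixed(1));
```

Under these assumptions the estimated P95 falls in the 100-500ms bucket, at roughly 227ms. The estimate is only as precise as the bucket boundaries, which is why bucket layout matters for latency alerting.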

The Gap: What Gateways Don't Monitor

Gateway metrics tell you what happened at the network edge. They don't tell you what happened inside your application. This creates a significant blind spot:

What gateways see	What gateways miss
Total request latency	Where inside the app the time was spent (DB, cache, computation)
HTTP status codes	Business logic errors returned as 200 with error payloads
Request count per route	Which database queries each route runs and how slow they are
Gateway-level throttling	Application-level concurrency issues (connection pool exhaustion)
Upstream health (binary: up/down)	Upstream degradation (responding but slow, partial failures)

Real scenario: Your Kong dashboard shows GET /api/products at 450ms P95. Is that good or bad? You can't tell from the gateway, because it doesn't show that 400ms of that comes from an N+1 query pattern on a lookup that took 40ms before someone added a .populate() call last Tuesday.

Gateway monitoring tells you the symptom. Application-level monitoring tells you the cause.
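
One of the blind spots in the table, business logic errors returned as 200, is easy to illustrate. The sketch below counts a response as an error when the payload says so, even if the status code does not. The payload conventions (`success: false` or a non-null `error` field) are assumptions for illustration; your API's error shape may differ.

```typescript
interface ObservedResponse {
  status: number;
  body: unknown;
}

// Hypothetical check: a gateway sees only `status`; the application can also inspect `body`.
function isEffectiveError(res: ObservedResponse): boolean {
  if (res.status >= 400) return true;
  const body = res.body;
  if (typeof body === "object" && body !== null) {
    const record = body as Record<string, unknown>;
    // Assumed conventions: { success: false } or { error: ... } signal a failure.
    return record["success"] === false || record["error"] != null;
  }
  return false;
}

const responses: ObservedResponse[] = [
  { status: 200, body: { items: [1, 2, 3] } },
  { status: 200, body: { success: false, message: "payment declined" } },
  { status: 200, body: { error: { code: "OUT_OF_STOCK" } } },
  { status: 503, body: null },
];

const gatewayErrors = responses.filter((r) => r.status >= 400).length;
const effectiveErrors = responses.filter(isEffectiveError).length;
console.log(gatewayErrors, effectiveErrors);
```

In this sample, status codes alone show one error out of four responses, while inspecting the payloads finds three.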

Filling the Gap with Application-Level Monitoring

For Next.js applications behind an API gateway, Nurbak Watch provides the application-level visibility that gateways can't:

// instrumentation.ts — runs inside your Next.js server
import { initWatch } from '@nurbak/watch'

export function register() {
  initWatch({
    apiKey: process.env.NURBAK_WATCH_KEY,
  })
}

This runs inside your server process — downstream of the gateway. It sees every request after auth, after rate limiting, after caching. What it adds to your monitoring:

Server-side latency per route — How long your code takes, excluding gateway overhead. When gateway latency is 450ms but server latency is 40ms, you know the problem is at the gateway layer.
Real error rates — Including errors your gateway doesn't see, like 200 responses with error payloads, or exceptions caught and returned as degraded responses.
Cold start tracking — On Vercel, every cold start looks like a slow request to the gateway. Nurbak Watch tracks cold start frequency and duration separately.
Instant alerts — Slack, email, or WhatsApp in under 10 seconds. Gateway alerts typically go through CloudWatch or Prometheus alerting pipelines with 1-5 minute delays.

Recommended Stack

Layer	Tool	What it monitors
Gateway	CloudWatch (AWS) or Prometheus (Kong)	Traffic shape, throttling, cache, gateway errors
Application	Nurbak Watch	Server-side latency, error rates, cold starts, per-route metrics
Uptime	External ping (UptimeRobot, free)	Total outage detection from outside

Three layers, three perspectives, full coverage. The gateway watches the front door. Nurbak Watch watches every room. The uptime ping confirms the building is still standing.

Get Started

If you're running Next.js behind an API gateway and want application-level visibility without deploying another agent:

Go to nurbak.com and create an account
Run npm install @nurbak/watch
Add 5 lines to instrumentation.ts
Deploy

Nurbak Watch starts at $19/month flat. It complements your gateway monitoring with the application-level metrics you're currently blind to.

Fabián Delgado
