You deploy your Next.js app to Vercel. You want to monitor your API routes. You look at Datadog's setup guide: "Install the Datadog Agent on your host machine."
You don't have a host machine. You have serverless functions that exist for 50 milliseconds, serve a request, and vanish. The agent has nowhere to live.
This is the fundamental problem with serverless monitoring: every tool assumes you have a server. On Vercel, Lambda, and Cloudflare Workers, you don't. This guide covers what actually works.
Why Serverless Breaks Traditional Monitoring
Traditional APM tools (Datadog, New Relic, Dynatrace) were built for a world of long-running servers. They assume:
- A daemon process can run alongside your app (it can't — there's no host)
- The process persists between requests (it doesn't — functions are ephemeral)
- Initialization happens once (it doesn't — every cold start re-initializes)
- You can install system-level agents (you can't — no SSH access to the runtime)
These assumptions break on serverless, creating five specific problems:
1. Cold starts corrupt your latency data
A cold start adds 200-2000ms to the first request after a function scales up. If your APM tool doesn't separate cold start latency from request processing latency, your P95 looks terrible even when your code is fast.
```
// What the APM sees:
// Request 1 (cold): 1,450ms ← 1,200ms cold start + 250ms processing
// Request 2 (warm): 85ms
// Request 3 (warm): 92ms
// Request 4 (warm): 78ms
// Request 5 (cold): 1,380ms ← Another cold start
//
// P95: 1,420ms — looks broken
// Actual app performance: 85ms — perfectly healthy
//
// Without cold start separation, you're optimizing the wrong thing.
```

2. Agent initialization adds to cold starts
APM agents need to boot when your function starts. That initialization isn't free:
| Agent | Init overhead | Impact on cold start |
|---|---|---|
| Datadog (dd-trace) | 200-800ms | +40-160% on a 500ms cold start |
| New Relic | 200-400ms | +40-80% |
| Sentry | 50-150ms | +10-30% |
| Nurbak Watch | 5-15ms | +1-3% |
| No agent | 0ms | Baseline |
For functions that cold-start frequently (low-traffic routes, edge functions, cron jobs), a 400ms agent overhead can double total response time.
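One common mitigation, where your tooling allows it, is to move agent initialization off the cold start path and pay the cost lazily on first use. A minimal sketch of the pattern; the agent object here is a stand-in, not a real SDK:

```typescript
// Generic "initialize once, lazily" helper: the expensive setup runs on the
// first call that needs it, not during the cold start, and is reused afterwards.
function lazyOnce<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= init());
}

// Hypothetical usage: defer agent setup until the first request needs it.
let initCount = 0;
const getAgent = lazyOnce(async () => {
  initCount += 1; // stands in for an expensive dynamic import + handshake
  return { ready: true };
});
```

The trade-off: the first request that touches the agent absorbs the init cost instead of the cold start, so this helps most when many invocations never need the agent at all.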
3. Concurrency is per-function, not per-host
On a traditional server, you monitor CPU and memory to predict capacity. On serverless, each invocation gets its own isolated environment. "CPU usage" is meaningless. What matters is concurrent executions, throttling events, and per-invocation duration.
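Little's law gives a quick way to estimate that concurrency from numbers you already have: concurrent executions ≈ request rate × average duration. A small sketch with illustrative numbers:

```typescript
// Little's law: executions in flight ≈ arrival rate (req/s) × mean duration (s).
function estimatedConcurrency(requestsPerSecond: number, meanDurationMs: number): number {
  return requestsPerSecond * (meanDurationMs / 1000);
}

// 100 req/s at 250ms per request keeps roughly 25 executions in flight.
// Compare this number against the platform's concurrency limit, not CPU usage.
const inFlight = estimatedConcurrency(100, 250);
```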
4. Logs are scattered across invocations
Each function invocation produces isolated logs. Correlating a user's request across multiple function invocations requires trace IDs — something that most serverless platforms don't provide natively.
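A minimal way to get that correlation is to mint a trace ID at the first function a request hits and pass it along in a header; the `x-trace-id` name below is a convention chosen for illustration, not a platform standard:

```typescript
import { randomUUID } from "node:crypto";

// Reuse an incoming trace ID if an upstream function already set one;
// otherwise mint a new one at the entry point.
function getTraceId(headers: Record<string, string | undefined>): string {
  return headers["x-trace-id"] ?? randomUUID();
}

// Stamp every log line with the trace ID so logs from separate invocations
// handling the same user request can be joined back together.
function logWithTrace(traceId: string, message: string): string {
  return JSON.stringify({ traceId, message, ts: Date.now() });
}
```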
5. Costs scale per invocation, not per host
A function that runs for 1ms per request at 10,000 RPM has a very different cost profile from one that runs for 500ms at 100 RPM. Traditional monitoring tracks host costs. Serverless monitoring needs to track cost per function, per invocation.
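To make that concrete, here is a rough cost model using AWS Lambda's published x86 pricing at the time of writing (about $0.0000166667 per GB-second plus $0.20 per million requests); treat the numbers as illustrative and check current pricing, and note the free tier is ignored:

```typescript
// Rough monthly cost model for a Lambda-style function.
// Pricing constants are AWS's published x86 rates at time of writing — verify before relying on them.
const GB_SECOND_PRICE = 0.0000166667;
const PER_MILLION_REQUESTS = 0.2;

function monthlyCostUsd(rpm: number, durationMs: number, memoryMb: number): number {
  const invocations = rpm * 60 * 24 * 30; // requests per 30-day month
  const gbSeconds = invocations * (durationMs / 1000) * (memoryMb / 1024);
  return gbSeconds * GB_SECOND_PRICE + (invocations / 1e6) * PER_MILLION_REQUESTS;
}

// 1ms at 10,000 RPM: the bill is dominated by the per-request fee.
const fastHighTraffic = monthlyCostUsd(10_000, 1, 128);
// 500ms at 100 RPM: the bill is dominated by compute time.
const slowLowTraffic = monthlyCostUsd(100, 500, 128);
```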
What to Monitor on Serverless
The five metrics that matter for serverless APIs:
| Metric | Why it matters | How to track |
|---|---|---|
| Cold start frequency | How often your functions reinitialize | Compare init time vs request time |
| Cold start duration | How much latency cold starts add | Measure init phase separately |
| P95 latency (warm only) | True application performance | Exclude cold start requests |
| Error rate per function | Which routes are failing | Track 4xx/5xx per API route |
| Throttling / concurrency limits | When the platform rejects requests | Platform metrics (CloudWatch, Vercel logs) |
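Cold start detection itself is straightforward in a Node.js runtime: module scope is re-evaluated on every cold start but survives warm invocations, so a module-level flag separates the two. A sketch of that plus a warm-only P95; the helper names are illustrative:

```typescript
// Module scope survives between warm invocations but is re-evaluated on every
// cold start, so a simple flag distinguishes the two cases.
let coldStart = true;

interface Sample { durationMs: number; cold: boolean; }
const samples: Sample[] = [];

function recordInvocation(durationMs: number): Sample {
  const sample = { durationMs, cold: coldStart };
  coldStart = false; // every later invocation in this instance is warm
  samples.push(sample);
  return sample;
}

// P95 over warm requests only, so cold starts don't corrupt the number.
function warmP95(all: Sample[]): number {
  const warm = all.filter((s) => !s.cold).map((s) => s.durationMs).sort((a, b) => a - b);
  if (warm.length === 0) return 0;
  return warm[Math.min(warm.length - 1, Math.floor(warm.length * 0.95))];
}
```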
Platform-Specific Monitoring
Vercel (Next.js)
Vercel provides basic analytics in the dashboard: function invocations, duration, and errors. But no per-route P95, no cold start tracking, and no real-time alerts.
For meaningful monitoring, use the Next.js instrumentation.ts hook — the official entry point for observability that runs once per function initialization:
```ts
// instrumentation.ts
export async function register() {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    // This runs once per cold start
    // Initialize your monitoring here
    const { initWatch } = await import('@nurbak/watch')
    initWatch({ apiKey: process.env.NURBAK_WATCH_KEY })
  }
}
```

AWS Lambda
Lambda publishes metrics to CloudWatch natively: invocations, duration, errors, throttles, and concurrent executions. Enable Lambda Insights for enhanced metrics including memory usage and init duration.
```bash
# Enable Lambda Insights via AWS CLI
aws lambda update-function-configuration \
  --function-name my-api \
  --layers arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension:38
```

Key CloudWatch metrics to alarm on:
- Duration — P95 per function. Alert when 2x baseline.
- Errors — Any sustained error rate above 0.1%.
- Throttles — Any throttling means you're hitting concurrency limits.
- ConcurrentExecutions — Track against your account limit (default 1,000).
- InitDuration — Cold start time. Alert if increasing (usually means bundle size grew).
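The "alert when 2x baseline" rule is simple enough to encode directly in whatever checks your alerting pipeline runs against these metrics; a sketch, with the multiplier as a tunable example:

```typescript
// Fire an alert when current P95 exceeds the baseline by a multiplier.
// The 2x default is a common starting point, not a universal rule — tune per function.
function shouldAlert(currentP95Ms: number, baselineP95Ms: number, multiplier = 2): boolean {
  return currentP95Ms > baselineP95Ms * multiplier;
}
```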
Cloudflare Workers
Workers use V8 isolates instead of containers. Cold starts are under 5ms (vs 200-2000ms on Lambda/Vercel). This changes the monitoring equation — cold starts are not a significant concern.
Monitor via the Cloudflare dashboard or Workers Analytics API:
```bash
# Cloudflare Workers Analytics API
curl -X POST https://api.cloudflare.com/client/v4/graphql \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "query": "{ viewer { accounts(filter: {accountTag: \"ACCOUNT_ID\"}) { workersInvocationsAdaptive(limit: 10, filter: {datetime_gt: \"2026-03-31\"}) { sum { requests errors subrequests } quantiles { cpuTimeP50 cpuTimeP99 } } } } }"
  }'
```

Three Monitoring Approaches Compared
| Approach | Cold start impact | Setup | Cost | Coverage |
|---|---|---|---|---|
| Platform native (Vercel Analytics, CloudWatch) | 0ms | Automatic | Free-$10/mo | Basic: invocations, duration, errors |
| Lightweight SDK (Nurbak Watch, Sentry) | 5-50ms | 5-15 min | $0-29/mo | Per-route metrics, real-time alerts |
| Full APM (Datadog, New Relic) | 200-800ms | 1-4 hours | $200-800/mo | Full: traces, logs, infra, APM |
Recommendation for most serverless teams: Start with platform-native metrics (free) + a lightweight SDK for per-route monitoring and alerts. Only add a full APM if you need distributed tracing across 20+ functions or detailed infrastructure profiling.
Nurbak Watch: Built for Serverless Next.js
Nurbak Watch was designed specifically for serverless Next.js deployments on Vercel. It uses the instrumentation.ts hook with minimal cold start impact:
```bash
npm install @nurbak/watch
```

```ts
// instrumentation.ts
import { initWatch } from '@nurbak/watch'

export function register() {
  initWatch({
    apiKey: process.env.NURBAK_WATCH_KEY,
  })
}
```

What you get:
- Every API route auto-discovered and monitored
- Cold start frequency and duration tracked separately
- P50/P95/P99 latency from warm requests (not corrupted by cold starts)
- Error rates per endpoint
- Alerts via Slack, email, or WhatsApp in under 10 seconds
- +5-15ms cold start overhead (vs 200-800ms for Datadog/New Relic)
Free during beta, $29/month after. No per-host pricing — because there is no host.

