You deploy your Next.js app to Vercel. You want to monitor your API routes. You look at Datadog's setup guide: "Install the Datadog Agent on your host machine."

You don't have a host machine. You have serverless functions that exist for 50 milliseconds, serve a request, and vanish. The agent has nowhere to live.

This is the fundamental problem with serverless monitoring: every tool assumes you have a server. On Vercel, Lambda, and Cloudflare Workers, you don't. This guide covers what actually works.

Why Serverless Breaks Traditional Monitoring

Traditional APM tools (Datadog, New Relic, Dynatrace) were built for a world of long-running servers. They assume:

  • A daemon process can run alongside your app (it can't — there's no host)
  • The process persists between requests (it doesn't — functions are ephemeral)
  • Initialization happens once (it doesn't — every cold start re-initializes)
  • You can install system-level agents (you can't — no SSH access to the runtime)

These assumptions break on serverless, creating five specific problems:

1. Cold starts corrupt your latency data

A cold start adds 200-2000ms to the first request after a function scales up. If your APM tool doesn't separate cold start latency from request processing latency, your P95 looks terrible even when your code is fast.

```ts
// What the APM sees:
// Request 1 (cold):  1,450ms  ← 1,200ms cold start + 250ms processing
// Request 2 (warm):    85ms
// Request 3 (warm):    92ms
// Request 4 (warm):    78ms
// Request 5 (cold):  1,380ms  ← Another cold start
//
// P95: 1,420ms — looks broken
// Actual app performance: 85ms — perfectly healthy
//
// Without cold start separation, you're optimizing the wrong thing.
```
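The fix is to tag each sample as cold or warm and compute percentiles over the warm set only. A minimal sketch, using a nearest-rank P95 (so results differ slightly from interpolated percentiles):

```typescript
interface Sample {
  ms: number
  cold: boolean // e.g. "first request since this instance initialized"
}

// Nearest-rank P95: the smallest value covering 95% of the samples.
function p95(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b)
  return sorted[Math.max(0, Math.ceil(sorted.length * 0.95) - 1)]
}

// Percentile over warm requests only — the app's real tail latency.
function warmP95(samples: Sample[]): number {
  return p95(samples.filter((s) => !s.cold).map((s) => s.ms))
}

const samples: Sample[] = [
  { ms: 1450, cold: true },
  { ms: 85, cold: false },
  { ms: 92, cold: false },
  { ms: 78, cold: false },
  { ms: 1380, cold: true },
]
// A naive P95 over all five samples reports a cold-start number;
// warmP95 reports what your code actually does when it's warm.
```

How you tag a request as cold depends on the platform; a common trick is a module-level boolean flipped after the first request.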

2. Agent initialization adds to cold starts

APM agents need to boot when your function starts. That initialization isn't free:

| Agent | Init overhead | Impact on cold start |
|---|---|---|
| Datadog (dd-trace) | 200-800ms | +40-160% on a 500ms cold start |
| New Relic | 200-400ms | +40-80% |
| Sentry | 50-150ms | +10-30% |
| Nurbak Watch | 5-15ms | +1-3% |
| No agent | 0ms | Baseline |

For functions that cold-start frequently (low-traffic routes, edge functions, cron jobs), a 400ms agent overhead can double total response time.
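The "impact" column above is just init time as a share of the baseline cold start; a quick sanity check, assuming the table's 500ms baseline:

```typescript
// Impact on cold start = agent init time as a percentage of the
// baseline cold start (500ms, per the table above).
function initOverheadPct(agentInitMs: number, baseColdStartMs: number): number {
  return Math.round((agentInitMs / baseColdStartMs) * 100)
}

// Datadog's range from the table:
initOverheadPct(200, 500) // 40  → "+40%"
initOverheadPct(800, 500) // 160 → "+160%"
```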

3. Concurrency is per-function, not per-host

On a traditional server, you monitor CPU and memory to predict capacity. On serverless, each invocation gets its own isolated environment. "CPU usage" is meaningless. What matters is concurrent executions, throttling events, and per-invocation duration.

4. Logs are scattered across invocations

Each function invocation produces isolated logs. Correlating a user's request across multiple function invocations requires trace IDs — something that most serverless platforms don't provide natively.
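The usual workaround is to mint a trace ID at the edge and forward it on every internal call. A sketch — the `x-trace-id` header name is a convention, not a platform feature:

```typescript
import { randomUUID } from 'node:crypto'

// Reuse the caller's trace ID when present; otherwise this invocation
// is the start of a new trace.
export function getTraceId(headers: Headers): string {
  return headers.get('x-trace-id') ?? randomUUID()
}

// In every function: include the ID in each log line (so scattered logs
// can be joined later) and forward it on downstream calls, e.g.:
// const traceId = getTraceId(req.headers)
// console.log(JSON.stringify({ traceId, route: '/api/checkout', msg: 'start' }))
// await fetch(INTERNAL_URL, { headers: { 'x-trace-id': traceId } })
```

Structured (JSON) log lines matter here: most log search tools can then filter every invocation of one user request by `traceId`.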

5. Costs scale per invocation, not per host

A function that runs 1ms per request at 10,000 RPM has a very different cost profile from one that runs 500ms per request at 100 RPM. Traditional monitoring tracks host costs. Serverless monitoring needs to track cost per function, per invocation.
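The arithmetic behind that claim, as a sketch — actual dollar cost also depends on memory size and the platform's rate, so this compares raw billed compute time only:

```typescript
// Billed compute time per minute = per-request duration × requests per minute.
function computeMsPerMinute(durationMs: number, rpm: number): number {
  return durationMs * rpm
}

const fastHighTraffic = computeMsPerMinute(1, 10_000) // 10,000 ms/min
const slowLowTraffic = computeMsPerMinute(500, 100)   // 50,000 ms/min
// The 100 RPM function bills 5x the compute of the 10,000 RPM one,
// despite serving 100x fewer requests.
```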

What to Monitor on Serverless

The five metrics that matter for serverless APIs:

| Metric | Why it matters | How to track |
|---|---|---|
| Cold start frequency | How often your functions reinitialize | Compare init time vs request time |
| Cold start duration | How much latency cold starts add | Measure init phase separately |
| P95 latency (warm only) | True application performance | Exclude cold start requests |
| Error rate per function | Which routes are failing | Track 4xx/5xx per API route |
| Throttling / concurrency limits | When the platform rejects requests | Platform metrics (CloudWatch, Vercel logs) |

Platform-Specific Monitoring

Vercel (Next.js)

Vercel provides basic analytics in the dashboard: function invocations, duration, and errors. But it offers no per-route P95, no cold start tracking, and no real-time alerts.

For meaningful monitoring, use the Next.js instrumentation.ts hook — the official entry point for observability that runs once per function initialization:

```ts
// instrumentation.ts
export async function register() {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    // This runs once per cold start
    // Initialize your monitoring here
    const { initWatch } = await import('@nurbak/watch')
    initWatch({ apiKey: process.env.NURBAK_WATCH_KEY })
  }
}
```
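instrumentation.ts handles initialization; per-request metrics still have to be captured in the route handlers themselves. A hand-rolled sketch of that layer — `withTiming` and `recordMetric` here are illustrative, not part of any SDK:

```typescript
type Handler = (req: Request) => Promise<Response>

// Illustrative in-memory sink; a real implementation would ship these
// to a monitoring backend instead of keeping them in the instance.
export const metrics: { route: string; ms: number; status: number }[] = []

function recordMetric(m: { route: string; ms: number; status: number }) {
  metrics.push(m)
}

// Wrap any App Router route handler to record its duration and status.
export function withTiming(route: string, handler: Handler): Handler {
  return async (req) => {
    const start = performance.now()
    const res = await handler(req)
    recordMetric({ route, ms: performance.now() - start, status: res.status })
    return res
  }
}

// Usage in app/api/hello/route.ts:
// export const GET = withTiming('/api/hello', async () => Response.json({ ok: true }))
```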

AWS Lambda

Lambda publishes metrics to CloudWatch natively: invocations, duration, errors, throttles, and concurrent executions. Enable Lambda Insights for enhanced metrics including memory usage and init duration.

```bash
# Enable Lambda Insights via AWS CLI
aws lambda update-function-configuration \
  --function-name my-api \
  --layers arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension:38
```

Key CloudWatch metrics to alarm on:

  • Duration — P95 per function. Alert when it exceeds 2x your baseline.
  • Errors — Any sustained error rate above 0.1%.
  • Throttles — Any throttling means you're hitting concurrency limits.
  • ConcurrentExecutions — Track against your account limit (default 1,000).
  • InitDuration — Cold start time. Alert if increasing (usually means bundle size grew).
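As data, the first and third of those alarms might look like the following — a sketch of input for CloudWatch's PutMetricAlarm (via SDK or IaC); the thresholds are this article's suggestions, the 250ms baseline is an assumed number, and the set is not complete:

```typescript
const functionName = 'my-api'

// Sketch of PutMetricAlarm inputs; real alarms also need Period,
// alarm actions (SNS topics), etc.
export const alarms = [
  {
    AlarmName: `${functionName}-p95-duration`,
    Namespace: 'AWS/Lambda',
    MetricName: 'Duration',
    ExtendedStatistic: 'p95',
    Threshold: 2 * 250, // alert at 2x an assumed 250ms baseline
    EvaluationPeriods: 3,
    ComparisonOperator: 'GreaterThanThreshold',
    Dimensions: [{ Name: 'FunctionName', Value: functionName }],
  },
  {
    AlarmName: `${functionName}-throttles`,
    Namespace: 'AWS/Lambda',
    MetricName: 'Throttles',
    Statistic: 'Sum',
    Threshold: 0, // any throttling means you're at the concurrency limit
    EvaluationPeriods: 1,
    ComparisonOperator: 'GreaterThanThreshold',
    Dimensions: [{ Name: 'FunctionName', Value: functionName }],
  },
]
```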

Cloudflare Workers

Workers use V8 isolates instead of containers. Cold starts are under 5ms (vs 200-2000ms on Lambda/Vercel). This changes the monitoring equation — cold starts are not a significant concern.
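With per-request overhead this low, timing can live inline in the Worker itself. A minimal sketch — the Server-Timing header and the trivial handler body are illustrative choices, not a Cloudflare requirement:

```typescript
// Minimal Worker sketch: time each request and expose the duration via
// a Server-Timing header, visible in browser devtools and log drains.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const start = Date.now()
    // Placeholder for your actual handler logic:
    const body = JSON.stringify({ ok: true, path: new URL(request.url).pathname })
    const dur = Date.now() - start
    return new Response(body, {
      headers: {
        'content-type': 'application/json',
        'server-timing': `app;dur=${dur}`,
      },
    })
  },
}

export default worker
```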

Monitor via the Cloudflare dashboard or Workers Analytics API:

```bash
# Cloudflare Workers Analytics API (GraphQL — the body must be sent as JSON)
curl -X POST https://api.cloudflare.com/client/v4/graphql \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ viewer { accounts(filter: {accountTag: \"ACCOUNT_ID\"}) { workersInvocationsAdaptive(limit: 10, filter: {datetime_gt: \"2026-03-31\"}) { sum { requests errors subrequests } quantiles { cpuTimeP50 cpuTimeP99 } } } } }"
  }'
```

Three Monitoring Approaches Compared

| Approach | Cold start impact | Setup | Cost | Coverage |
|---|---|---|---|---|
| Platform native (Vercel Analytics, CloudWatch) | 0ms | Automatic | Free-$10/mo | Basic: invocations, duration, errors |
| Lightweight SDK (Nurbak Watch, Sentry) | 5-50ms | 5-15 min | $0-29/mo | Per-route metrics, real-time alerts |
| Full APM (Datadog, New Relic) | 200-800ms | 1-4 hours | $200-800/mo | Full: traces, logs, infra, APM |

Recommendation for most serverless teams: Start with platform-native metrics (free) + a lightweight SDK for per-route monitoring and alerts. Only add a full APM if you need distributed tracing across 20+ functions or detailed infrastructure profiling.

Nurbak Watch: Built for Serverless Next.js

Nurbak Watch was designed specifically for serverless Next.js deployments on Vercel. It uses the instrumentation.ts hook with minimal cold start impact:

```bash
npm install @nurbak/watch
```

```ts
// instrumentation.ts
import { initWatch } from '@nurbak/watch'

export function register() {
  initWatch({
    apiKey: process.env.NURBAK_WATCH_KEY,
  })
}
```

What you get:

  • Every API route auto-discovered and monitored
  • Cold start frequency and duration tracked separately
  • P50/P95/P99 latency from warm requests (not corrupted by cold starts)
  • Error rates per endpoint
  • Alerts via Slack, email, or WhatsApp in under 10 seconds
  • +5-15ms cold start overhead (vs 200-800ms for Datadog/New Relic)

Free during beta, $29/month after. No per-host pricing — because there is no host.
