You just deployed your Next.js app to Vercel. Your team lead says you need monitoring. So you Google "Next.js APM" and find Datadog's setup guide.

Step 1: Install dd-trace. Step 2: Install the Datadog Agent on your server. Step 3: Configure datadog.yaml with 47 options. Step 4: Set up trace collection. Step 5: Configure log forwarding. Step 6: Realize the Datadog Agent needs a server to run on — and you're on Vercel.

You don't have a server. You have serverless functions. The agent has nowhere to live.

This is the fundamental mismatch between traditional APM tools and modern Next.js deployments. The tools were built for a world of VMs and containers. Your app lives in a different world.

What APM Agents Actually Do (and Cost)

An APM agent is a separate process that runs alongside your application. Datadog's agent, New Relic's daemon, Dynatrace's OneAgent — they all follow the same pattern:

  1. A daemon process runs on your host machine (200-500MB RAM)
  2. A language-specific library (dd-trace, newrelic) instruments your code
  3. The library sends trace data to the local daemon over a Unix socket or localhost
  4. The daemon batches, compresses, and forwards data to the vendor's cloud
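
Step 3 of that pattern can be sketched as follows. This is illustrative only: localhost:8126 is Datadog's documented default trace-agent port, but the span shape and JSON body here are simplified assumptions (real tracers use msgpack and far richer span formats).

```typescript
// Illustrative sketch: the in-process library hands spans to a local daemon
// over localhost; the daemon (not your app) talks to the vendor's cloud.
// Span shape and JSON encoding are simplified assumptions.
type Span = { name: string; start: number; duration: number }

function buildDaemonRequest(spans: Span[]) {
  return {
    url: 'http://localhost:8126/v0.4/traces', // Datadog agent's default local trace port
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify([spans]), // a trace is an array of spans
  }
}

const req = buildDaemonRequest([{ name: 'GET /api/users', start: Date.now(), duration: 42 }])
console.log(req.url)
```

The point of the indirection: the library only ever talks to a process on the same machine, which is exactly the assumption that serverless breaks.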

This architecture made sense when everyone ran on EC2 instances or Kubernetes pods. It breaks down completely in three scenarios that Next.js developers hit constantly:

Problem 1: Serverless has no host machine

On Vercel, Netlify, or AWS Lambda, there is no persistent server. Each API route invocation is an isolated function. The agent daemon has nowhere to run. Some vendors offer "serverless mode" where the library sends data directly to the cloud — but that means a network request on every function invocation, adding 50-200ms of latency to every API call.

Problem 2: Cold starts get worse

APM libraries need to initialize when your function starts. Here's what happens during a cold start with dd-trace:

// What your code looks like
import tracer from 'dd-trace'
tracer.init() // This is not free

// What actually happens during init():
// 1. Load configuration (read env vars, parse options)     ~20ms
// 2. Initialize span processors                            ~30ms
// 3. Set up monkey-patching for http, fetch, pg, etc.      ~80ms
// 4. Establish connection to collector                     ~100ms
// 5. Load sampling rules                                   ~15ms
// ─────────────────────────────────────────────────────────
// Total cold start overhead:                             ~245ms

On a function that normally cold-starts in 300ms, you just added 80% more initialization time. And this happens on every cold start — which, on Vercel's free and Pro tiers, can be every few minutes for low-traffic routes.

Problem 3: Configuration complexity

A typical Datadog setup for a Next.js app requires:

  • DD_API_KEY — your API key
  • DD_SITE — the Datadog region
  • DD_SERVICE — your service name
  • DD_ENV — environment (production, staging)
  • DD_VERSION — your app version
  • DD_TRACE_ENABLED — enable/disable tracing
  • DD_LOGS_INJECTION — correlate logs with traces
  • DD_RUNTIME_METRICS_ENABLED — runtime stats
  • DD_PROFILING_ENABLED — code profiling
  • DD_TRACE_SAMPLE_RATE — sampling rate

That's 10 environment variables just to get started. Miss one and you get partial data. Set one wrong and you get a $2,000 bill from trace overages.

Compare that to what you actually need: "Tell me when my API routes are slow or broken."

The Agent Tax: What You're Really Paying

Beyond the technical friction, agents impose hidden costs that compound over time:

| Cost | Agent-Based (Datadog/New Relic) | Agentless (Lightweight SDK) |
| --- | --- | --- |
| Memory overhead | 300-500MB (daemon) + 50-100MB (library) | < 5MB |
| Cold start penalty | +200-800ms | +5-15ms |
| Environment variables | 10-15 required | 1 (API key) |
| Config files | datadog.yaml / newrelic.js | None |
| Setup time | 2-4 hours | 5 minutes |
| Monthly cost (small team) | $71-300+/host/month | $0-29/month |
| Works on Vercel serverless | Partially (degraded mode) | Fully |
| Requires infrastructure team | Often yes | No |

For a solo developer or a team of five shipping a SaaS product, the agent model is overkill. You're paying enterprise complexity for a problem that has a much simpler solution.

What Agentless Monitoring Looks Like

Agentless monitoring flips the architecture. Instead of a daemon process + library + collector pipeline, you get a single lightweight SDK that runs inside your application process:

// The entire monitoring setup for a Next.js app:

// instrumentation.ts
import { initWatch } from '@nurbak/watch'

export function register() {
  initWatch({
    apiKey: process.env.NURBAK_WATCH_KEY,
  })
}

That's it. No daemon, no config file, no 10 environment variables. The SDK:

  1. Initializes in under 15ms (vs 200-800ms for APM agents)
  2. Uses less than 5MB of memory (vs 300-500MB for agents)
  3. Auto-discovers every API route in your Next.js app
  4. Batches and sends metrics asynchronously (zero impact on response time)
  5. Survives serverless cold starts because it's part of your function, not a separate process

The key insight: your Next.js app already knows everything about its API routes. It knows every request path, every response code, every error. You don't need a separate process to observe it — you just need to capture what's already there.

How It Works Under the Hood

Next.js 13.2+ introduced the instrumentation.ts hook specifically for observability. When your server starts, Next.js calls the register() function once. This is the official, supported entry point for monitoring.

An agentless SDK uses this hook to:

// Simplified view of what happens inside the SDK

import diagnostics_channel from 'node:diagnostics_channel'

type RequestMetric = { path: string; method: string; status: number; duration: number }

// In-memory buffer for captured request metrics
const metrics = {
  buffer: [] as RequestMetric[],
  record(m: RequestMetric) { this.buffer.push(m) },
  flush() { return this.buffer.splice(0) },
}

export function initWatch(config: { apiKey: string }) {
  // 1. Hook into Node.js HTTP handling
  //    Uses diagnostics_channel (Node 16+) — no monkey-patching
  const startTimes = new WeakMap<object, number>()

  diagnostics_channel.subscribe('http.server.request.start', ({ request }: any) => {
    startTimes.set(request, performance.now())
  })

  // 2. When each response finishes, capture timing and metadata
  diagnostics_channel.subscribe('http.server.response.finish', ({ request, response }: any) => {
    metrics.record({
      path: request.url,
      method: request.method,
      status: response.statusCode,
      duration: performance.now() - (startTimes.get(request) ?? performance.now()),
    })
  })

  // 3. Batch and send every 10 seconds (non-blocking)
  setInterval(() => {
    const batch = metrics.flush()
    if (batch.length > 0) {
      // Fire-and-forget — does not block your API responses
      fetch('https://api.nurbak.com/v1/ingest', {
        method: 'POST',
        body: JSON.stringify(batch),
        headers: { 'Authorization': `Bearer ${config.apiKey}` },
      }).catch(() => {}) // Silent failure — monitoring should never break your app
    }
  }, 10_000)
}

The critical design decisions:

  • No monkey-patching. APM agents rewrite your http, fetch, and database modules at runtime. This causes version conflicts, breaks TypeScript types, and makes debugging harder. diagnostics_channel is the Node.js-native way to observe without modifying.
  • Async, batched sending. Metrics are buffered in memory and sent in batches. Individual API responses are never delayed by the monitoring system.
  • Silent failure. If the monitoring endpoint is down, your app keeps running normally. Monitoring should observe, never interfere.
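
The no-monkey-patching point is easy to demo with a custom channel: a publisher broadcasts to a named channel, and subscribers observe without either side modifying the other. A minimal standalone sketch (channel name and message shape are made up for illustration):

```typescript
import diagnostics_channel from 'node:diagnostics_channel'

// A publisher broadcasts to a named channel; subscribers observe without
// patching or wrapping the publisher's code.
const channel = diagnostics_channel.channel('demo.request')

const seen: Array<{ path: string; status: number }> = []
diagnostics_channel.subscribe('demo.request', (message) => {
  seen.push(message as { path: string; status: number })
})

// publish() is nearly free when nobody is listening; with a subscriber,
// the message is delivered synchronously.
channel.publish({ path: '/api/users', status: 200 })

console.log(seen.length) // 1
```

Node's http module publishes to built-in channels in exactly this way, which is what lets an in-process SDK observe requests without rewriting any modules.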

What You Get Without the Agent

With an agentless SDK like Nurbak Watch, every API route is tracked automatically:

  • Latency percentiles — P50, P95, P99 for every endpoint. Real server-side timing, not synthetic pings.
  • Error rates — 4xx and 5xx percentage per route, with automatic spike detection.
  • Throughput — Requests per minute. See which endpoints are hot and which are idle.
  • Cold start tracking — On Vercel, know exactly how often your functions cold-start and how much latency they add.
  • Instant alerts — Slack, email, or WhatsApp within 10 seconds of an incident. Not minutes — seconds.

No Grafana dashboards to build. No PromQL queries to learn. No time-series database to scale. You install it, deploy, and your monitoring is live.
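
For intuition on the latency percentiles above, here's a minimal nearest-rank computation over a batch of recorded durations. The helper is illustrative only; production backends typically use streaming estimators (t-digest and similar) rather than sorting raw samples.

```typescript
// Nearest-rank percentile over a batch of request durations (ms).
// Illustrative sketch, not a production estimator.
function percentile(durations: number[], p: number): number {
  const sorted = [...durations].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank - 1, 0)]
}

const durations = [12, 15, 18, 22, 30, 45, 80, 120, 350, 900]
console.log(percentile(durations, 50)) // 30
console.log(percentile(durations, 95)) // 900
```

Note how one slow outlier dominates P95 while leaving P50 untouched, which is why per-route percentiles beat averages for spotting regressions.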

When You Actually Need an Agent

To be fair, agents aren't always wrong. You might genuinely need a full APM agent if:

  • You run on Kubernetes and need distributed tracing across 50+ microservices with automatic service maps.
  • You need deep runtime profiling — CPU flame graphs, memory leak detection, garbage collection analysis.
  • Compliance requires it — some SOC 2 or HIPAA implementations mandate specific APM tooling.
  • You have a dedicated platform team that manages observability infrastructure full-time.

If none of these apply — if you're a team of 1-15 developers shipping a Next.js SaaS — you don't need an agent. You need visibility into your API routes with minimal friction.

Migration: Removing Your Agent in 10 Minutes

If you're currently running an APM agent with Next.js, here's how to switch:

Step 1: Remove the agent library

# Remove Datadog
npm uninstall dd-trace

# Or remove New Relic
npm uninstall newrelic

# Or remove Dynatrace
npm uninstall @dynatrace/oneagent

Step 2: Clean up environment variables

# Remove from .env.local and Vercel dashboard:
# DD_API_KEY, DD_SITE, DD_SERVICE, DD_ENV, DD_VERSION,
# DD_TRACE_ENABLED, DD_LOGS_INJECTION, DD_RUNTIME_METRICS_ENABLED,
# DD_PROFILING_ENABLED, DD_TRACE_SAMPLE_RATE
# ... (you get the idea)

Step 3: Install and configure Nurbak Watch

npm install @nurbak/watch

// instrumentation.ts
import { initWatch } from '@nurbak/watch'

export function register() {
  initWatch({
    apiKey: process.env.NURBAK_WATCH_KEY,
  })
}

Step 4: Add one environment variable

# .env.local (or Vercel dashboard)
NURBAK_WATCH_KEY=your_api_key_here

Step 5: Deploy

Your cold starts just got 200-800ms faster. Your function memory usage just dropped. And you still have full visibility into every API route.

Get Started — Free During Beta

Nurbak Watch is in beta and completely free during launch. No credit card. No agent. No daemon. No 47-option config file.

One npm install. Five lines of code. Every API route monitored.

Your ops team (which is probably also you) will thank you.
