Your Stripe webhook endpoint processes payment confirmations. It's been returning 500 errors for the last 2 hours. Stripe retried 3 times, gave up, and moved on. You have 47 unprocessed payments. You find out tomorrow from a customer email.

Webhooks are among the most fragile integration points in modern applications, and often the least monitored. The sender fires, retries a few times at most, and forgets. If your endpoint keeps failing, nobody tells you.

Why Webhooks Are Hard to Monitor

  • Incoming traffic, not outgoing. You don't control when webhooks arrive. You can't test them on demand.
  • No consistent error reporting. Stripe has a dashboard with delivery logs. GitHub has delivery history. Your custom B2B partner sends webhooks with no logging at all.
  • Silent failures. An endpoint that returns 200 but doesn't process the payload correctly is worse than one that returns 500: the sender records a success and never retries.
  • Retry masking. Providers retry failed deliveries. If your endpoint fails once and succeeds on retry, you never see the failure, but the event is still processed late, which can leave a payment or order in the wrong state until the retry lands.

What to Monitor

| Metric | Why it matters | Alert threshold |
|---|---|---|
| Success rate | Percentage of webhooks processed without error | < 99% sustained |
| Response time | Stripe times out at 20s, GitHub at 10s | > 5 seconds |
| Error rate by type | 4xx (bad payload) vs 5xx (your bug) vs timeout | Any 5xx spike |
| Throughput | Expected vs actual events; a sudden drop means the sender stopped or your endpoint is rejecting | Drop > 50% |
| Processing lag | Time between event creation and your processing | > 5 minutes |
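As a rough illustration of how three of these thresholds could be checked, here is a minimal sketch that summarizes a batch of delivery records. The `DeliveryRecord` shape, the `summarize` function, and the field names are all hypothetical; they stand in for whatever your logs or metrics store actually contain.

```typescript
// One record per webhook delivery your endpoint handled.
// Hypothetical shape: adapt it to the fields your own logs provide.
interface DeliveryRecord {
  status: number;         // HTTP status your endpoint returned
  durationMs: number;     // time spent producing the response
  eventCreatedAt: number; // when the sender created the event (ms since epoch)
  processedAt: number;    // when your code finished processing (ms since epoch)
}

interface WebhookSummary {
  successRate: number;   // fraction of deliveries answered with a 2xx status
  p95DurationMs: number; // nearest-rank 95th percentile response time
  maxLagMs: number;      // worst delay between event creation and processing
  alerts: string[];      // thresholds from the table that were crossed
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)] ?? 0;
}

function summarize(records: DeliveryRecord[]): WebhookSummary {
  const total = records.length;
  const ok = records.filter((r) => r.status >= 200 && r.status < 300).length;
  const successRate = total === 0 ? 1 : ok / total;

  const durations = records.map((r) => r.durationMs).sort((a, b) => a - b);
  const p95DurationMs = percentile(durations, 95);

  const maxLagMs = records.reduce(
    (max, r) => Math.max(max, r.processedAt - r.eventCreatedAt),
    0,
  );

  const alerts: string[] = [];
  if (successRate < 0.99) alerts.push("success rate below 99%");
  if (p95DurationMs > 5_000) alerts.push("p95 response time above 5 seconds");
  if (maxLagMs > 5 * 60_000) alerts.push("processing lag above 5 minutes");

  return { successRate, p95DurationMs, maxLagMs, alerts };
}
```

The sketch leaves out throughput and the per-type error breakdown, since both need a baseline or a time window rather than a single batch.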

Webhook Timeouts by Provider

| Provider | Timeout | Retries | Retry window |
|---|---|---|---|
| Stripe | 20 seconds | Up to 3 | 48 hours |
| GitHub | 10 seconds | Up to 3 | ~hours |
| Shopify | 5 seconds | 19 times | 48 hours |
| Twilio | 15 seconds | Configurable | Configurable |
| Slack | 3 seconds | 3 times | 30 minutes |

If your webhook endpoint takes 6 seconds to process a Shopify event, it times out every time. You'd never know unless you monitor response times.
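One way to catch this early is to compare your measured response time against each provider's timeout. The sketch below copies the timeout values from the table above; the `checkBudget` function and the 50% warning line are assumptions chosen for illustration, not provider recommendations.

```typescript
// Timeouts from the table above, in milliseconds.
const PROVIDER_TIMEOUT_MS = {
  stripe: 20_000,
  github: 10_000,
  shopify: 5_000,
  twilio: 15_000,
  slack: 3_000,
} as const;

type Provider = keyof typeof PROVIDER_TIMEOUT_MS;
type BudgetStatus = "ok" | "warn" | "timeout";

// Warn once the endpoint uses half of the provider's timeout.
// 0.5 is an arbitrary assumption; tune it to your own tolerance.
const WARN_FRACTION = 0.5;

function checkBudget(provider: Provider, p95Ms: number): BudgetStatus {
  const limit = PROVIDER_TIMEOUT_MS[provider];
  if (p95Ms >= limit) return "timeout";
  if (p95Ms >= limit * WARN_FRACTION) return "warn";
  return "ok";
}
```

With these numbers, `checkBudget("shopify", 6000)` reports `"timeout"`, while the same 6 seconds against Stripe's 20-second limit is still `"ok"`.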

Best Practices for Webhook Endpoints

1. Return 200 immediately, process async

// Bad: process in the request handler (can time out)
export async function POST(request: Request) {
  const event = await request.json()
  await processPayment(event)       // Takes 8 seconds
  await updateDatabase(event)       // Takes 3 seconds
  await sendConfirmationEmail(event) // Takes 2 seconds
  return Response.json({ received: true }) // 13 seconds — TIMEOUT
}

// Good: acknowledge and queue
export async function POST(request: Request) {
  const event = await request.json()
  await queue.send({ type: 'webhook', payload: event })  // 50ms
  return Response.json({ received: true })  // 50ms — done
}
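The `queue` in the example above is not defined in this article. As a minimal sketch of what sits behind it, here is a hypothetical in-memory queue with a worker loop that retries failed jobs and parks them in a dead-letter list after a fixed number of attempts. The `InMemoryQueue` class and its method signatures are assumptions for illustration. A real deployment would likely use a durable queue, because jobs held in memory are lost when the process restarts.

```typescript
interface Job {
  id: string;
  payload: unknown;
  attempts: number;
}

type Handler = (payload: unknown) => Promise<void>;

class InMemoryQueue {
  private pending: Job[] = [];
  private readonly maxAttempts: number;
  // Jobs that failed maxAttempts times; inspect and alert on these.
  readonly deadLetter: Job[] = [];

  constructor(maxAttempts = 3) {
    this.maxAttempts = maxAttempts;
  }

  // Called from the request handler: cheap, so the response is not delayed.
  send(id: string, payload: unknown): void {
    this.pending.push({ id, payload, attempts: 0 });
  }

  // Called by a background worker, outside the request/response cycle.
  async drain(handler: Handler): Promise<void> {
    while (this.pending.length > 0) {
      const job = this.pending.shift()!;
      job.attempts += 1;
      try {
        await handler(job.payload);
      } catch {
        // Requeue until the attempt limit is reached, then give up visibly.
        if (job.attempts < this.maxAttempts) this.pending.push(job);
        else this.deadLetter.push(job);
      }
    }
  }
}
```

The dead-letter list matters for monitoring: once the endpoint acknowledges with a 200, the provider will not retry, so failures in the worker are yours alone to detect.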

2. Verify webhook signatures

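// Stripe signature verification
import Stripe from 'stripe'

// Client instance used for verification; assumes STRIPE_SECRET_KEY is set
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!)

export async function POST(request: Request) {
  // Read the raw body: the signature covers the exact bytes Stripe sent
  const body = await request.text()
  const sig = request.headers.get('stripe-signature')

  try {
    const event = stripe.webhooks.constructEvent(
      body, sig!, process.env.STRIPE_WEBHOOK_SECRET!
    )
    // Process verified event
    return Response.json({ received: true })
  } catch (err) {
    return new Response('Invalid signature', { status: 400 })
  }
}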
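For senders that ship no SDK, such as a custom B2B partner, the same check can often be done by hand with an HMAC. The sketch below assumes the sender puts a bare hex-encoded HMAC-SHA256 of the raw body in a header. That format is an assumption, so check your provider's documentation for the real scheme. The `sign` and `verifySignature` helpers are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hex HMAC-SHA256 of the raw request body.
// Assumed scheme: real providers may add timestamps, prefixes, or base64.
function sign(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

function verifySignature(
  rawBody: string,
  signatureHeader: string | null,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(sign(rawBody, secret), "hex");
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

As in the Stripe example, the check has to run on the raw body text, before any JSON parsing, because re-serializing the payload can change the bytes.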

3. Implement idempotency

Providers retry. Your endpoint might process the same event twice. Use the event ID as an idempotency key — check if you've already processed it before doing anything.
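A minimal sketch of that check, using an in-memory `Set` purely for illustration. The `processOnce` helper and the `WebhookEvent` shape are hypothetical. In production the set would more likely be a database table or cache with a unique constraint on the event ID, so that the check survives restarts and stays safe under concurrent deliveries.

```typescript
interface WebhookEvent {
  id: string;
  type: string;
}

// Illustration only: this set is lost on restart and is not shared
// between server instances.
const processedEventIds = new Set<string>();

// Returns true if the handler ran, false if the event was a duplicate.
function processOnce(
  event: WebhookEvent,
  handler: (event: WebhookEvent) => void,
): boolean {
  if (processedEventIds.has(event.id)) return false;
  handler(event);
  // Mark as processed only after success, so a failed attempt
  // can still be handled when the provider retries.
  processedEventIds.add(event.id);
  return true;
}
```

Marking the event after the handler succeeds is a deliberate trade-off: a crash between the two steps means the event may run twice, so the handler itself should tolerate repeats where possible.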

Monitoring Webhook Endpoints with Nurbak Watch

Since webhooks are just API routes, Nurbak Watch monitors them automatically alongside all your other endpoints:

// instrumentation.ts: monitors ALL API routes including webhooks
import { initWatch } from '@nurbak/watch'

export function register() {
  initWatch({
    apiKey: process.env.NURBAK_WATCH_KEY,
  })
}

Your /api/webhooks/stripe and /api/webhooks/github endpoints get the same P50/P95/P99 latency tracking, error rates, and instant alerts as every other route. When your Stripe webhook starts returning 500s, you get a WhatsApp alert in 10 seconds — not a customer email in 10 hours.

Free during beta. 5 lines of code.
