Edge Observability with Prometheus

The problem

Edge traffic used to sit outside the usual metrics path. Teams were either blind to Cloudflare behavior or, worse, received alerts that did not match actual user impact.

What I built

A Cloudflare Prometheus exporter that pulls meaningful edge signals into the same observability stack we already used for internal services. Edge behavior became visible alongside application and cluster metrics instead of living in a separate silo.
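At its core, an exporter does two things. It fetches numbers from a source, then exposes them over HTTP in the text format that Prometheus scrapes. The following is a minimal, standard-library-only Python sketch of that shape; it is not the exporter described here. Several pieces are hypothetical: the metric name `cloudflare_edge_requests`, its `zone` and `status_class` labels, port 9199, and `fetch_edge_stats`. That last one is a stub standing in for whatever Cloudflare analytics query a real exporter would make. A production exporter would more likely build on the official `prometheus_client` library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label(value: str) -> str:
    # Label values must escape backslash, double quote, and newline
    # in the Prometheus text exposition format.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def fetch_edge_stats() -> list[dict]:
    # HYPOTHETICAL stub: a real exporter would query Cloudflare's analytics here.
    return [
        {"zone": "example.com", "status_class": "2xx", "requests": 9812},
        {"zone": "example.com", "status_class": "5xx", "requests": 37},
    ]


def render_metrics(rows: list[dict]) -> str:
    # Per-window request counts can go down between scrapes,
    # so they are exposed as a gauge rather than a counter.
    lines = [
        "# HELP cloudflare_edge_requests Requests served at the edge, by zone and status class.",
        "# TYPE cloudflare_edge_requests gauge",
    ]
    for row in rows:
        labels = (
            f'zone="{escape_label(row["zone"])}",'
            f'status_class="{escape_label(row["status_class"])}"'
        )
        lines.append(f"cloudflare_edge_requests{{{labels}}} {row['requests']}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(fetch_edge_stats()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9199) -> None:
    # Blocks until interrupted; the port number is an arbitrary choice.
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

To try it, call `serve()` and point a Prometheus scrape job at `host:9199/metrics`. The edge series then land in the same TSDB as application and cluster metrics, which makes cross-layer dashboards and alerts possible.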

Impact

False-positive alert volume dropped by roughly 40%, and on-call engineers could trust that a page was worth investigating. Together with Grafana dashboards for storage and capacity metrics, the exporter made forecasting and troubleshooting materially easier.

What I took away

Exporters only help when they measure the right things at the right granularity. Alert quality matters: noisy monitoring is almost as bad as no monitoring. Meet teams where their infrastructure actually lives, not just where it is convenient to scrape.
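A common technique for cutting alert noise is to alert on a ratio (errors over total requests) rather than a raw count, and to fire only when the condition holds for several consecutive evaluations. This is not necessarily what was done here. The Python sketch below shows that idea under stated assumptions. The class name and its `threshold` and `required_evals` parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SustainedRatioAlert:
    # HYPOTHETICAL illustration of a "ratio held over time" alert condition.
    threshold: float      # e.g. 0.05 means more than 5% of requests failing
    required_evals: int   # consecutive breaching evaluations needed to fire
    _streak: int = 0

    def evaluate(self, errors: float, total: float) -> bool:
        # A ratio tracks user impact better than a raw error count,
        # which grows with traffic even when the failure rate is flat.
        ratio = errors / total if total > 0 else 0.0
        # Any evaluation under the threshold resets the streak,
        # so a single brief spike never pages anyone.
        self._streak = self._streak + 1 if ratio > self.threshold else 0
        return self._streak >= self.required_evals


alert = SustainedRatioAlert(threshold=0.05, required_evals=3)
fired = [alert.evaluate(e, 100) for e in [10, 10, 2, 10, 10, 10]]
```

In Prometheus itself, the same behavior usually comes from a ratio expression in an alerting rule combined with its `for` duration. Here `fired` is `False` for the first five evaluations, because the dip at the third sample resets the streak. It turns `True` only on the sixth.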

The public Observability Toolkit repo explores the same themes in a portable stack you can run locally and break on purpose to prove alerts work.