The seven signals that actually page someone
Out of 142 metrics our average customer ships, only seven correlate with paid downtime. Here's how we found them and what we threw away.
Read the brief →
We monitor 64,000 distributed devices across 11 Canadian carriers. Most of what they emit is noise. We isolate the seven signals that actually wake someone up, and route them to the human who can act.
Most observability tools tell you everything is on fire, then leave you to guess which fire matters. We took the opposite bet — fewer dashboards, fewer charts, one page that says: this, right now, is what you need to read.
Field reports from the platform team. Practical, sometimes opinionated, never sponsored.
Latency to a Tim Hortons drive-thru kiosk in Brampton matters. Latency to a serverless test bench in us-east-1 — less so.
Read the brief →
Last quarter our ingest dropped for nineteen seconds. Here is the timeline, the root cause, and the fix we shipped on a Sunday afternoon.
Read the brief →
No service mesh you didn't ask for. No agent that updates itself at 3 a.m. and breaks your kiosk in Trois-Rivières. The whole thing is twelve binaries, four databases, and one rule: if a junior engineer can't draw it on Monday morning, we redesign it.
Below — the parts of the platform you'd actually be running, ranked by how often we've had to wake up for them.
A loose archive — short writeups from the on-call rotation. Updated when we have something honest to say.
What we shipped, what we broke, what we'd do differently. About 1,400 ops engineers across Canada read it. No tracking pixels. No upsell.