Observability

Observability spans every layer of the platform — from the cloud infrastructure through Kubernetes to the application — and it’s where two promises of the offer come together: the live, shared dashboards that are your transparency window, and the alerting that drives The Operation.

We standardize on the Grafana open-source stack, deployed in-cluster.

Signal      Tool                            What it gives you
Dashboards  Grafana                         A single pane over metrics, logs, and traces
Metrics     Prometheus (scaled with Mimir)  Time-series for infra, cluster, and app
Logs        Loki                            Centralized, queryable logs, correlated with metrics
Traces      Tempo                           Distributed tracing across services
Alerting    Alertmanager                    Routing, grouping, and de-duplication of alerts

Metrics, logs, and traces are correlated in Grafana, so an alert leads straight to the relevant logs and traces — not a hunt across disconnected tools.
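To make the correlation idea concrete, here is a minimal sketch of how an alert's labels can be turned into a Loki log query. It assumes metrics and logs share label names such as `namespace` and `app`; those label names, the sample alert, and the helper `logql_selector` are hypothetical and only illustrate the principle, not how Grafana implements the link internally.

```python
def logql_selector(alert_labels, shared_keys=("namespace", "app")):
    """Build a LogQL stream selector from the labels an alert carries.

    shared_keys is an assumption: the label names that both the metrics
    and the logs are tagged with.
    """
    pairs = []
    for key in shared_keys:
        if key in alert_labels:
            # Escape backslashes and double quotes inside the label value.
            value = alert_labels[key].replace("\\", "\\\\").replace('"', '\\"')
            pairs.append(f'{key}="{value}"')
    if not pairs:
        raise ValueError("alert carries none of the shared labels")
    return "{" + ", ".join(pairs) + "}"


# Hypothetical alert, shaped like the label set a firing alert carries.
alert = {
    "alertname": "HighErrorRate",
    "namespace": "prod",
    "app": "checkout",
    "severity": "critical",
}

print(logql_selector(alert))  # {namespace="prod", app="checkout"}
```

Because the selector is derived from the alert itself, the logs you land on belong to the same workload that fired the alert. The same principle applies to traces when log lines carry a trace ID.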

The dashboards are shared with you and live, not a monthly PDF. They show exactly what we run on your behalf, in real time. This is the offer’s “observability stack + live shared dashboards” made concrete: no black box, no knowledge locked in one person’s head, and full visibility into the platform’s health, performance, and cost signals.

  • We define SLOs (service-level objectives) for the signals that matter to your product.
  • Alertmanager routes alerts by severity to the on-call rotation.
  • Alerts are the entry point to incident handling — see The Operation.
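The path from SLO to on-call can be sketched in a few lines. The example below computes an error-budget burn rate for an availability SLO, maps it to a severity, and picks a receiver. The thresholds, the receiver names, and the function names are illustrative assumptions, not the values we configure for you. In a real deployment, the burn rate is evaluated by alerting rules and the severity-to-receiver mapping lives in Alertmanager's routing configuration rather than in application code.

```python
# Illustrative burn-rate thresholds (assumptions); tune per SLO and window.
PAGE_BURN_RATE = 14.4
WARN_BURN_RATE = 6.0

# Hypothetical receiver names standing in for the on-call channels.
RECEIVERS = {"critical": "on-call-pager", "warning": "team-chat"}


def burn_rate(failed, total, slo_target):
    """Return how fast the error budget is consumed (1.0 = exactly on budget)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 0.0  # no traffic, so no budget is being spent
    error_budget = 1.0 - slo_target  # allowed fraction of failed requests
    return (failed / total) / error_budget


def severity_for(rate):
    """Map a burn rate to a severity label, or None if no alert is needed."""
    if rate >= PAGE_BURN_RATE:
        return "critical"
    if rate >= WARN_BURN_RATE:
        return "warning"
    return None


def receiver_for(severity):
    """Pick the receiver for a severity; None means nobody is notified."""
    return RECEIVERS.get(severity)


# 200 failed requests out of 10,000 against a 99.9% availability SLO.
rate = burn_rate(failed=200, total=10_000, slo_target=0.999)
severity = severity_for(rate)
print(f"burn rate {rate:.1f} -> {severity} -> {receiver_for(severity)}")
```

Alerting on burn rate rather than on raw error counts keeps pages tied to the SLO: a brief blip that barely touches the budget stays quiet, while a sustained failure that would exhaust it quickly reaches on-call.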
infra · cluster · app
  │   metrics → Prometheus / Mimir ┐
  │   logs    → Loki               ├─► Grafana (shared dashboards)
  │   traces  → Tempo              ┘      │
  └──────────────────────────────────► Alertmanager ─► on-call (Teams / Slack)

This closes the loop between the Platform and The Operation: the platform emits the signals, and observability turns them into the dashboards you watch and the alerts we act on.