Observability

Observability spans every layer of the platform, from the cloud infrastructure through Kubernetes to the application. It is where two promises of the offer come together: the live, shared dashboards that are your transparency window, and the alerting that drives The Operation.

We standardize on the Grafana open-source stack, deployed in-cluster.

The stack

Signal	Tool	What it gives you
Dashboards	Grafana	A single pane over metrics, logs, and traces
Metrics	Prometheus (scaled with Mimir)	Time-series for infra, cluster, and app
Logs	Loki	Centralized, queryable logs, correlated with metrics
Traces	Tempo	Distributed tracing across services
Alerting	Alertmanager	Routing, grouping, and de-duplication of alerts

Metrics, logs, and traces are correlated in Grafana, so an alert links directly to the relevant logs and traces instead of sending you hunting across disconnected tools.

Your transparency window

The dashboards are shared with you and live, not a monthly PDF. They show you exactly what we run on your behalf, in real time. This makes the offer’s “observability stack + live shared dashboards” concrete: no black box, no dependence on key-person knowledge, and full visibility into the platform’s health, performance, and cost signals.

SLOs and alerting

We define SLOs (service-level objectives) for the signals that matter to your product.
Alertmanager routes alerts by severity to the on-call rotation.
Alerts are the entry point to incident handling — see The Operation.
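
To make the SLO idea concrete, here is a minimal sketch of how an availability objective translates into an error budget and a burn rate. The 99.9% target, the 30-day window, and all names in the code are illustrative assumptions, not values taken from the offer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """An availability objective over a rolling window (hypothetical values)."""
    target: float      # e.g. 0.999 means 99.9% of requests must succeed
    window_days: int   # length of the rolling window in days


def error_budget_minutes(slo: Slo) -> float:
    """Minutes of full unavailability the SLO tolerates within the window."""
    return (1.0 - slo.target) * slo.window_days * 24 * 60


def burn_rate(slo: Slo, failed: int, total: int) -> float:
    """Observed error ratio divided by the allowed error ratio.

    1.0 means the budget is being spent exactly as fast as the SLO allows;
    values above 1.0 mean the budget will run out before the window ends.
    """
    if total == 0:
        return 0.0
    return (failed / total) / (1.0 - slo.target)


if __name__ == "__main__":
    slo = Slo(target=0.999, window_days=30)
    print(f"budget: {error_budget_minutes(slo):.1f} minutes")
    print(f"burn rate: {burn_rate(slo, failed=5, total=1000):.1f}x")
```

With these assumed numbers the budget works out to roughly 43 minutes per 30 days, and 5 failures in 1000 requests burns it about five times faster than allowed. An alert on burn rate, rather than on a raw error count, is one common way to tie alerting to an SLO.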

infra · cluster · app
   │  metrics → Prometheus / Mimir ┐
   │  logs    → Loki               ├─► Grafana (shared dashboards)
   │  traces  → Tempo              ┘        │
   └────────────────────────────► Alertmanager ─► on-call (Teams / Slack)
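
The routing, grouping, and de-duplication step in the diagram can be modeled roughly as follows. This is a Python sketch of the idea only, not Alertmanager’s own configuration format or API; the severity labels and receiver names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str   # assumed label values: "critical", "warning", "info"
    service: str


# Hypothetical receivers; real names depend on the on-call setup.
ROUTES = {
    "critical": "on-call-pager",
    "warning": "team-chat",
}
DEFAULT_RECEIVER = "dashboard-only"


def route(alert: Alert) -> str:
    """Pick a receiver from the alert's severity, with a fallback."""
    return ROUTES.get(alert.severity, DEFAULT_RECEIVER)


def group_and_dedupe(alerts):
    """Group alerts by (receiver, service) and drop exact duplicates."""
    groups = {}
    for alert in alerts:
        key = (route(alert), alert.service)
        bucket = groups.setdefault(key, [])
        if alert not in bucket:   # frozen dataclasses compare by value
            bucket.append(alert)
    return groups


if __name__ == "__main__":
    latency = Alert("HighLatency", "critical", "api")
    disk = Alert("DiskFilling", "warning", "db")
    for key, bucket in group_and_dedupe([latency, latency, disk]).items():
        print(key, [a.name for a in bucket])
```

Grouping by service keeps a burst of related alerts in one notification, and the fallback receiver means an alert with an unexpected severity is still visible rather than silently dropped.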

This closes the loop between the Platform and The Operation: the platform emits the signals, and observability turns them into the dashboards you watch and the alerts we act on.