Jan 22, 2026

OpenTelemetry Metrics for Root Cause Analysis

Connect your existing OpenTelemetry pipeline to give Kestrel real-time metrics context during incident investigation.

Evan Chopra

When an incident hits, your AI copilot needs the same data your engineers rely on: real metrics showing what's actually happening inside your services. CPU spiking? Memory pressure? Request latency climbing? Without this context, root cause analysis is just guesswork.

Today, we're introducing OpenTelemetry Metrics Integration for Kestrel's incident RCA. Connect your existing OTEL pipeline, and Kestrel's AI agents gain direct access to your application and infrastructure metrics during every investigation.

How It Works

The Kestrel Operator now includes an OTLP receiver that accepts metrics from your OpenTelemetry Collectors. Metrics are stored locally on the operator in a rolling 30-minute window, giving Kestrel's AI agents instant access to recent observability data without requiring a separate metrics backend.

When an incident is detected, the RCA agent can query metrics by namespace, workload, or pod. It analyzes patterns like memory growth, CPU throttling, or latency spikes to identify the root cause. The agent then selects the most relevant metrics as evidence, which are displayed as interactive charts in the incident timeline.
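
For illustration, here's the shape of one enriched series as the RCA agent might see it. This is a conceptual sketch rather than the operator's literal storage format, and the namespace, workload, and values below are hypothetical:

resource:
  k8s.namespace.name: payments                # hypothetical namespace
  k8s.deployment.name: checkout-api           # hypothetical workload
  k8s.pod.name: checkout-api-7d9f8b6c5-x2hlq
metric:
  name: container.memory.usage                # OTel semantic-convention gauge
  type: gauge
  datapoints:                                 # values in bytes
    - { timestamp: "2026-01-22T10:04:00Z", value: 412000000 }
    - { timestamp: "2026-01-22T10:05:00Z", value: 901000000 }
    - { timestamp: "2026-01-22T10:06:00Z", value: 1870000000 }  # the growth pattern the agent flags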

Evidence You Can See

When Kestrel identifies the root cause of an incident, it doesn't just tell you what happened. It shows you the metrics that prove it. Evidence metrics appear as time-series charts directly in the incident view, with annotations highlighting the exact moments that matter: the memory spike at 10:06 AM, the CPU throttling that started 30 seconds before the crash, the request latency that doubled during the outage.

This isn't a dashboard you have to interpret yourself. Kestrel's AI has already done the analysis and is showing you exactly what it found.

Works With Your Existing Setup

If you're already running OpenTelemetry Collectors in your cluster, integration takes just a few lines of configuration. Point your collector at the Kestrel Operator and you're done.

  • Any OTEL-compatible source

    Application metrics, Prometheus scrapers, infrastructure collectors (see the receiver sketch after this list)

  • All metric types

    Gauges, counters, histograms, and summaries

  • Kubernetes-aware

    Metrics are automatically enriched with namespace, workload, and pod context
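
If some of your metrics come from Prometheus endpoints, for example, the collector's prometheus receiver can scrape them into the same pipeline. A minimal sketch; the job name and pod-based discovery are illustrative and depend on your environment:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: app-metrics          # illustrative job name
          kubernetes_sd_configs:
            - role: pod                  # discover scrape targets from running pods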

Configuration

Enable OTEL metrics in your Kestrel Operator Helm values:

operator:
  otel:
    enabled: true          # turn on the operator's OTLP receiver
    receiverPort: 4317     # standard OTLP gRPC port
  metricsStore:
    retention: "30m"       # rolling window of recent metrics
    maxSeries: 100000      # cap on concurrently tracked series

Configure your OTEL Collector to export to the operator. The k8sattributes processor is required; without it, Kestrel can't map metrics to namespaces, workloads, or pods:

exporters:
  otlp/kestrel:
    endpoint: kestrel-operator.kestrel-ai.svc.cluster.local:4317   # in-cluster operator service
    tls:
      insecure: true   # plaintext for in-cluster traffic; use TLS if your policy requires it

processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.deployment.name
        - k8s.container.name
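
Then wire the processor and exporter into your metrics pipeline. A minimal sketch, assuming an otlp receiver (plus the prometheus receiver from the earlier example); substitute whatever receivers your collector already defines:

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]   # your existing metric sources
      processors: [k8sattributes]     # required for Kubernetes context
      exporters: [otlp/kestrel]       # forward to the Kestrel Operator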

Getting Started

OpenTelemetry metrics integration is available now for all Kestrel users. Update your operator to the latest version, enable OTEL metrics in your Helm values, and configure your collector to export to the operator. For detailed setup instructions, see our OpenTelemetry integration guide.