
LLM Observability Platform

AI Engineer
Tags: Observability · LLMs · MLOps · Tracing · Python
Stack: Python, OpenTelemetry, ClickHouse, Grafana, FastAPI

Context

Why LLM apps need specialized observability: failures are typically silent quality regressions rather than exceptions, and every call carries a token cost that ordinary request metrics never surface.

Limitations of traditional APM tools for LLM workloads: they capture latency, errors, and throughput, but not prompts, completions, token usage, or output quality.

Goals: regression detection, cost tracking, quality monitoring


Constraints

  • Overhead: < 5ms latency impact per request
  • Throughput: Handle 1K+ requests/second
  • Privacy: PII detection and redaction in logs (see the redaction sketch after this list)
  • Integration: Work with existing monitoring stack
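
To meet the privacy constraint, prompts and responses pass through a redaction step before anything is persisted. A minimal sketch; the patterns below are illustrative placeholders, not an exhaustive detector:

    # Minimal PII redaction sketch. The regex patterns are illustrative
    # placeholders, not a production-grade detector.
    import re

    PII_PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "PHONE": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def redact(text: str) -> str:
        """Replace detected PII with a typed placeholder before logging."""
        for label, pattern in PII_PATTERNS.items():
            text = pattern.sub(f"[{label}]", text)
        return text

    assert redact("contact jane@example.com") == "contact [EMAIL]"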

Architecture

Instrumentation approach (decorators, middleware)
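
A minimal sketch of the decorator path, using the OpenTelemetry Python API; the span and attribute names are assumptions, not a published semantic convention:

    # Decorator-based instrumentation sketch using the OpenTelemetry API.
    # Span and attribute names here are illustrative assumptions.
    import functools
    import time

    from opentelemetry import trace

    tracer = trace.get_tracer("llm-observability")

    def trace_llm_call(model: str):
        """Record prompt, response, and latency on a span around an LLM call."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(prompt: str, **kwargs):
                with tracer.start_as_current_span("llm.call") as span:
                    span.set_attribute("llm.model", model)
                    span.set_attribute("llm.prompt", prompt)  # redacted upstream
                    start = time.perf_counter()
                    response = fn(prompt, **kwargs)
                    span.set_attribute("llm.response", response)
                    span.set_attribute("llm.latency_ms",
                                       (time.perf_counter() - start) * 1000)
                    return response
            return wrapper
        return decorator

    @trace_llm_call(model="gpt-4o")
    def summarize(prompt: str) -> str:
        return "stub"  # the real provider call goes here

The middleware variant does the same wrapping around the request handler instead of a single function.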

Data model for LLM traces
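
The trace record is deliberately flat so each field maps straight onto a column in the store. A sketch of the shape; the field names are assumptions:

    # One LLM trace record. Field names are illustrative assumptions chosen
    # to map one-to-one onto columns in the storage backend.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class LLMTrace:
        trace_id: str
        span_id: str
        model: str
        prompt: str              # redacted before persistence
        response: str            # redacted before persistence
        prompt_tokens: int
        completion_tokens: int
        latency_ms: float
        cost_usd: float
        user_id: str
        feature: str             # product feature, for cost attribution
        timestamp: datetime = field(
            default_factory=lambda: datetime.now(timezone.utc))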

Storage backend selection
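
ClickHouse fits the write-heavy, aggregate-read pattern of trace data. An illustrative table definition created through the clickhouse-connect client; this schema is a sketch mirroring the record above, not the production DDL:

    # Illustrative ClickHouse schema, created via clickhouse-connect.
    # Columns mirror the LLMTrace record above; not the production DDL.
    import clickhouse_connect

    client = clickhouse_connect.get_client(host="localhost")
    client.command("""
    CREATE TABLE IF NOT EXISTS llm_traces (
        trace_id          String,
        span_id           String,
        model             LowCardinality(String),
        prompt            String,
        response          String,
        prompt_tokens     UInt32,
        completion_tokens UInt32,
        latency_ms        Float32,
        cost_usd          Float64,
        user_id           String,
        feature           LowCardinality(String),
        timestamp         DateTime64(3)
    )
    ENGINE = MergeTree
    ORDER BY (feature, timestamp)
    """)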

Dashboard and alerting design
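
Dashboards and alerts are both driven by aggregate queries over the trace table. An illustrative p95-latency rule; the threshold is an assumption, not the team's actual alert policy:

    # Illustrative alert check: p95 latency per model over the last
    # 5 minutes. The 2000 ms threshold is an assumption.
    P95_QUERY = """
    SELECT model, quantile(0.95)(latency_ms) AS p95_ms
    FROM llm_traces
    WHERE timestamp > now() - INTERVAL 5 MINUTE
    GROUP BY model
    """

    def models_breaching_p95(client, threshold_ms: float = 2000.0) -> list[str]:
        rows = client.query(P95_QUERY).result_rows
        return [model for model, p95 in rows if p95 > threshold_ms]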

Evaluation pipeline integration
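
Evals run offline against sampled traces rather than inline on the request path. A sketch of the integration point; score_response is a hypothetical hook, not a real library call:

    # Offline evaluation sketch: pull recent traces, score them, write the
    # scores back. score_response is a hypothetical placeholder hook.
    def score_response(prompt: str, response: str) -> float:
        return 1.0 if response else 0.0  # stand-in for a real eval

    def run_eval_batch(client) -> None:
        rows = client.query(
            "SELECT trace_id, prompt, response FROM llm_traces "
            "WHERE timestamp > now() - INTERVAL 1 HOUR LIMIT 1000"
        ).result_rows
        scores = [(tid, score_response(p, r)) for tid, p, r in rows]
        client.insert("llm_eval_scores", scores,
                      column_names=["trace_id", "score"])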


Implementation Highlights

Trace Capture

Capturing prompt/response pairs efficiently
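
Staying under the 5 ms budget means the request path never waits on storage: capture is enqueue-only, and a background worker batches the writes. A minimal sketch with illustrative queue and batch sizes:

    # Fire-and-forget capture sketch: the request path only enqueues;
    # a background thread drains and batch-writes. Sizes are illustrative.
    import queue
    import threading

    trace_queue: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)

    def capture(record: dict) -> None:
        """Called on the request path; never blocks, drops on overflow."""
        try:
            trace_queue.put_nowait(record)
        except queue.Full:
            pass  # dropping one trace beats slowing a request

    def flush_worker(write_batch, batch_size: int = 500) -> None:
        while True:
            batch = [trace_queue.get()]  # block until work arrives
            while len(batch) < batch_size:
                try:
                    batch.append(trace_queue.get_nowait())
                except queue.Empty:
                    break
            write_batch(batch)

    threading.Thread(target=flush_worker, args=(print,), daemon=True).start()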

Cost Attribution

Token counting and cost per request/user/feature
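
Token counts come from the provider response where available, with tiktoken as a fallback; the per-1K prices below are placeholders, not current list prices:

    # Cost attribution sketch. Prices are illustrative placeholders; load
    # real provider pricing from configuration.
    import tiktoken

    PRICE_PER_1K = {"gpt-4o": (0.005, 0.015)}  # (input, output) USD, placeholder

    def count_tokens(model: str, text: str) -> int:
        return len(tiktoken.encoding_for_model(model).encode(text))

    def request_cost(model: str, prompt: str, response: str) -> float:
        in_price, out_price = PRICE_PER_1K[model]
        return (count_tokens(model, prompt) / 1000 * in_price
                + count_tokens(model, response) / 1000 * out_price)

Per-user and per-feature totals then fall out of simple GROUP BY aggregations over the stored cost column.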

Drift Detection

Semantic similarity tracking for output quality
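
Drift is tracked as cosine similarity between fresh outputs and a baseline centroid in embedding space. A sketch using sentence-transformers; the model choice and any alert threshold are assumptions:

    # Drift detection sketch: embed outputs and compare to a baseline
    # centroid. Model choice and thresholds are assumptions.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    def centroid(texts: list[str]) -> np.ndarray:
        embs = embedder.encode(texts, normalize_embeddings=True)
        c = embs.mean(axis=0)
        return c / np.linalg.norm(c)

    def drift_score(baseline: np.ndarray, new_outputs: list[str]) -> float:
        """Mean cosine similarity to baseline; lower means more drift."""
        embs = embedder.encode(new_outputs, normalize_embeddings=True)
        return float((embs @ baseline).mean())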

Sampling

Strategies for high-volume applications
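
At 1K+ requests/second, capturing everything is wasteful, so sampling is head-based and deterministic on the trace ID, with errors always kept. A sketch; the default rate is illustrative:

    # Deterministic head-sampling sketch: hash the trace ID so the
    # decision is stable across services; always keep errors.
    import hashlib

    def should_sample(trace_id: str, is_error: bool, rate: float = 0.05) -> bool:
        if is_error:
            return True  # never drop failing requests
        digest = hashlib.sha256(trace_id.encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        return bucket < rate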


Evaluation

Metric                 Target      Achieved
Overhead               < 5 ms      TBD
Detection Time         < 1 hour    TBD
False Positive Rate    < 5%        TBD
Coverage               > 95%       TBD

Time-to-detection for injected regressions

User feedback from internal teams


Outcomes

  • Regressions caught before user reports
  • Mean time to detection
  • Cost savings from optimization insights
  • Debugging time reduction

Learnings

Metrics That Matter

Which signals actually predict problems

Alert vs. Log

When to wake someone up vs. when to record

Detail vs. Overhead

The tradeoff and where we landed