00 — Passive LLM & agent observability
Agent observability from the network wire.
Heron is a passive analyzer that watches LLM traffic on the wire and reconstructs what your agents are actually doing — tool calls, multi-step plans, where time goes, where loops happen, who calls whom. No SDK. No sidecar. No proxy in the request path.
heron --pcap-file capture.pcap --no-retention
The problem
Agent code that looks fine on paper.
Most agent code looks fine on paper and falls apart in production.
A tool call stalls. The planner loops between two states. A downstream service silently substitutes a different model. The logs say everything is 200 OK — and the run still took nine seconds and three retries.
Heron reconstructs that behavior from the bytes on the wire and serves it through a console organized around turns and sessions, not raw HTTP calls. Multi-call interactions — planner → tool → planner → tool — stitch into a single addressable turn. Multi-leg proxy hops fold automatically.
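To make the stitching idea concrete, here is a minimal sketch of grouping calls into turns. It is not Heron's implementation: the `Call` and `Turn` types, the `session` key, and the rule that a turn ends at the first response that does not request a tool are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Call:
    session: str        # hypothetical session key (e.g. derived from the client)
    started_at: float   # seconds since capture start
    wants_tool: bool    # True if the model's response asked for a tool call

@dataclass
class Turn:
    session: str
    calls: list = field(default_factory=list)

def stitch_turns(calls):
    """Group calls per session; a turn closes at the first call that requests no tool."""
    open_turns = {}   # session -> Turn still in progress
    finished = []
    for call in sorted(calls, key=lambda c: c.started_at):
        turn = open_turns.setdefault(call.session, Turn(call.session))
        turn.calls.append(call)
        if not call.wants_tool:
            finished.append(open_turns.pop(call.session))
    # Turns still open at the end of the input are incomplete; return them last.
    finished.extend(open_turns.values())
    return finished

calls = [
    Call("a", 0.0, True),    # planner asks for a tool
    Call("b", 0.5, False),   # unrelated session, single-call turn
    Call("a", 1.0, True),    # tool result sent back, another tool requested
    Call("a", 2.0, False),   # final answer closes the turn
    Call("a", 5.0, True),    # a new turn begins and is still open
]
turns = stitch_turns(calls)
print([(t.session, len(t.calls)) for t in turns])
# [('b', 1), ('a', 3), ('a', 1)]
```

A real stitcher would also need timeouts and a way to tell sessions apart when several agents share one client address; the sketch leaves both out.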
Pipeline
From bytes on the wire to an agent turn.
packet capture → HTTP / SSE parse → wire-API decode → semantic extraction → agent-turn assembly. The pipeline never sits in the request path, so the observer can fail without breaking the calls it observes.
Packets from the same connection always land on the same worker, so parsing state is local and lock-free. Multiple independent pipelines run side by side, which keeps low-latency local capture isolated from bursty cloud-probe ingress.
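One common way to get this kind of connection affinity is to hash the connection's endpoints to a worker index. The sketch below assumes that approach; the function name `worker_for` and the choice of CRC32 are illustrative, not taken from Heron.

```python
import zlib

def worker_for(src_ip, src_port, dst_ip, dst_port, n_workers):
    """Map a TCP connection to a worker index, identical for both directions."""
    # Order the two endpoints so client->server and server->client hash alike.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}".encode()
    # crc32 is stable across processes; Python's built-in hash() of a str is not.
    return zlib.crc32(key) % n_workers

fwd = worker_for("10.0.0.5", 51234, "10.0.0.9", 8000, n_workers=4)
rev = worker_for("10.0.0.9", 8000, "10.0.0.5", 51234, n_workers=4)
print(fwd == rev)  # True: both directions reach the same worker
```

Because every packet of a connection reaches one worker, that worker can hold the HTTP and SSE parser state for it without sharing or locking.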
Heron reads the LLM API traffic, not the framework — so multi-step turns from Claude Code, OpenAI Codex, Hermes, OpenClaw, or your own agent all reconstruct the same way: tool call → tool result → planner → next tool, stitched into one addressable turn. Named profiles sharpen the stitching for Claude Code, Codex, and Hermes; everything else falls back to a generic profile.
The honest trade-off
Why not an SDK, a proxy, or OpenTelemetry?
| Approach | In request path | Needs client cooperation | Sees full bodies | Reconstructs agent turns |
|---|---|---|---|---|
| SDK instrumentation | ✗ yes | ✗ every client must | ✓ yes | ✗ every client must emit |
| Reverse proxy (LiteLLM…) | ✗ yes | ✗ clients point at it | ✓ yes | ✗ per-call only |
| OpenTelemetry from server | ✗ yes | ✗ server must emit | ~ partial | ✗ if the server tags it |
| Heron | ✓ no | ✓ no | ✓ yes ¹ | ✓ yes |

¹ TLS-terminated traffic only. Heron sees plaintext HTTP, so install it where the traffic is already decrypted: on the inference host, behind the TLS terminator, or fed by cloud-probe from a SPAN/TAP point.

You give up cross-cluster client tracing; you get a single passive evidence chain that can't break the call when the observer fails, and that assembles the agent narrative for you.
Signals
Eight metrics, aggregated in sliding windows.
Per model and per route — the numbers ops, dev, and the business actually watch.
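As an illustration of sliding-window aggregation keyed by model and route, here is a small sketch that tracks one metric, mean latency. The class name, the 60-second span, and the choice of metric are assumptions; the page does not specify how Heron computes its eight metrics.

```python
from collections import defaultdict, deque

class SlidingWindow:
    """Keep (timestamp, latency) samples per (model, route) for the last `span` seconds."""

    def __init__(self, span):
        self.span = span
        self.samples = defaultdict(deque)

    def add(self, model, route, ts, latency_ms):
        q = self.samples[(model, route)]
        q.append((ts, latency_ms))
        self._evict(q, ts)

    def _evict(self, q, now):
        # Drop samples that have fallen out of the window.
        while q and q[0][0] <= now - self.span:
            q.popleft()

    def mean_latency(self, model, route, now):
        q = self.samples[(model, route)]
        self._evict(q, now)
        return sum(lat for _, lat in q) / len(q) if q else None

w = SlidingWindow(span=60)
route = "/v1/chat/completions"
w.add("model-a", route, ts=0, latency_ms=900)
w.add("model-a", route, ts=30, latency_ms=300)
w.add("model-a", route, ts=70, latency_ms=500)   # evicts the sample from ts=0
print(w.mean_latency("model-a", route, now=70))  # 400.0
```

The sketch assumes samples arrive in timestamp order, which holds for a single capture stream but would need care when merging several ingress sources.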
Capabilities
What's in the box.
Ingress
- libpcap on a live interface: zero-copy capture, BPF filters
- replay from .pcap files: any speed · reproducible runs
- ZMQ from cloud-probe: SPAN/TAP hosts you can't install on
Providers
- OpenAI: chat · responses
- Anthropic: messages · SSE
- Azure OpenAI & Gemini
- vLLM · Ollama: OpenAI-compatible local
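Streaming responses from these APIs commonly arrive as Server-Sent Events, so an SSE parse step sits between raw HTTP and provider-specific decoding. The sketch below is a simplified parser, not Heron's decoder: `parse_sse` is a hypothetical name, and the sample body is a trimmed-down illustration of an OpenAI-style chat stream that ends with `data: [DONE]`.

```python
import json

def parse_sse(text):
    """Yield (event, data) pairs from a Server-Sent Events body."""
    event, data = "message", []
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    for line in lines:
        if line == "":                      # a blank line terminates an event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):          # comment line, ignored
            continue
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):       # one leading space is not part of the value
                value = value[1:]
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)

# Simplified example payload; real chunks carry more fields.
body = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n'
    'data: [DONE]\n\n'
)

parts = []
for _event, data in parse_sse(body):
    if data == "[DONE]":
        break
    parts.append(json.loads(data)["choices"][0]["delta"].get("content", ""))
print("".join(parts))  # Hello
```

Anthropic's message stream uses the same framing but names each event with an `event:` line, which is why the parser returns the event name alongside the data.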
Storage
- DuckDB: default · embedded · single file
- PostgreSQL + TimescaleDB: optional
- ClickHouse: columnar · high-throughput
Quickstart
Try it in 30 seconds — no live capture, no privileges.
- Grab a .pcap with LLM traffic. Any capture that contains plaintext HTTP to an LLM endpoint will do, or capture live on an interface later.
- Replay it through Heron. Point Heron at the file; --no-retention keeps it ephemeral, so nothing is written to disk.
- Open the console. Browse to localhost:3000 for turns, calls, and live metrics, organized around what the agent did.
    # replay a capture (ephemeral, no privileges)
    heron --pcap-file capture.pcap --no-retention

    # …then open the console
    open http://localhost:3000
Datasheet