00 — Passive LLM & agent observability

Agent observability from the network wire.

Heron is a passive analyzer that watches LLM traffic on the wire and reconstructs what your agents are actually doing — tool calls, multi-step plans, where time goes, where loops happen, who calls whom. No SDK. No sidecar. No proxy in the request path.

$ heron --pcap-file capture.pcap --no-retention
No SDK · no proxy · off the request path · full request & response bodies · written in Rust
Works with Claude Code · Codex · Hermes · OpenClaw

01

The problem

Agent code that looks fine on paper.

Most agent code looks fine on paper and falls apart in production.

A tool call stalls. The planner loops between two states. A downstream service silently substitutes a different model. The logs say everything is 200 OK — and the run still took nine seconds and three retries.

Heron reconstructs that behavior from the bytes on the wire and serves it through a console organized around turns and sessions, not raw HTTP calls. Multi-call interactions (planner → tool → planner → tool) stitch into a single addressable turn. Multi-leg proxy hops (client → proxy → provider) fold into one call automatically.

02

Pipeline

From bytes on the wire to an agent turn.

Packet capture → HTTP / SSE parse → wire-API decode → semantic extraction → agent-turn assembly. The pipeline never sits in the request path, so the observer can fail without breaking the calls it observes.
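Before any provider-specific decoding, the HTTP / SSE parse stage has to turn a streamed response body back into discrete events. The sketch below shows that step under the field rules of the Server-Sent Events format: a blank line dispatches an event, consecutive `data:` lines join with newlines, and a leading `:` marks a comment. It assumes the body has already been reassembled from TCP segments, de-chunked, and decoded as UTF-8; `SseEvent` and `parse_sse` are hypothetical names, not Heron's parser.

```rust
/// One dispatched Server-Sent Event. (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq)]
pub struct SseEvent {
    pub event: String, // "message" when the stream sets no `event:` field
    pub data: String,  // all `data:` lines of the event, joined with '\n'
}

/// Split a reassembled, de-chunked, UTF-8 SSE body into events.
/// Hypothetical helper, not Heron's parser; `id:` and `retry:` are ignored.
pub fn parse_sse(body: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut event_type = String::new();
    let mut data = String::new();

    // SSE allows CRLF, LF, or CR line endings; normalize to LF.
    let normalized = body.replace("\r\n", "\n").replace('\r', "\n");
    for line in normalized.split('\n') {
        if line.is_empty() {
            // A blank line dispatches the buffered event, if it has data.
            if !data.is_empty() {
                data.pop(); // remove the '\n' appended after the last data line
                let event = if event_type.is_empty() {
                    "message".to_string()
                } else {
                    std::mem::take(&mut event_type)
                };
                events.push(SseEvent { event, data: std::mem::take(&mut data) });
            }
            event_type.clear();
            continue;
        }
        if line.starts_with(':') {
            continue; // comment / keep-alive line
        }
        // "field: value" (one leading space stripped) or a bare field name.
        let (field, value) = match line.find(':') {
            Some(i) => {
                let v = &line[i + 1..];
                (&line[..i], v.strip_prefix(' ').unwrap_or(v))
            }
            None => (line, ""),
        };
        match field {
            "event" => event_type = value.to_string(),
            "data" => {
                data.push_str(value);
                data.push('\n');
            }
            _ => {}
        }
    }
    // Anything after the last blank line is an incomplete event and is dropped.
    events
}
```

Fed an Anthropic-style stream (`event: message_start` followed by `data: {...}`), it yields one `SseEvent` per block; OpenAI Chat Completions streams carry only `data:` lines, so every event, including the final `data: [DONE]`, comes back typed as `message`.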

01 Ingress: libpcap on a live NIC · .pcap replay at any speed · cloud-probe over ZMQ
02 Dispatch: flow dispatcher, hash by 5-tuple → worker #0, worker #1, …, worker #N
03 Decode: HTTP / SSE parse → wire-API detect → semantic extraction
04 Assemble: turn tracker + metrics aggregator + storage sink
05 Serve: DuckDB · REST API · React console at localhost:3000

Packets from the same connection always land on the same worker, so per-flow parsing state stays local to that worker and needs no locks. Multiple independent pipelines can run side by side, isolating low-latency local capture from bursty cloud-probe ingress.
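For "same connection, same worker" to hold, the dispatcher's hash has to be symmetric: a request packet (client → server) and its response (server → client) carry mirrored 5-tuples but must map to one worker. A minimal sketch, assuming the dispatcher canonicalizes endpoint order before hashing; `FiveTuple` and `worker_for` are hypothetical names, and a production dispatcher might instead use a symmetric RSS (Toeplitz) key or a faster non-cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::IpAddr;

/// Transport flow key: protocol number plus both (address, port) endpoints.
/// Hypothetical type for this sketch.
#[derive(Clone, Copy, Debug)]
pub struct FiveTuple {
    pub proto: u8,
    pub src: (IpAddr, u16),
    pub dst: (IpAddr, u16),
}

impl FiveTuple {
    /// Order the endpoints so both directions of one connection
    /// (client→server and server→client) yield the same key.
    fn canonical(&self) -> (u8, (IpAddr, u16), (IpAddr, u16)) {
        if self.src <= self.dst {
            (self.proto, self.src, self.dst)
        } else {
            (self.proto, self.dst, self.src)
        }
    }
}

/// Choose a worker in 0..n_workers for a packet's flow (hypothetical helper).
/// All DefaultHasher::new() instances hash alike, so results are stable within a process.
pub fn worker_for(flow: &FiveTuple, n_workers: usize) -> usize {
    assert!(n_workers > 0, "need at least one worker");
    let mut hasher = DefaultHasher::new();
    flow.canonical().hash(&mut hasher);
    (hasher.finish() % n_workers as u64) as usize
}
```

Because both directions share a key, the worker that parses the request headers also sees the streamed response, which is what lets it keep per-flow parser state without locks.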

Agent-agnostic · turn profiles
Claude Code (named) · OpenAI Codex (named) · Hermes (named) · OpenClaw (experimental) · any agent (generic)

Heron reads the LLM API traffic, not the agent framework, so multi-step turns from Claude Code, OpenAI Codex, Hermes, OpenClaw, or your own agent all reconstruct the same way: tool call → tool result → planner → next tool, stitched into one addressable turn. Named profiles sharpen the stitching for Claude Code, Codex, and Hermes; OpenClaw has an experimental profile; every other agent falls back to the generic profile.
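The generic stitching idea can be sketched from stop reasons alone: an agent loop keeps calling the model while the model keeps asking for tools, so consecutive calls that end in a tool request belong to the same turn. A minimal sketch, assuming one session's calls are already in time order; `Call`, `StopKind`, `stop_kind`, and `stitch_turns` are hypothetical, and the rule is a simplification rather than how Heron's named profiles work. The two raw strings matched are the values Anthropic Messages (`stop_reason`) and OpenAI Chat Completions (`finish_reason`) return for tool requests.

```rust
/// How a model call ended, normalized across providers (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum StopKind {
    ToolUse, // the model asked the agent to run a tool
    Done,    // any other ending: final answer, length limit, stop sequence, ...
}

/// One LLM API call within a session (hypothetical, reduced to what we need).
#[derive(Clone, Debug)]
pub struct Call {
    pub id: u32,
    pub stop: StopKind,
}

/// Map a raw stop reason onto StopKind. Anthropic Messages reports
/// `stop_reason: "tool_use"`; OpenAI Chat Completions reports
/// `finish_reason: "tool_calls"`.
pub fn stop_kind(raw: &str) -> StopKind {
    match raw {
        "tool_use" | "tool_calls" => StopKind::ToolUse,
        _ => StopKind::Done,
    }
}

/// Group a session's calls (already in time order) into turns: a turn keeps
/// absorbing calls while the model requests tools and closes on the first
/// call that ends any other way. Hypothetical generic rule, not a Heron profile.
pub fn stitch_turns(calls: &[Call]) -> Vec<Vec<u32>> {
    let mut turns = Vec::new();
    let mut current = Vec::new();
    for call in calls {
        current.push(call.id);
        if call.stop != StopKind::ToolUse {
            turns.push(std::mem::take(&mut current));
        }
    }
    // A capture that ends mid-turn leaves an open turn; keep it rather than drop it.
    if !current.is_empty() {
        turns.push(current);
    }
    turns
}
```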

03

The honest trade-off

Why not an SDK, a proxy, or OpenTelemetry?

| Approach | In request path | Needs client cooperation | Sees full bodies | Reconstructs agent turns |
|---|---|---|---|---|
| SDK instrumentation | yes | every client must | yes | every client must emit |
| Reverse proxy (LiteLLM…) | yes | clients point at it | yes | per-call only |
| OpenTelemetry from server | yes | server must emit | partial | if the server tags it |
| Heron | no | no | yes¹ | yes |

¹ TLS-terminated traffic only. Heron sees plaintext HTTP, so install it where the traffic is already decrypted: on the inference host, behind the TLS terminator, or fed by cloud-probe from a SPAN/TAP point. You give up cross-cluster client tracing; in return you get a single passive evidence chain that can't break the call when the observer fails, and that assembles the agent narrative for you.

04

Signals

Eight metrics, aggregated in sliding windows.

Per model and per route — the numbers ops, dev, and the business actually watch.
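Aggregating in sliding windows means each counter reflects only the calls that finished in the last W seconds, not lifetime totals. A minimal sketch for two of the signals below (call rate and error rate), assuming one window per (model, route) key, millisecond timestamps that arrive in non-decreasing order, and a plain deque rather than whatever bucketed structure Heron actually uses; `RouteWindow` and `record_call` are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};

/// Sliding-window counters for one (model, route) key (hypothetical type):
/// keeps calls that finished in (now - window_ms, now].
pub struct RouteWindow {
    window_ms: u64,
    events: VecDeque<(u64, bool)>, // (finish time in ms, was_error)
}

impl RouteWindow {
    pub fn new(window_ms: u64) -> Self {
        RouteWindow { window_ms, events: VecDeque::new() }
    }

    /// Record a finished call; timestamps are assumed non-decreasing.
    pub fn record(&mut self, t_ms: u64, is_error: bool) {
        self.events.push_back((t_ms, is_error));
        self.evict(t_ms);
    }

    /// Drop calls that fell out of the window ending at `now_ms`.
    fn evict(&mut self, now_ms: u64) {
        while let Some(&(t, _)) = self.events.front() {
            if t + self.window_ms <= now_ms {
                self.events.pop_front();
            } else {
                break;
            }
        }
    }

    /// RATE: calls per second over the window ending at `now_ms`.
    pub fn rate(&mut self, now_ms: u64) -> f64 {
        self.evict(now_ms);
        self.events.len() as f64 * 1000.0 / self.window_ms as f64
    }

    /// ERR: fraction of calls in the window that failed (0.0 when empty).
    pub fn error_rate(&mut self, now_ms: u64) -> f64 {
        self.evict(now_ms);
        if self.events.is_empty() {
            return 0.0;
        }
        let errors = self.events.iter().filter(|&&(_, e)| e).count();
        errors as f64 / self.events.len() as f64
    }
}

/// Route a finished call into its (model, route) window, creating it on first use.
pub fn record_call(
    windows: &mut HashMap<(String, String), RouteWindow>,
    model: &str,
    route: &str,
    t_ms: u64,
    is_error: bool,
    window_ms: u64,
) {
    windows
        .entry((model.to_string(), route.to_string()))
        .or_insert_with(|| RouteWindow::new(window_ms))
        .record(t_ms, is_error);
}
```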

  • TTFT · Time to First Token: prompt arrival → first streamed token
  • E2E · End-to-End Latency: request → last token of the response
  • TPOT · Time Per Output Token: mean gap between output tokens after the first
  • RATE · Call Rate: requests per second, by route
  • TOK/s · Token Throughput: output tokens per second
  • ACTIVE · Active Calls: in-flight requests, right now
  • ERR · Call Error Rate: share of calls that return non-2xx or end in an aborted stream
  • CACHE · Cache Hit Ratio: prompt-cache reuse, per model
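The per-call latency signals reduce to arithmetic on a few wire timestamps. A minimal sketch, assuming millisecond timestamps for the request and for the first and last streamed chunks, plus an output-token count taken from the response's usage data; TPOT uses the common definition (E2E − TTFT) / (output tokens − 1), which may differ in detail from Heron's. `CallTiming` and `cache_hit_ratio` are hypothetical names.

```rust
/// Wire timestamps (ms) for one streamed call, as a passive observer
/// would see them. Hypothetical type; field names are illustrative.
pub struct CallTiming {
    pub request_ms: f64,     // request observed on the wire
    pub first_token_ms: f64, // first streamed output chunk
    pub last_token_ms: f64,  // final streamed output chunk
    pub output_tokens: u32,  // from the response's usage data
}

impl CallTiming {
    /// TTFT: request arrival → first streamed token.
    pub fn ttft_ms(&self) -> f64 {
        self.first_token_ms - self.request_ms
    }

    /// E2E: request arrival → last token of the response.
    pub fn e2e_ms(&self) -> f64 {
        self.last_token_ms - self.request_ms
    }

    /// TPOT: mean gap between output tokens after the first one,
    /// i.e. (E2E - TTFT) / (tokens - 1). Undefined below two tokens.
    pub fn tpot_ms(&self) -> Option<f64> {
        if self.output_tokens < 2 {
            return None;
        }
        Some((self.last_token_ms - self.first_token_ms) / f64::from(self.output_tokens - 1))
    }
}

/// CACHE for one prompt: cached prompt tokens over all prompt tokens.
/// Hypothetical helper; assumes provider usage fields were already
/// normalized so the cached count is a subset of the total.
pub fn cache_hit_ratio(cached_prompt_tokens: u32, total_prompt_tokens: u32) -> Option<f64> {
    if total_prompt_tokens == 0 {
        return None;
    }
    Some(f64::from(cached_prompt_tokens) / f64::from(total_prompt_tokens))
}
```

A single SSE chunk can carry more than one token, which is why the sketch takes the token count from usage data instead of counting chunks.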

05

Capabilities

What's in the box.

Ingress

  • libpcap on a live interface: zero-copy capture, BPF filters
  • replay from .pcap files: any speed, reproducible runs
  • ZMQ from cloud-probe: for SPAN/TAP hosts you can't install on

Providers

  • OpenAI: Chat Completions · Responses
  • Anthropic: Messages · SSE
  • Azure OpenAI & Gemini
  • vLLM · Ollama: OpenAI-compatible, local
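The wire-API detect step has to decide, per request, which provider schema to decode against. The simplest signal is the request path, and the paths matched here are the providers' documented endpoints. A minimal sketch, assuming path matching alone (a fuller detector would likely also weigh the Host header and the body shape); `WireApi` and `detect_wire_api` are hypothetical names, and matching with `ends_with` is a guess at tolerating path prefixes added by proxies.

```rust
/// Wire APIs this sketch can tell apart from the path alone (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum WireApi {
    OpenAiChat,
    OpenAiResponses,
    AnthropicMessages,
    AzureOpenAiChat,
    Gemini,
    OllamaNative,
    Unknown,
}

/// Guess the wire API from an HTTP request path; the query string is ignored.
/// Hypothetical helper, not Heron's detector.
pub fn detect_wire_api(path: &str) -> WireApi {
    let path = path.split('?').next().unwrap_or(path);
    if path.starts_with("/openai/deployments/") && path.ends_with("/chat/completions") {
        WireApi::AzureOpenAiChat
    } else if path.ends_with("/v1/chat/completions") {
        // Also what vLLM and Ollama serve in OpenAI-compatible mode.
        WireApi::OpenAiChat
    } else if path.ends_with("/v1/responses") {
        WireApi::OpenAiResponses
    } else if path.ends_with("/v1/messages") {
        WireApi::AnthropicMessages
    } else if path.contains("/models/")
        && (path.ends_with(":generateContent") || path.ends_with(":streamGenerateContent"))
    {
        WireApi::Gemini
    } else if path == "/api/chat" || path == "/api/generate" {
        // Ollama's native (non-OpenAI) endpoints.
        WireApi::OllamaNative
    } else {
        WireApi::Unknown
    }
}
```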

Storage

  • DuckDB: default · embedded · single file
  • PostgreSQL + TimescaleDB: optional
  • ClickHouse: columnar · high-throughput

06

Quickstart

Try it in 30 seconds — no live capture, no privileges.

  1. Grab a .pcap with LLM traffic

    Any capture that contains plaintext HTTP to an LLM endpoint — or capture live on an interface later.

  2. Replay it through Heron

    Point Heron at the file. The --no-retention flag keeps the run ephemeral: nothing is written to disk.

  3. Open the console

    Browse to localhost:3000 — turns, calls, and live metrics, organized around what the agent did.

```shell
# replay a capture: ephemeral, no privileges
heron --pcap-file capture.pcap --no-retention

# …then open the console (macOS; on Linux, use xdg-open)
open http://localhost:3000
```

07

Datasheet

The facts.

License: Apache-2.0
Platforms: Linux · macOS
Core: Rust · Tokio · Axum
Parsing: pcap · httparse · serde
Request path: never in it
Client cooperation: none
Bodies: full req & resp · post-TLS
Console: React · localhost:3000