Taming a Legacy Codebase with Claude: A Field Report on Refactoring, Race Conditions, and Technical Debt

Why This Article Exists

Every engineering team eventually inherits a codebase that has outgrown its original design. Features were shipped, deadlines were met, and somewhere along the way the foundations quietly cracked. Hardcoded secrets found their way into source control. async void crept into timer callbacks. Collections were shared across threads without locks. A comment saying // TODO: fix this properly turned into a permanent resident.

Continue reading →

Reverb: A Semantic Cache That Knows When Its Answers Go Stale

Caching LLM responses seems, at first glance, like a simple optimization: record the prompt, record the answer, serve the answer the next time the same prompt comes in. In practice it is a surprisingly deep problem, and the two standard approaches both fail in characteristic ways. Exact-match caches miss on anything short of a byte-identical prompt, which is almost never how users actually ask questions. TTL-based caches serve confidently stale answers for hours after the underlying knowledge base has changed: the classic hallucination vector dressed up as “we cached it.”

Reverb is a Go library and standalone service that addresses both failure modes. It combines a two-tier cache (exact SHA-256 match, then embedding-cosine similarity) with knowledge-aware invalidation: every cached entry tracks the source documents it was derived from, and a change-data-capture pipeline evicts entries by causality when their sources change. TTLs become a backstop, not the primary correctness mechanism.
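To make the two-tier lookup and causal eviction concrete, here is a minimal Python sketch of the idea (Reverb itself is written in Go). All class and method names, and the toy bag-of-words embedding, are illustrative assumptions, not Reverb's actual API.

```python
import hashlib
import math


class SemanticCache:
    """Sketch of a two-tier cache: exact SHA-256 match first, then
    embedding-cosine similarity, with per-source causal eviction."""

    def __init__(self, embed, threshold=0.95):
        self.embed = embed          # callable: prompt -> vector
        self.threshold = threshold  # minimum cosine similarity for tier 2
        self.exact = {}             # sha256 hex digest -> entry
        self.entries = []           # (embedding vector, entry)
        self.by_source = {}         # source document id -> [digest, ...]

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.encode()).hexdigest()

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def put(self, prompt, answer, sources):
        key = self._key(prompt)
        entry = {"key": key, "answer": answer}
        self.exact[key] = entry
        self.entries.append((self.embed(prompt), entry))
        for source in sources:  # remember provenance for later eviction
            self.by_source.setdefault(source, []).append(key)

    def get(self, prompt):
        # Tier 1: byte-identical prompt.
        entry = self.exact.get(self._key(prompt))
        if entry:
            return entry["answer"]
        # Tier 2: nearest live entry above the similarity threshold.
        vec = self.embed(prompt)
        best, best_sim = None, self.threshold
        for stored_vec, e in self.entries:
            sim = self._cosine(vec, stored_vec)
            if sim >= best_sim and e["key"] in self.exact:  # skip evicted
                best, best_sim = e, sim
        return best["answer"] if best else None

    def invalidate_source(self, source):
        # Causal eviction: drop every entry derived from this document.
        for key in self.by_source.pop(source, []):
            self.exact.pop(key, None)


def toy_embed(text):
    # Stand-in for a real embedding model: bag-of-words counts.
    vocab = ["refund", "policy", "return", "shipping", "window"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]


cache = SemanticCache(toy_embed, threshold=0.95)
cache.put("refund policy", "Returns accepted within 30 days.",
          sources=["docs/returns.md"])
```

A paraphrased prompt like "what is the refund policy" hits tier 2, and invalidating docs/returns.md evicts the entry regardless of any remaining TTL, which is the sense in which TTLs become a backstop rather than the correctness mechanism.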

Continue reading →

MultiTrust: Subjective Logic as a Runtime for Multi-Agent Trust

In multi-agent systems, trust is a valuable asset: it lets agents reason about future collaboration, coordination, and planning. Yet most “trust score” implementations in agentic systems are a single float between 0 and 1. That number is doing two jobs at once, representing how much positive evidence an agent has accumulated and how confident the system is in that judgment, and it collapses them into a value that makes the two indistinguishable. A brand-new agent with no history and a seasoned agent that has run 10,000 tasks with an even win/loss record both land at 0.5. The scalar has no room to say “I don’t know yet.”

MultiTrust fixes this by reaching for the right math. It represents trust as a Subjective Logic opinion — a triple of (belief, disbelief, uncertainty) that sums to one — and exposes the whole machinery as an MCP server, so any Model Context Protocol-aware agent can consult it as a standard tool call.
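The newcomer-versus-veteran example above can be worked through with a small Python sketch. The evidence-to-opinion mapping follows Jøsang's standard Subjective Logic (belief, disbelief, and uncertainty derived from positive and negative evidence with a non-informative prior weight of 2); the class and method names are illustrative, not MultiTrust's actual API.

```python
from dataclasses import dataclass

W = 2.0  # non-informative prior weight, standard in Subjective Logic


@dataclass
class Opinion:
    belief: float       # share of the mass backed by positive evidence
    disbelief: float    # share backed by negative evidence
    uncertainty: float  # share not yet backed by any evidence
    base_rate: float = 0.5

    @classmethod
    def from_evidence(cls, positive, negative, base_rate=0.5):
        total = positive + negative + W
        return cls(positive / total, negative / total, W / total, base_rate)

    def expected(self):
        # Projected probability: belief plus the base rate's share of
        # the uncertainty mass.
        return self.belief + self.base_rate * self.uncertainty


newcomer = Opinion.from_evidence(0, 0)        # no history at all
veteran = Opinion.from_evidence(5000, 5000)   # 10,000 tasks, even record
```

Both opinions project to the same 0.5 expected probability, but the newcomer carries uncertainty 1.0 while the veteran's is roughly 0.0002, so a consumer of the trust score can finally distinguish "no idea" from "well-evidenced coin flip".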

Continue reading →

Tangle: Deadlock and Livelock Detection for LangGraph Agents

Multi-agent LLM workflows are, from a concurrency standpoint, small distributed systems. They hold resources, they wait on each other, and — like every other distributed system we have ever built — they can get stuck. The failure mode is worse than an outright crash: no exception is raised, no timer fires, no agent knows anything is wrong. The workflow just stops producing tokens. The operator sees a spinner.

Tangle is a small Python library that catches this class of failure in real time for LangGraph workflows (and, via OpenTelemetry, for anything else). It reuses the Wait-For Graph, an idea that has been sitting in operating-systems textbooks since 1972, and applies it at the agent layer, where the same topology has quietly reappeared. In its current implementation, to be specific, Tangle provides repeated-pattern detection over message digests.
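The textbook technique is easy to state in a few lines of Python: record who is waiting on whom as a directed graph, and report deadlock when a depth-first search finds a cycle. This is a toy sketch of the classic algorithm, not Tangle's actual API; the agent names and method names are hypothetical.

```python
from collections import defaultdict


class WaitForGraph:
    """Minimal wait-for graph: an edge A -> B means agent A is blocked
    waiting on agent B. A cycle in the graph means deadlock."""

    def __init__(self):
        self.edges = defaultdict(set)

    def wait(self, waiter, holder):
        self.edges[waiter].add(holder)

    def resume(self, waiter, holder):
        self.edges[waiter].discard(holder)

    def find_cycle(self):
        # Depth-first search; a back edge to a node still on the current
        # path (GRAY) closes a cycle, which we return as the deadlock set.
        WHITE, GRAY, BLACK = 0, 1, 2
        color = defaultdict(int)

        def dfs(node, path):
            color[node] = GRAY
            path.append(node)
            for nxt in self.edges[node]:
                if color[nxt] == GRAY:
                    return path[path.index(nxt):]  # the cycle itself
                if color[nxt] == WHITE:
                    cycle = dfs(nxt, path)
                    if cycle:
                        return cycle
            color[node] = BLACK
            path.pop()
            return None

        for node in list(self.edges):
            if color[node] == WHITE:
                cycle = dfs(node, [])
                if cycle:
                    return cycle
        return None


g = WaitForGraph()
g.wait("planner", "researcher")  # planner blocked on researcher
g.wait("researcher", "critic")   # researcher blocked on critic
g.wait("critic", "planner")      # closes the loop: nobody can proceed
```

No single agent can observe this condition locally, which is exactly why a central (or at least shared) graph is needed: each agent is legitimately waiting, and only the global topology reveals that the waits form a ring.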

Continue reading →