
RLM for Claude Code

A context compiler for agentic coding sessions.

Agentic coding sessions rot. Not catastrophically. Slowly. Four hours in, the model that was nailing architectural decisions starts fumbling variable names. Your context bloats, your history sprawls, and you find yourself nuking the session just to get back to baseline.

The RLM paper from Zhang, Kraska, and Khattab dropped in December. I built a Claude Code plugin around it. The paper frames RLM as "recursive language models." That's accurate but undersells it. The interesting move is treating RLM as a context compiler.

Context Compilation

An RLM transforms sprawling context into structured sub-problems. The model doesn't see 200k tokens of codebase and conversation history. It sees a handle to that context, with tools to slice, grep, partition, and recurse. The messy input gets compiled down to focused queries that actually fit in a reasoning window.
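To make the handle concrete, here is a minimal Python sketch of what such an interface could look like. ContextHandle and its method names are illustrative stand-ins, not the plugin's actual API.

import re
from dataclasses import dataclass

@dataclass
class ContextHandle:
    text: str  # the full session context; the model only sees the results of these calls

    def grep(self, pattern: str) -> list[str]:
        # deterministic search over the raw context
        return [line for line in self.text.splitlines() if re.search(pattern, line)]

    def slice(self, start: int, end: int) -> str:
        # a bounded window of the context, by line range
        return "\n".join(self.text.splitlines()[start:end])

    def partition(self, chunk_lines: int = 200) -> list[str]:
        # fixed-size chunks handed to inner sub-calls
        lines = self.text.splitlines()
        return ["\n".join(lines[i:i + chunk_lines]) for i in range(0, len(lines), chunk_lines)]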

The plugin hooks this into Claude Code's lifecycle. When context crosses a complexity threshold or the query looks like it needs cross-file reasoning, RLM activates. The orchestrator decides depth, model routing, and tool access. Then it hands control to the outer model, with your context loaded as a variable in the REPL environment.
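The activation check is roughly this shape; the token threshold and keyword list below are made-up stand-ins for the real heuristics.

def should_activate_rlm(context_tokens: int, query: str) -> bool:
    # assumed threshold and hints, for illustration only
    cross_file_hints = ("across", "trace", "data flow", "refactor", "architecture")
    return context_tokens > 150_000 or any(h in query.lower() for h in cross_file_hints)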

What Falls Out

Accurate tool use. LLMs are bad at counting, filtering, and pattern matching. A REPL is good at all three. RLM lets the model delegate deterministic operations to code and reserve its token budget for actual reasoning. Grep for function definitions, filter by file type, count occurrences. Then think.
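The code an inner call runs is ordinary scripting, something like the snippet below; the src directory and Python-only filter are placeholder assumptions.

import pathlib
import re

# filter by file type, grep for function definitions, count occurrences --
# the deterministic work an inner call does so the model doesn't have to
defs = []
for path in pathlib.Path("src").rglob("*.py"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if re.match(r"\s*def \w+\(", line):
            defs.append((str(path), lineno, line.strip()))

print(f"{len(defs)} function definitions found")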

Multi-model routing. Not every sub-call needs your expensive model. The orchestrator can route simple decomposition tasks to Haiku or GPT-4o-mini, reserve Opus or GPT-5 for synthesis. The outer model plans the strategy; cheap models execute the pieces; the expensive model integrates results.
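A routing table for that split might look like this sketch; the model names and task labels are illustrative, not the plugin's configuration.

def pick_model(task: str) -> str:
    # illustrative routing table; actual models and tiers are configurable
    routes = {
        "decompose": "claude-haiku",   # cheap planning and splitting
        "map": "gpt-4o-mini",          # cheap per-chunk sub-calls
        "synthesize": "claude-opus",   # expensive integration of results
    }
    return routes.get(task, "claude-opus")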

Synthesis over sub-queries. The outer model isn't just splitting work. It's synthesizing results from multiple inner calls into coherent answers. Ask about data flow across a codebase: the model partitions by module, traces dependencies across sub-calls, then integrates the findings. Closer to how you'd actually reason about the problem.
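In code, that is a map-then-reduce loop. This sketch reuses the hypothetical ContextHandle from above and treats llm_call as a stand-in for whatever client the orchestrator actually uses.

def answer_dataflow_question(handle, question, llm_call):
    # map: one cheap sub-call per partition of the context
    notes = [
        llm_call(model="cheap",
                 prompt=f"Note anything relevant to '{question}' in this slice:\n{chunk}")
        for chunk in handle.partition()
    ]
    # reduce: one expensive call integrates the per-slice notes into a coherent answer
    return llm_call(model="expensive",
                    prompt=f"Answer '{question}' using these notes:\n" + "\n\n".join(notes))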

Long context without the rot. The whole point. No single call sees your full session history. The outer model maintains strategic awareness; inner calls get focused slices. Context stays fresh because each reasoning step operates on a digestible chunk.

Beyond the Paper

The Zhang et al. RLM is stateless. Each session starts fresh. For research benchmarks, that's fine. For actual agentic coding, it's a problem. You don't want to re-explain your architecture every Monday morning.

So the plugin adds a persistence layer: hypergraph memory with typed nodes (facts, experiences, procedures, goals) that evolve through tiers. Task memory consolidates to session memory, session promotes to long-term, stale knowledge decays to archive. The model can query this store mid-decomposition, grounding sub-calls in accumulated project context.
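As a rough shape, under assumed field names: the node types and tiers come from the description above, the rest is illustrative.

from dataclasses import dataclass, field

NODE_TYPES = {"fact", "experience", "procedure", "goal"}
TIERS = ["task", "session", "long_term", "archive"]   # consolidation order, left to right

@dataclass
class MemoryNode:
    kind: str                                       # one of NODE_TYPES
    content: str
    tier: str = "task"
    edges: list[str] = field(default_factory=list)  # hyperedge ids linking related nodes

def promote(node: MemoryNode) -> None:
    # task -> session -> long_term; decay to archive happens on a separate path
    i = TIERS.index(node.tier)
    if i < TIERS.index("long_term"):
        node.tier = TIERS[i + 1]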

The other addition is reasoning traces. Every decomposition creates a decision tree: goals spawn decisions, decisions have options, options get chosen or rejected with recorded reasoning. When something goes wrong, you can actually see why the model partitioned the way it did, which sub-call returned garbage, where the synthesis went sideways.
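The trace itself is just nested records. A sketch of the shape, with illustrative field names:

from dataclasses import dataclass, field

@dataclass
class Option:
    description: str
    chosen: bool
    reasoning: str                                    # why this option was picked or rejected

@dataclass
class Decision:
    question: str                                     # e.g. "how should the codebase be partitioned?"
    options: list[Option] = field(default_factory=list)

@dataclass
class Goal:
    statement: str
    decisions: list[Decision] = field(default_factory=list)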

There's also a strategy cache that learns from successful patterns. If a particular decomposition strategy worked for "trace data flow across modules," the system remembers and suggests it next time. Crude, but surprisingly useful. The model doesn't have to rediscover that grep-then-partition works well for cross-file reasoning.
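A deliberately crude sketch of that cache; the bag-of-words key and everything else here is illustrative, not the plugin's implementation.

def normalize(query: str) -> str:
    # naive key, enough to match variants of "trace data flow across modules"
    return " ".join(sorted(set(query.lower().split())))

strategy_cache: dict[str, str] = {}

def remember(query: str, strategy: str) -> None:
    strategy_cache[normalize(query)] = strategy       # e.g. "grep-then-partition"

def suggest(query: str) -> str | None:
    return strategy_cache.get(normalize(query))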

Installation

claude plugins add-marketplace https://github.com/rand/rlm-claude-code
claude plugins install rlm-claude-code --marketplace rlm-claude-code-marketplace

You should see "RLM initialized" on the next launch. /rlm status shows the current config; /rlm mode thorough switches to deep analysis, /rlm mode fast to quick iteration.

Rough Edges

Latency is real. Sub-calls block, there's no prefix caching yet, and complex decompositions can take minutes.

Cost is unpredictable since it depends on how the model chooses to partition. I've capped recursion at depth 3 to avoid runaway spending. The activation heuristics are manual rules that mostly work. A trained classifier would be better. Async execution is on the list.
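The depth cap itself is simple. A hypothetical version, reusing the stand-ins from the earlier sketches:

MAX_DEPTH = 3   # hard cap on recursion to bound spend

def call_rlm(handle, query, llm_call, depth=0):
    if depth >= MAX_DEPTH:
        # past the cap: one flat call over a bounded slice, no further decomposition
        return llm_call(model="expensive", prompt=f"{query}\n\n{handle.slice(0, 500)}")
    plan = llm_call(model="cheap", prompt=f"List sub-queries, one per line, for: {query}")
    answers = [call_rlm(handle, sub, llm_call, depth + 1)
               for sub in plan.splitlines() if sub.strip()]
    return llm_call(model="expensive",
                    prompt=f"Synthesize an answer to '{query}' from:\n" + "\n".join(answers))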

The Bet

RLM reframes context management as inference-time scaling. The model decides how to decompose; the scaffolding just enables the capability. Same pattern as chain-of-thought: give models structured ways to explore solution spaces and they figure out efficient strategies on their own.

For agentic coding, the fit is natural. Code is structured, grep-able, partitionable. Queries require cross-file reasoning. And the cost of context rot is concrete: wasted time, hallucinated bugs, lost architectural coherence.

If You're Not Using Claude Code

The plugin assumes you're already in Anthropic's ecosystem. If you want a standalone RLM environment with broader model access, I also built Recurse. It's a Go-based agent that extends Charmbracelet's Crush TUI with the same RLM orchestration and hypergraph memory, but routes through OpenRouter. Claude, GPT-4, Gemini, Llama, whatever. Same decomposition patterns, different plumbing.

github.com/rand/rlm-claude-code · github.com/rand/recurse