RLM for Claude Code
A context compiler for agentic coding sessions.
Agentic coding sessions rot. Not catastrophically. Slowly. Four hours in, the model that was nailing architectural decisions starts fumbling variable names. Your context bloats, your history sprawls, and you find yourself nuking the session just to get back to baseline.

The fix for computers is also the fix for frontier AI: turn it off and on again. We've come so far.
The RLM paper from Zhang, Kraska, and Khattab dropped in December. I built a Claude Code plugin around it. The paper frames RLM as "recursive language models." That's accurate but undersells it. The interesting move is treating RLM as a context compiler.
Context Compilation
An RLM transforms sprawling context into structured sub-problems. The model doesn't see 200k tokens of codebase and conversation history. It sees a handle to that context, with tools to slice, grep, partition, and recurse. The messy input gets compiled down to focused queries that actually fit in a reasoning window.

This is why RLM scales to 10M+ tokens without degradation. No single call sees the full context. The outer model becomes a manager: plans work, delegates execution, takes credit for results.
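The handle idea can be sketched in a few lines. This is a minimal illustration, not the plugin's actual API: `ContextHandle` and its methods are hypothetical names for the slice/grep/partition operations the text describes.

```python
import re

class ContextHandle:
    """Hypothetical handle: the model sees this object, never the raw text."""

    def __init__(self, text: str):
        self._text = text

    def __len__(self) -> int:
        return len(self._text)

    def slice(self, start: int, end: int) -> str:
        # Return a focused window instead of the whole context.
        return self._text[start:end]

    def grep(self, pattern: str) -> list[str]:
        # Deterministic pattern matching done in code, not in tokens.
        return [ln for ln in self._text.splitlines() if re.search(pattern, ln)]

    def partition(self, n_chunks: int) -> list[str]:
        # Split the context into roughly equal chunks for sub-calls.
        size = max(1, len(self._text) // n_chunks)
        return [self._text[i:i + size] for i in range(0, len(self._text), size)]

ctx = ContextHandle("def foo():\n    pass\n\ndef bar():\n    return foo()\n")
assert ctx.grep(r"^def ") == ["def foo():", "def bar():"]
```

Each sub-call gets the output of one of these operations, never the handle's full contents.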
The plugin hooks this into Claude Code's lifecycle. When context crosses a complexity threshold or the query looks like it needs cross-file reasoning, RLM activates. The orchestrator decides depth, model routing, and tool access. Then it hands control to the outer model with your context loaded as an environment variable.
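The activation check is roughly a two-branch heuristic. A minimal sketch, assuming a token threshold and a keyword test for cross-file queries; the constant and the hint list are illustrative, not the plugin's actual rules.

```python
# Assumed cutoff and hint phrases -- stand-ins for the plugin's real heuristics.
CONTEXT_THRESHOLD = 100_000  # tokens
CROSS_FILE_HINTS = ("across", "trace", "data flow", "callers of", "refactor")

def should_activate_rlm(context_tokens: int, query: str) -> bool:
    # Branch 1: context has crossed the complexity threshold.
    if context_tokens > CONTEXT_THRESHOLD:
        return True
    # Branch 2: the query looks like it needs cross-file reasoning.
    q = query.lower()
    return any(hint in q for hint in CROSS_FILE_HINTS)

assert should_activate_rlm(150_000, "rename this variable")
assert should_activate_rlm(5_000, "Trace data flow across modules")
```

As the post notes later, these manual rules mostly work; a trained classifier would do better.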
What Falls Out
Accurate tool use. LLMs are bad at counting, filtering, and pattern matching. The REPL is good at it. RLM lets the model delegate deterministic operations to code and reserve its token budget for actual reasoning. Grep for function definitions, filter by file type, count occurrences. Then think.
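The delegation point is concrete: the operations LLMs fumble are one-liners in a REPL. A sketch of the pattern, with a toy source string standing in for real context:

```python
import re

SOURCE = '''\
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def load_all(paths):
    return [load(p) for p in paths]
'''

# Deterministic operations the REPL gets exactly right, every time:
defs = re.findall(r"^def (\w+)", SOURCE, flags=re.MULTILINE)  # function names
load_calls = len(re.findall(r"\bload\(", SOURCE))             # call-site count

assert defs == ["load", "parse", "load_all"]
assert load_calls == 2   # the definition and the one call site; load_all doesn't match
```

The model spends zero reasoning tokens on the counting and all of them on what the counts mean.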
Multi-model routing. Not every sub-call needs your expensive model. The orchestrator can route simple decomposition tasks to Haiku or GPT-4o-mini, reserve Opus or GPT-5 for synthesis. The outer model plans the strategy; cheap models execute the pieces; the expensive model integrates results.

Zhang et al. showed RLM with GPT-5-mini outperforming raw GPT-5 on long-context benchmarks while being cheaper per query. Turns out the secret to better AI is... more AI. But smaller. And recursive. We're through the looking glass here.
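The routing table reduces to a small lookup. A sketch under stated assumptions: the task kinds and model names here are illustrative placeholders, not the plugin's actual configuration.

```python
# Hypothetical tier map: cheap model for decomposition and execution,
# expensive model only for the final integration step.
MODEL_TIERS = {
    "decompose": "claude-haiku",    # split work into sub-queries
    "execute": "claude-haiku",      # answer focused sub-queries
    "synthesize": "claude-opus",    # integrate results into one answer
}

def route(task_kind: str) -> str:
    # Unknown kinds default to the cheap execution tier.
    return MODEL_TIERS.get(task_kind, MODEL_TIERS["execute"])

assert route("decompose") == "claude-haiku"
assert route("synthesize") == "claude-opus"
```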
Synthesis over sub-queries. The outer model isn't just splitting work. It's synthesizing results from multiple inner calls into coherent answers. Ask about data flow across a codebase: the model partitions by module, traces dependencies across sub-calls, then integrates the findings. Closer to how you'd actually reason about the problem.
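The partition-then-integrate shape is a map-reduce over modules. A minimal sketch, with `ask_model` as a hypothetical stand-in for an inner LLM call:

```python
def ask_model(prompt: str, material: str) -> str:
    # Placeholder for a real inner model call.
    return f"finding({len(material)} chars)"

def answer_over_modules(question: str, modules: dict[str, str]) -> str:
    # One focused sub-call per module (map), then an integration call (reduce).
    findings = {name: ask_model(question, src) for name, src in modules.items()}
    merged = "; ".join(f"{name}: {f}" for name, f in findings.items())
    return ask_model(f"Integrate findings for: {question}", merged)
```

No inner call sees more than one module; only the synthesis step sees the (much smaller) findings.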
Long context without the rot. The whole point. No single call sees your full session history. The outer model maintains strategic awareness; inner calls get focused slices. Context stays fresh because each reasoning step operates on a digestible chunk.
Beyond the Paper
The Zhang et al. RLM is stateless. Each session starts fresh. For research benchmarks, that's fine. For actual agentic coding, it's a problem. You don't want to re-explain your architecture every Monday morning.
So the plugin adds a persistence layer: hypergraph memory with typed nodes (facts, experiences, procedures, goals) that evolve through tiers. Task memory consolidates to session memory, session promotes to long-term, stale knowledge decays to archive. The model can query this store mid-decomposition, grounding sub-calls in accumulated project context.

The memory architecture is heavily inspired by HGMem. The insight that N-ary relationships beat pairwise edges for knowledge representation turns out to matter a lot for code understanding.
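The tier lifecycle can be sketched as a small state machine. Node types and tier names come from the description above; everything else (class names, method behavior) is illustrative, not the plugin's schema.

```python
from dataclasses import dataclass

TIERS = ["task", "session", "long_term", "archive"]

@dataclass
class MemoryNode:
    kind: str      # "fact" | "experience" | "procedure" | "goal"
    content: str
    tier: str = "task"

    def promote(self) -> None:
        # task -> session -> long_term; nothing promotes into archive.
        i = TIERS.index(self.tier)
        if i < TIERS.index("long_term"):
            self.tier = TIERS[i + 1]

    def decay(self) -> None:
        # Stale knowledge slides to the archive.
        self.tier = "archive"

node = MemoryNode("fact", "auth lives in services/auth")
node.promote()
assert node.tier == "session"
node.promote()
assert node.tier == "long_term"
```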
The other addition is reasoning traces. Every decomposition creates a decision tree: goals spawn decisions, decisions have options, options get chosen or rejected with recorded reasoning. When something goes wrong, you can actually see why the model partitioned the way it did, which sub-call returned garbage, where the synthesis went sideways.

The trace structure borrows from deciduous. Trey's framing of decision graphs as first-class artifacts clicked for me.
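The goal-decision-option shape maps onto a few dataclasses. A sketch of the structure described above; the field and class names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Option:
    description: str
    chosen: bool = False
    reasoning: str = ""   # recorded when the option is chosen or rejected

@dataclass
class Decision:
    question: str
    options: list[Option] = field(default_factory=list)

    def choose(self, index: int, reasoning: str) -> None:
        # Mark one option chosen and record why; the rest stay rejected.
        for i, opt in enumerate(self.options):
            opt.chosen = (i == index)
            if opt.chosen:
                opt.reasoning = reasoning

@dataclass
class Goal:
    description: str
    decisions: list[Decision] = field(default_factory=list)

goal = Goal("trace data flow")
d = Decision("how to partition?", [Option("by module"), Option("by file size")])
d.choose(0, "modules map to ownership boundaries")
goal.decisions.append(d)
assert d.options[0].chosen and not d.options[1].chosen
```

Debugging a bad answer becomes walking this tree instead of rereading a transcript.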
There's also a strategy cache that learns from successful patterns. If a particular decomposition strategy worked for "trace data flow across modules," the system remembers and suggests it next time. Crude, but surprisingly useful. The model doesn't have to rediscover that grep-then-partition works well for cross-file reasoning.
Installation
claude plugins add-marketplace https://github.com/rand/rlm-claude-code
claude plugins install rlm-claude-code --marketplace rlm-claude-code-marketplace
You should see "RLM initialized" on next launch. /rlm status shows config. /rlm mode thorough for deep analysis, /rlm mode fast for quick iteration.
Rough Edges
Latency is real. Sub-calls are blocking, no prefix caching yet. Complex decompositions can take minutes.

"It's thinking" is the new "it's compiling." Somehow we've made waiting for computers respectable again.
Cost is unpredictable since it depends on how the model chooses to partition. I've capped recursion at depth 3 to avoid runaway spending. The activation heuristics are manual rules that mostly work. A trained classifier would be better. Async execution is on the list.
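The depth cap is the one hard guard against runaway spending. A sketch of what it does: past `MAX_DEPTH`, a task is answered directly instead of spawning further sub-calls (the splitting logic here is a placeholder; the real decomposition comes from the model).

```python
MAX_DEPTH = 3  # the cap mentioned above

def decompose(task: str, depth: int = 0) -> list[str]:
    if depth >= MAX_DEPTH:
        return [task]   # answer directly; no further recursion
    # Placeholder binary split standing in for model-driven decomposition.
    parts = [f"{task}/part{i}" for i in range(2)]
    return [leaf for p in parts for leaf in decompose(p, depth + 1)]

assert len(decompose("root")) == 8   # 2**3 leaves once the cap is hit
```

Worst-case fanout is bounded at branching factor to the power of the cap, which makes the cost ceiling at least estimable even when partitioning is unpredictable.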
The Bet
RLM reframes context management as inference-time scaling. The model decides how to decompose; the scaffolding just enables the capability. Same pattern as chain-of-thought: give models structured ways to explore solution spaces and they figure out efficient strategies on their own.
For agentic coding, the fit is natural. Code is structured, grep-able, partitionable. Queries require cross-file reasoning. And the cost of context rot is concrete: wasted time, hallucinated bugs, lost architectural coherence.

The real question is whether this is a bridge technology until context windows get good enough, or a long-term part of the stack. I'm betting on the latter. Decomposition is how humans handle complexity too.
If You're Not Using Claude Code
The plugin assumes you're already in Anthropic's ecosystem. If you want a standalone RLM environment with broader model access, I also built Recurse. It's a Go-based agent that extends Charmbracelet's Crush TUI with the same RLM orchestration and hypergraph memory, but routes through OpenRouter. Claude, GPT-5, Gemini, Llama, whatever. Same decomposition patterns, different plumbing.

Two implementations of the same idea is usually a sign I should write a library. It's on the list.