
Agentic Resource Management

Infrastructure should be as malleable as code. When the primary consumer is a machine, the control plane needs to be a versioned, queryable, composable resource graph.

#ai #infrastructure #agents #systems
TL;DR

Classical PaaS and IaC break when agents are the primary consumers. The fix is a resource graph where fork, overlay, promote, and observe are first-class operations. The inner loop is delta-based: cheap overlays and copy-on-write branching with optimistic concurrency. The agent interface is two operations (search + execute) that keep token cost O(1) regardless of system complexity. Merge semantics for stateful resources are explicitly opt-in and strategy-bound. The system is honest about what's hard. The full design proposal expands from here.

We've agreed for a while that infrastructure should be as malleable as code. Humans are no longer the most frequent drivers of IaC, though, and that's changed the game.

When the primary consumer of infrastructure is a machine operating at machine speed, the control plane must become a queryable, versioned, composable resource graph: forking is cheap, observation is structural, and the path from experiment to production is explicit, auditable, and governed.

The goal is infrastructure that agents can safely explore at high speed, not infrastructure optimized for much slower human usage.

Where Classical Infrastructure Breaks

Agentic development turns "write code and deploy it" into "run many tight experimental loops that must touch real systems at machine speed." I've watched this transition break classical PaaS and IaC along five axes simultaneously, and the failures aren't incidental. They're structural.

The latency budget collapses. Humans tolerate minutes between intent and feedback. Agents operate on seconds. A 10-minute environment spin-up isn't "slow." It's structurally incompatible with the control loop, forcing context loss and cost spikes. I've seen firsthand that this mismatch is fatal for agent workflows.

Concurrency explodes. "One engineer, one branch" becomes "N agents, N sandboxes, in parallel, possibly against the same service." Infrastructure surfaces that were cold-path become hot-path overnight. 100 agents needing 100 full staging environments is economically cursed.

Reversibility becomes load-bearing. Agents will do dumb things. Good agents will do occasionally catastrophic things. Cheap checkpointing, instant revert, and tamper-evident provenance trails move from nice-to-have to structural requirement. Replit's agent safety story (snapshots plus dev/prod split plus constrained DB access baked into the substrate) is an early signal of what "safety by default" needs to look like.

Telemetry becomes part of the programming model. If an agent can't cheaply get structured runtime feedback (errors, traces, cost signals), it regresses to vibes. You need the feedback loop to be automatable, queryable, and keyed to the specific change that caused the signal. Dashboard archaeology doesn't cut it. The signal has to be keyed to the specific delta, not "something is broken somewhere in staging." An agent that can't distinguish its change's error rate from baseline noise will either churn or hallucinate confidence.

Context window cost constrains the control surface. Every infrastructure operation exposed to an agent as a separate tool consumes context tokens. A rich infrastructure API with dozens of typed operations can exhaust a significant fraction of the context window before the agent begins reasoning about its actual task. Cloudflare's Code Mode work demonstrates the scale of this problem: a naive MCP server for their 2,500-endpoint API would consume 1.17 million tokens; their Code Mode approach collapses it to roughly 1,000. That's a 1,170x reduction. The infrastructure control plane must be designed for O(1) token cost, not O(n) in the number of operations.

What follows is a design for a resource management system built for this reality. Not a retrofit of Terraform with an MCP wrapper, but a rethinking of what infrastructure primitives want to be when the primary consumer is a machine.

Design Principles

Before the primitives, let's quickly cover the principles that constrain them. These are the load-bearing decisions that shape everything downstream.

Delta-then-converge, not full-state-apply. The write path should be fast, append-only deltas against a typed resource graph. The read path should be a materialized view of current desired state. Periodically, the system compacts into a canonical snapshot. This is exactly like an LSM tree: optimized for write-heavy workloads with eventual read consistency. It's the fundamental departure from Terraform's plan/apply cycle, which is optimized for infrequent, human-paced, full-state reconciliation. Terraform is a compilation target, not a hot path. This isn't a slight. Compacted snapshots export to Terraform, Pulumi, or CloudFormation for portability and disaster recovery. But the resource graph is the live control plane; the compacted snapshot is the cold backup.
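The LSM-style split between write path and read path can be sketched in a few lines. This is an illustrative toy, not the system's API: `ResourceLog`, `append`, `materialize`, and `compact` are hypothetical names, but the shape is the point — O(1) appends, reads that fold deltas over the last snapshot, and periodic compaction into the canonical state that would export to Terraform.

```typescript
// Sketch of delta-then-converge (names hypothetical): writes are append-only
// deltas; reads materialize desired state by folding deltas over the last
// compacted snapshot; compaction is the cold path, not the hot path.
type Delta = { resource: string; field: string; value: unknown; seq: number };

class ResourceLog {
  private snapshot: Record<string, Record<string, unknown>> = {};
  private deltas: Delta[] = [];
  private seq = 0;

  // Hot path: appending a delta is O(1), no full-state diff required.
  append(resource: string, field: string, value: unknown): void {
    this.deltas.push({ resource, field, value, seq: this.seq++ });
  }

  // Read path: materialized view = snapshot + replayed deltas, in order.
  materialize(resource: string): Record<string, unknown> {
    const base = { ...(this.snapshot[resource] ?? {}) };
    for (const d of this.deltas) {
      if (d.resource === resource) base[d.field] = d.value;
    }
    return base;
  }

  // Cold path: fold deltas into a canonical snapshot (the export point for
  // Terraform/Pulumi/CloudFormation), then truncate the log.
  compact(): void {
    for (const d of this.deltas) {
      this.snapshot[d.resource] = {
        ...(this.snapshot[d.resource] ?? {}),
        [d.field]: d.value,
      };
    }
    this.deltas = [];
  }
}
```

The asymmetry is deliberate: agents hammer `append` at machine speed, while `compact` runs on a human-paced schedule for compliance and DR.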

Fork and promote are first-class verbs. Branching an environment, overlaying a single service, promoting a change through lanes: these should be as natural as git branch and git merge, not hand-assembled from 47 YAML files and some hope.

Every resource is born observable. Telemetry is not something you attach after the fact. Every sandbox, overlay, branch, and promotion carries a telemetry stream from creation. Structured, queryable, keyed by change delta. If you can't observe it, it shouldn't exist.

Security is structural, not aspirational. Scope attenuation, budget envelopes, TTLs, and egress controls are properties of the resource graph itself, not policies bolted on after deployment. I've written extensively about this in the context of agentic credential architecture. The same principles apply here.

Compositional over monolithic. Small, well-typed primitives that compose cleanly beat large opinionated stacks that work until they don't. But composition must happen within a coherent type system, not via stringly-typed YAML references.

Honest about hard problems. Merge semantics for stateful resources are intractable in the general case. Suspend/resume doesn't work for all workloads. Cost prediction for non-deterministic workloads is approximate. The system's type system and API should make these limitations visible, not hide them behind optimistic abstractions that fail at runtime. If something is hard, say so in the types. Most infrastructure abstractions fail because they pretend hard things are easy. Then you discover the pretense at 3 AM.

O(1) agent interface cost. The token cost of interacting with the infrastructure control plane must be constant regardless of the number of primitives, resource types, or operations. Agents discover capabilities progressively through schema search, not by loading the full type system into context.

The Primitives

The system is built on eight composable primitives. Each is a typed node in the resource graph with well-defined lifecycle, security scope, and telemetry. I'll focus on the ones that carry the most novelty.

Sandbox

A Sandbox is an isolated execution boundary: the atom of compute. It is not "a namespace" or "a container." It is a first-class object with a compute envelope, declared isolation level, default-deny networking, scoped credentials, a telemetry stream, and a TTL with a cost budget.

sandbox "agent-task-1247" {
  isolation    = "microvm"
  compute      = { cpu: 2, memory: "4Gi", gpu: null }
  ttl          = "30m"
  budget       = { max_cost: "$0.50" }
  network      = allow_egress(["api.stripe.com:443", "*.internal.svc"])
  credentials  = inherit(agent.scope, restrict_to(["db:read", "cache:rw"]))
  suspendable  = true
}

Sandboxes are born mortal. Every sandbox has a TTL and a cost budget. Exceeding either triggers graceful shutdown, not silent continuation. Durability is opt-in and explicit. Isolation is a declared property, not an implementation detail, ranging from shared-namespace (cheapest, weakest, under 1s cold start) through container (1-3s) to microvm (Firecracker-class, under 200ms pre-warmed). GPU and edge substrates are first-class, not afterthoughts bolted onto a container-only model.

Credentials follow the workload identity model I described in the credential architecture post: SPIFFE/SPIRE identities, just-in-time token issuance, scope that can only attenuate through delegation. An agent-spawned sandbox inherits at most the intersection of the spawning agent's scope and the sandbox's declared needs.
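The attenuation rule is simple enough to state as a set intersection. A minimal sketch, assuming scopes are flat permission strings (the real model is SPIFFE-based workload identity, not string sets — `attenuate` and the scope names here are illustrative):

```typescript
// Attenuation-only delegation sketch (hypothetical names): a spawned sandbox
// holds at most the intersection of the parent agent's scope and the sandbox's
// declared needs. Delegation can narrow scope, never widen it.
function attenuate(parentScope: Set<string>, declaredNeeds: Set<string>): Set<string> {
  // Anything declared but not held by the parent is dropped, never escalated.
  return new Set([...declaredNeeds].filter((s) => parentScope.has(s)));
}

const agent = new Set(["db:read", "cache:rw", "queue:read"]);
// Sandbox asks for db:write it doesn't need to get — the request is silently narrowed.
const sandboxScope = attenuate(agent, new Set(["db:read", "db:write", "cache:rw"]));
```

`sandboxScope` ends up holding `db:read` and `cache:rw`; `db:write` never materializes because the parent never had it.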

Overlay

An Overlay is a partial replacement of a baseline environment: the mechanism that makes "forking" cheap without duplicating everything.

overlay "test-new-checkout" {
  baseline = env.production@v("abc123")
  replace  = {
    "checkout-service" = { image: "checkout:experiment-42", config: { feature_x: true } }
  }
  route_when = overlay_selector("experiment-42")
  ttl        = "2h"
}

This is the primitive that makes the "100 agents, 100 experiments" scenario economically viable. You don't clone the world. You overlay the delta and route selectively. Everything not overlaid falls through to the baseline. Overlay selectors are signed tokens, not raw headers. Injected only at trusted ingress, propagated only to declared participants, stripped at egress boundaries. If you can spoof an overlay header, you can route production traffic through experimental code. The routing trust model prevents this.

Overlays compose. You can overlay an overlay. Resolution order is explicit and deterministic. Cross-cutting overlays can target multiple services simultaneously for testing coordinated changes.
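Deterministic resolution can be sketched as a layered lookup. This is a hedged illustration of the fall-through semantics, not the routing implementation — `resolve`, `Layer`, and the service specs are assumed names:

```typescript
// Overlay resolution sketch (hypothetical types): overlays form an ordered
// stack over a baseline. Lookup walks from the topmost overlay down; the
// first layer that defines the service wins, everything else falls through.
type ServiceSpec = { image: string; config?: Record<string, unknown> };
type Layer = { name: string; replace: Record<string, ServiceSpec> };

function resolve(
  service: string,
  overlays: Layer[], // index 0 is closest to baseline, last is topmost
  baseline: Record<string, ServiceSpec>
): ServiceSpec {
  // Walk top-down: explicit, deterministic order.
  for (let i = overlays.length - 1; i >= 0; i--) {
    const hit = overlays[i].replace[service];
    if (hit) return hit;
  }
  // Everything not overlaid falls through to the baseline.
  const base = baseline[service];
  if (!base) throw new Error(`unknown service: ${service}`);
  return base;
}
```

An overlay of an overlay is just another layer on the stack, which is why composition stays cheap: the cost is the sum of the deltas, never a clone of the world.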

Baseline drift policy is explicit:

Policy      | Behavior                                | Use Case
freeze      | Baseline locked at fork time            | Experiments, statistical comparisons
auto-rebase | Baseline tracks head                    | Development, feature branches
invalidate  | Overlay flagged stale if baseline moves | Compliance-sensitive changes
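The three policies reduce to a small read-time decision. A sketch under assumed names (`effectiveBaseline` and the version strings are illustrative, not the system's API):

```typescript
// Drift-policy sketch: given the baseline version at fork time and the
// current head, decide which baseline an overlay resolves against and
// whether it should be flagged stale. Names are hypothetical.
type DriftPolicy = "freeze" | "auto-rebase" | "invalidate";

function effectiveBaseline(
  policy: DriftPolicy,
  forkVersion: string,
  headVersion: string
): { version: string; stale: boolean } {
  switch (policy) {
    case "freeze":
      // Pinned at fork time: experiments compare against a fixed world.
      return { version: forkVersion, stale: false };
    case "auto-rebase":
      // Tracks head: development branches see the latest baseline.
      return { version: headVersion, stale: false };
    case "invalidate":
      // Pinned, but flagged the moment the baseline moves out from under it.
      return { version: forkVersion, stale: forkVersion !== headVersion };
  }
}
```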

Branch

A Branch is a copy-on-write fork of a stateful resource: the mechanism for safely experimenting with data, queues, caches, and configuration without touching the source.

Cost model is copy-on-write. A branch costs nearly nothing at creation and accumulates cost proportional to divergence. This is the same economics that makes git branching viable: the expensive thing is divergence, not the fork itself.
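The economics fall directly out of the data structure. A toy copy-on-write sketch, assuming page-level granularity (`CowBranch` and the page model are illustrative; real backends branch at the storage-engine level):

```typescript
// Copy-on-write branch sketch (hypothetical names): a branch starts as a
// pointer to its parent and stores only the pages it has diverged on, so
// cost is proportional to divergence, not to the parent's size.
class CowBranch {
  private dirty = new Map<string, string>(); // only diverged pages are stored

  constructor(private parent: Map<string, string>) {}

  read(page: string): string | undefined {
    // Undiverged pages fall through to the parent.
    return this.dirty.get(page) ?? this.parent.get(page);
  }

  write(page: string, value: string): void {
    // The parent is never mutated; writes land in the branch's delta.
    this.dirty.set(page, value);
  }

  // The fork itself is free; cost accrues as the branch diverges.
  cost(): number {
    return this.dirty.size;
  }
}
```

This is why "100 agents, 100 branches" is viable: a hundred forks of a terabyte database cost roughly nothing until the agents start writing.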

Branching is easy. Merging is the dragon. The system is deliberately honest about this.

promote(branch -> parent) means "this branch becomes the new truth." The parent is replaced. This is the common case and the safe default: no conflict resolution is needed. Pick a winner, then gracefully manage the cutover — drain the parent before shutdown, manage dual writes, replay from the write-ahead log, and so on — so the switch is clean. For live production databases, promote almost always has sharp edges.

The system addresses this through declared promotion prerequisites:

Prerequisite   | Behavior
quiesce        | Parent enters read-only mode before promote
replay(source) | Replay parent's WAL from fork point post-promote
reconcile(fn)  | Run user-supplied reconciliation function
accept_loss    | Explicitly acknowledge interim writes will be discarded

Default: none. The system refuses to promote a branch whose parent has diverged since fork unless a prerequisite is declared.
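The refusal logic is a guard, not a policy engine. A minimal sketch of that check, with assumed names (`checkPromote` and the prerequisite union mirror the table above but are illustrative):

```typescript
// Promote guard sketch (hypothetical names): if the parent has diverged
// since the fork point and no prerequisite is declared, refuse loudly
// rather than silently discarding interim writes.
type Prerequisite = "quiesce" | "replay" | "reconcile" | "accept_loss";

function checkPromote(
  parentDivergedSinceFork: boolean,
  prerequisite?: Prerequisite
): { allowed: boolean; reason?: string } {
  if (parentDivergedSinceFork && prerequisite === undefined) {
    return {
      allowed: false,
      reason:
        "parent diverged since fork; declare quiesce, replay, reconcile, or accept_loss",
    };
  }
  // Clean fork point, or the caller has explicitly declared how the
  // transaction gap is handled.
  return { allowed: true };
}
```

Note that `accept_loss` passes the guard: the point isn't to forbid losing interim writes, it's to make the loss an explicit, auditable decision instead of a runtime surprise.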

merge(branch, target, strategy) is explicitly opt-in. A resource must declare a merge strategy to support merge at all. If no strategy is declared, the API rejects the merge call up front, not partway through execution. Available strategies: schema_only, append_only, last_writer_wins, custom(fn). If a resource type doesn't fit any of these, the answer is promote. Pick a winner, not a blend. This is intentional. Merging stateful resources in the general case is intractable, and hiding that behind a hopeful API leads to runtime surprises. If something is hard, say so in the types.

The Rest, Briefly

Environment. A named, versioned composition of sandboxes, overlays, and branches. The unit of "what is deployed." Unlike traditional environments, these are cheap, programmatic, and potentially very numerous. Environments enforce dev/prod parity structurally.

Promotion Lane. The explicit path a change takes from experiment to production, with gates (policy checks, test suites, human approval, SLO validation) at each stage. "Human-on-the-loop" rather than "human-in-the-loop." Engineers define the policy. The system enforces it. The agent doesn't need permission to try; it needs to pass the gates.

Telemetry Stream. A structural component of every resource, not a monitoring add-on. Built on OpenTelemetry with a layered cardinality model: traces are high-cardinality with short retention, metrics are low-cardinality with long retention, and exemplar links connect them for drill-down.

Resource Graph. The queryable, versioned representation of all infrastructure state. Maintains desired state, actual state, lineage, telemetry index, policy state, and cost state. Supports concurrent access from N agents via optimistic concurrency control with typed conflict detection and idempotency keys.
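The concurrent-write discipline is worth making concrete. A sketch of optimistic concurrency with idempotency keys, under assumed names (`GraphNode`, `write`, and the conflict shape are illustrative, not the graph's real API):

```typescript
// OCC sketch (hypothetical names): each mutation carries the version it was
// planned against plus an idempotency key. Stale versions become typed
// conflicts; a replayed key returns the original result instead of
// re-applying the mutation.
type WriteResult =
  | { ok: true; version: number }
  | { ok: false; conflict: "stale_version" };

class GraphNode {
  private version = 0;
  private applied = new Map<string, WriteResult>(); // idempotency key -> first outcome

  write(expectedVersion: number, idempotencyKey: string): WriteResult {
    const prior = this.applied.get(idempotencyKey);
    if (prior) return prior; // retried delivery: replay, don't re-apply

    const result: WriteResult =
      expectedVersion === this.version
        ? { ok: true, version: ++this.version }
        : { ok: false, conflict: "stale_version" };
    this.applied.set(idempotencyKey, result);
    return result;
  }
}
```

This is what lets N agents hammer the same node safely: a losing agent gets a typed `stale_version` conflict it can re-plan against, and a retried network call can't double-apply a mutation.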

Experiment. A structured primitive built on overlays, branches, telemetry, and promotion lanes. Version-pinned baselines, automated evaluation criteria, and auto-resolve rules. Not "use those things manually and hope for the best," but a first-class coordination object.

The Agent Interface

The eight primitives and the resource graph constitute the system's internal architecture. The agent interface is how an AI agent actually interacts with it, and it is subject to a hard constraint that shapes everything: context window cost.

A naive approach (one MCP tool per operation, with full typed schemas) would consume tens of thousands of tokens before the agent even begins reasoning. Cloudflare's Code Mode work demonstrates both the problem and the solution: two meta-operations that compose the full primitive set through code. This is the same progressive disclosure principle I use everywhere: don't front-load what can be discovered on demand.

search(code): Progressive Schema Discovery

The agent writes code against the resource graph's type schema to discover available resources, operations, and constraints. The full schema never enters the context window.

// "What databases in staging support branching?"
search(async (schema) => {
  return schema.resources
    .where({ type: "database", environment: "staging" })
    .select(["name", "provider", "supports_branch", "branch_strategies"])
})

Read-only. Runs in a V8 isolate with no side effects.

execute(code): Operation Chaining as Plan

The agent writes code that chains multiple graph operations into a single execution plan. More token-efficient than separate tool calls, reduces round-trip latency, and lets the agent express intent as a coherent plan rather than disconnected steps.

execute(async (graph) => {
  const branch = await graph.branch({
    source: "db.staging.users",
    fork_point: "now",
    ttl: "2h"
  })

  const sandbox = await graph.sandbox({
    isolation: "microvm",
    compute: { cpu: 2, memory: "4Gi" },
    ttl: "1h",
    credentials: restrict(agent.scope, { [branch.id]: "rw" })
  })

  const overlay = await graph.overlay({
    baseline: "env.staging@head",
    replace: { "user-service": { image: "user:migration-v3" } },
    route_when: overlay_selector("migration-test")
  })

  const run = await graph.run(sandbox.id, {
    command: "python run_migration.py && python run_tests.py",
  })

  return { branch, sandbox, overlay, run }
})

Operations run sequentially by default, fail-fast with partial results. Where transactional semantics are needed, the agent wraps operations in graph.atomic([...]). The execute sandbox enforces three hard ceilings on the hot path: CPU time limit (5s default), operation count limit (100 mutations per block), and output size limit (1MB). The operation count limit is the gas meter: it prevents a hallucinating agent from issuing while(true) { graph.sandbox({...}) } and saturating the write path before the async budget controller notices. Same V8 isolate model Cloudflare uses for Workers.
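The gas meter itself is trivial, which is the point: it's a synchronous check on the hot path, not an async controller. A sketch with assumed names (`GasMeter` and the default of 100 mirror the ceiling above but are illustrative):

```typescript
// Gas-meter sketch (hypothetical names): every mutation in an execute block
// decrements a per-block budget; exceeding it aborts the block immediately,
// before the slower async budget controller would ever notice.
class GasMeter {
  constructor(private remainingOps: number = 100) {}

  // Called once per graph mutation on the hot path.
  charge(): void {
    if (this.remainingOps <= 0) {
      throw new Error("operation limit exceeded: execute block aborted");
    }
    this.remainingOps--;
  }
}
```

A `while(true)` loop of mutations dies at mutation 101 with a partial-results error the agent can read, rather than silently flooding the write path.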

Why Code-as-Plan

Three reasons.

Composability. "Branch, sandbox, overlay, test, evaluate" should be a single coherent plan, not five disconnected tool calls with five round trips. The plan carries its own logic: conditionals, error handling, data flow between steps.

Progressive disclosure. The agent doesn't need the full type system of all resource types upfront. It searches for what's relevant to the current task. "What databases support branching?" is a runtime query rather than a context-window tax.

Future-proofing. When the system adds new primitives, no MCP tool definitions need updating. The agent discovers them through search and uses them through execute. The protocol surface is stable even as capabilities grow. This is a crucial property for a system that will evolve faster than its consumers can update.

What Makes This Hard

It's crucial to acknowledge the hard problems this design doesn't fully solve. Hiding them behind optimistic framing is how you get runtime surprises.

Stateful merge semantics. Branching databases is elegant; merging them is often gnarly in the general case. Even promote has sharp edges: the transaction gap between fork and promote means interim writes are lost unless the promotion prerequisite is declared. The system refuses to promote silently, but defining the right prerequisite for a given domain remains the user's problem.

Cross-environment consistency during promotion. Promoting a change that spans multiple services, databases, and configuration requires atomicity across all of them. That's a distributed systems problem with no general solution. The system can provide two-phase promote, but failures during promotion require explicit rollback strategies.

Async overlay isolation. The routing trust model works cleanly for synchronous request/response. Async boundaries (Kafka, SQS, cron jobs, webhooks) require explicit enrollment and propagation configuration. The system provides the mechanisms, but correctly identifying all async boundaries in a complex service graph is a human mapping problem. The conservative default (strip overlay selectors at undeclared async boundaries) prevents contamination of baseline state but can cause experiments to produce incomplete results. Silent contamination is worse than missing data, so the system defaults to safety.
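The conservative default can be sketched as a single check at each boundary. Assumed names throughout — `crossAsyncBoundary`, the `x-overlay-selector` header, and the message shape are all illustrative, not the real propagation protocol:

```typescript
// Async-boundary sketch (hypothetical names): overlay selectors propagate
// only across boundaries explicitly enrolled in the overlay and are stripped
// everywhere else, so undeclared consumers never see experimental traffic.
type Message = { headers: Record<string, string>; body: unknown };

function crossAsyncBoundary(
  msg: Message,
  enrolledBoundaries: Set<string>,
  boundary: string
): Message {
  if (enrolledBoundaries.has(boundary)) {
    return msg; // declared participant: selector propagates
  }
  // Undeclared boundary: strip the selector. The experiment may see
  // incomplete results, but the baseline is never contaminated.
  const headers = { ...msg.headers };
  delete headers["x-overlay-selector"];
  return { headers, body: msg.body };
}
```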

Cost prediction for non-deterministic workloads. Agent workloads are inherently unpredictable. Budget envelopes provide ceilings, but predicting cost before execution remains approximate. You can bound it; you can't predict it.

Trust calibration. How much autonomy should an agent have? Too little and you've built a very expensive approval queue. Too much and you're one prompt injection away from deleting production. The promotion lane model provides the framework, but tuning the gates is a human judgment problem, not a systems design problem.

Code Mode safety at scale. The search + execute interface runs agent-generated code in sandboxed isolates. Resource limits are per-block, not per-session. An agent that issues many sequential execute blocks, each under the operation limit, can still accumulate significant resource creation over time. The policy layer and budget controller must work in concert. The gap between "each block is individually bounded" and "the aggregate behavior is bounded" requires both mechanisms.

The bootstrap problem. The path from "existing Kubernetes cluster with Terraform" to "fully realized agentic resource graph" needs a pragmatic migration story. Start with sandboxes and overlays as the beachhead, add branching and the full graph incrementally. But the migration cost is real and shouldn't be underestimated.

A Phased Approach

To avoid building "Kubernetes 2: now with more feelings," the implementation is staged.

Phase 1: The Inner Loop. Sandbox primitive with TTL, budget, scoped identity, and telemetry. Overlay primitive with the full routing trust model. Queryable resource graph with optimistic concurrency control. Agent interface via search + execute. Promotion lanes with policy gates. This proves: an agent can fork, modify, observe, and promote a change safely and cheaply.

Phase 2: Stateful Branching. Branch primitive for core stateful resource types with promote and rebase. Experiment primitive with version-pinned baselines and automated evaluation. This proves: an agent can safely experiment with data, not just code.

Phase 3: Multi-Substrate and Scale. Substrate registry and placement policy for GPU, edge, and multi-cloud. Suspend/resume. Branch generalization to queues, caches, and configuration. Compaction for compliance and DR.

What Comes Next

The Code Mode pattern (two-tool, code-as-plan interfaces) will likely become standard practice for infrastructure MCP servers. Cloudflare's work, Anthropic's programmatic tool calling, and Block's Goose implementation all point in the same direction: agents interacting with infrastructure through typed code executed in sandboxed isolates, not through exhaustive tool enumerations.

Branching will move up the stack from databases to queues, caches, and configuration. IaC will bifurcate formally into inner loop (deltas, overlays, fast reconcile) and outer loop (compacted snapshots, compliance, DR). Terraform remains useful as a compilation target.

Longer-term, the resource graph becomes the primary interface for infrastructure. Not kubectl, not the AWS console, not Terraform CLI. A queryable, version-controlled graph that humans and agents both interact with through typed APIs. Infrastructure stops being "something you configure" and becomes "something you query and evolve." More like a database than a control panel.

The primitives described here are designed for composability and extensibility. The specific syntax is illustrative. The goal is to identify the right abstractions. The implementation can vary. The full design proposal goes deeper on networking, security, autoscaling, multi-substrate placement, and the reconciliation engine.

What can't vary is the honesty. Branching is easy. Merging is hard. Any system that pretends otherwise will teach you the difference at 3 AM.
