The Hole in the Puzzle
Primitives for Intent-Oriented Development
AI is a tool. Code is also a tool. Neither is the goal. Tools are great: they let us address problems in new ways, and sometimes even let us address entirely new problems.
The goal is realizing intent through systems. We have some notion of what we want a system to do, some constraints on how it should behave, some context about the environment it operates in. Code is one representation of a system that realizes intent. AI is one means of producing code. But the actual work, the thing that matters, is the mapping from intent to realization in a way that is sound and appropriate for the contextual constraints. This framing isn't novel. Fred Brooks distinguished essential complexity (the problem itself) from accidental complexity (the solution representation) decades ago. What's changed is that we now have tools capable of handling more of the accidental complexity, which means we can spend more attention on the essential.
Making AI better at coding matters. A lot of effort is going into that problem, and it's producing real results. But improving code generation is only part of the puzzle. The other major part is enabling humans and machines to collaborate on creating and refining intent, and on the verified derivation of software from that intent. Both matter. This post focuses on the second.
My goal is to enable humans and AI to collaborate in the construction and refinement of intent (specifications, context, constraints), and then derive software that satisfies the conditions of that intent, verified to operate within the parameters specified. This is different from the current mode where AI generates and humans review. It's a tighter loop, with shared artifacts and formal guarantees.
This post explores three conceptual areas that I've found promising in achieving this goal: Intent-Oriented Development, Coordinated Logical and Semantic Holes (CLaSH), and Shaped Generation. These are primitives: building blocks that compose with each other and with your existing systems and workflows.
Intent as Coordination Point
If the goal is realizing intent, then intent should be something we can work with directly. Not a comment that gets out of sync. Not a prompt that vanishes after generation. A structured artifact that can be refined, constrained, and verified against.
And critically: humans and machines can collaborate to create that intent in the first place.
This matters because articulating intent is itself difficult. What does the system need to do? What constraints apply? What constitutes success? These questions don't have obvious answers, and the process of answering them often reveals that we don't know what we want until we start trying to specify it. An AI collaborator that can ask clarifying questions, propose formalizations, identify ambiguities, and model unknowns gracefully is a force multiplier for specification work.

Spec Driven Development is its own emerging category, with Kiro, Spec Kit, and others all helping users with the specification process. Kiro can add property tests to augment the spec, and SpecLang provides some bidirectional interaction, but none really deliver robust verifiable human-AI coordination, or elegant handling of unknowns.
What does intent include? At minimum:
- Task kind and scope: What are we building? A function, a module, a service?
- Logic and experience: What is the flow, and how is logic applied to that flow?
- Constraints: Safety requirements, performance bounds, compatibility needs
- Conventions: Architectural patterns, style preferences, library choices
- Acceptance criteria: What would constitute success? Ideally something testable, possibly something provable
- Unknowns: What don't we know yet? What needs to be refined?

There's a design-shaped hole in this list, and that is a large discussion for another day.
When intent is explicit and structured, interesting things become possible. Humans can refine intent without touching code. Machines can propose realizations against explicit criteria. Verification can check whether a realization satisfies intent. And collaboration becomes about refining a shared artifact rather than reviewing generated output after the fact. I've been exploring this direction with Topos, a language for defining these boundaries, allowing formalization and the handling of unknowns.
This is what I'm calling Intent-Oriented Development. The term emphasizes that intent, not code, is the thing being developed. Code is derived. And the development of intent is itself a collaborative process between humans and machines. An Intent-Oriented workflow has a chance to displace current commit/PR-centric workflows.
Typed Holes and CLaSH
Suppose we have explicit intent (or we're in the process of developing it). How do humans and machines collaborate on deriving systems from that intent? And how do we handle the parts of intent that aren't yet fully specified?
This is where concepts from type theory become useful, so we'll take a sidequest into theory before returning to our problem domain via CLaSH. You may have encountered typed holes or dependent types in Lean, Haskell, Agda, or Idris.
The Typed Hole
A typed hole is a placeholder in a program that has a known type and a known context (what variables are in scope, what their types are), but no implementation yet. The key property: a program with holes is still well-defined. You can typecheck it. You can reason about it. In systems like Hazel, you can even run it (the hole produces a "hole value" that propagates through computation).

Lightweight analogy: Think of Sudoku. Each cell coordinates with the other cells in its row, column, and 3×3 square. When you fill in a cell, it provides information about what values are possible in those coordinated cells. The entire puzzle is well-defined even when most cells are empty. Filling one cell enables filling others, which enables filling still others, in a cascade of constraint satisfaction. Typed holes work similarly.
This "totality" property is what makes holes useful for collaboration. Traditional compilers halt on incomplete programs. They need everything defined before they can reason about anything. Typed holes change this: every partial program state has a well-defined meaning, so reasoning can proceed even when large parts of the system are unknown.
What a Hole Contains
A hole isn't just a blank space. It's a structured object that captures:
- Expected type (τ): What type must the fill have?
- Typing environment (Γ): What variables are in scope, and what are their types?
- Constraints: What additional properties must the fill satisfy?
This environment capture is crucial. When you create a hole inside a function, the hole knows about the function's parameters, local variables, and any types in scope. When an AI (or human) fills that hole, the fill is checked against this captured context. The hole carries everything needed to validate a fill without requiring global analysis.

The theoretical foundation here is Contextual Modal Type Theory (CMTT). In CMTT, hole closures are metavariable closures: they pair a metavariable (the unknown) with its context (what's known about its environment). This isn't just convenient notation; it's a logical system where reasoning about incomplete knowledge has rigorous semantics. The Curry-Howard correspondence gives us a logic for holes.
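To make this concrete, here's a minimal Python sketch of a hole as a structured object. Everything here is invented for illustration: types are plain strings and the "typecheck" is string equality, where a real system would use unification over actual type terms.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hole:
    """A placeholder with a known expected type and captured environment."""
    expected_type: str          # τ: the type any fill must have
    env: dict                   # Γ: variables in scope, name -> type
    constraints: tuple = ()     # extra predicates a fill must satisfy

@dataclass
class Fill:
    expr: str                   # proposed source text
    typ: str                    # the fill's synthesized type

def check_fill(hole: Hole, fill: Fill) -> bool:
    """Validate a proposed fill against the hole's captured contract.
    No global analysis: everything needed travels with the hole."""
    if fill.typ != hole.expected_type:
        return False
    return all(pred(fill) for pred in hole.constraints)

# A hole created inside a function captures that function's scope in Γ.
hole = Hole(expected_type="int", env={"x": "int", "name": "str"})
print(check_fill(hole, Fill(expr="x + 1", typ="int")))   # True
print(check_fill(hole, Fill(expr="name", typ="str")))    # False
```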
Fill-and-Resume
When a hole gets filled, the system doesn't start over. It validates the fill locally and continues:
- Validate: Does the fill satisfy the hole's constraints?
- Typecheck: Does the fill have the expected type in the captured environment?
- Propagate: Update any dependent holes with new information
- Continue: Resume from the hole boundary
This is "fill-and-resume" semantics. Each fill is a local proof obligation; if filled correctly, global properties are preserved. You don't re-verify the entire program. You check the fill against the hole's contract.
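The four steps above can be sketched in a few lines of Python. This is a toy model with invented names, not a real implementation: validation happens at the hole boundary, and only dependent holes are updated.

```python
# Two holes: h2's type is unknown until h1 is filled (it must match h1).
holes = {
    "h1": {"type": "int", "filled": None},
    "h2": {"type": None, "filled": None, "same_type_as": "h1"},
}

def fill(name, expr, typ):
    h = holes[name]
    # 1-2. Validate and typecheck against the hole's own local contract.
    if h["type"] is not None and h["type"] != typ:
        raise TypeError(f"{name}: expected {h['type']}, got {typ}")
    h["filled"] = expr
    # 3. Propagate: dependent holes learn the new type information.
    for other in holes.values():
        if other.get("same_type_as") == name and other["type"] is None:
            other["type"] = typ
    # 4. Continue: resume from the hole boundary; no global re-check.

fill("h1", "42", "int")
print(holes["h2"]["type"])   # int — propagated from h1's fill
```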
Bidirectional Information Flow
Type information flows in two directions through a program with holes:
- Synthesis (↑): Infer the type of an expression from its structure ("this returns an int")
- Analysis (↓): Check an expression against an expected type ("I need this to be a string")
Bidirectional typing propagates constraints both up and down the tree. If a hole has an expected type from its context (analysis mode), that type constrains what fills are valid. If a fill has a synthesized type, that information propagates to dependent holes.
This bidirectional flow is what enables holes to coordinate. A hole in an argument position knows the expected parameter type. A hole in a return position knows the declared return type. Type information from filled holes propagates to constrain unfilled holes. The system converges toward a consistent solution through local decisions with global effect.
Why This Matters for Collaboration
This machinery addresses several concrete challenges:
Non-halting reasoning: Human and machine can take turns filling holes without the system ever becoming undefined. There's no "compile error" state that blocks progress. Holes are first-class citizens, not errors.
Local verification: Each fill is checked against the hole's captured context. An AI can propose a fill; the system verifies it locally. Mistakes are caught at the hole boundary, not after the entire program is generated.
Progressive refinement: Filling a hole may create smaller holes. Coarse decisions (architecture) can be made before fine decisions (implementation). The collaboration proceeds through refinement rather than single-shot generation.
Explicit unknowns: When you don't know something, you represent it as a hole with constraints. The unknown is modeled, not ignored. As understanding develops, holes get refined or filled.

The Hazel research group at Michigan has done foundational work here. Their OOPSLA 2024 paper "Statically Contextualizing LLMs with Typed Holes" demonstrates that providing type context from holes dramatically improves LLM code generation. In their MVUBench evaluation, the no-context baseline produced essentially no correct completions, while adding type definitions and relevant function headers increased performance by roughly 3x, with error-correction rounds providing an additional 1.5x multiplier on top of that. The paper says it directly: "AIs need IDEs, too."
Holes and Abstraction Levels
Holes can exist at varying levels of abstraction, even supporting hierarchy:
SYSTEM > MODULE > FUNCTION > BLOCK > STATEMENT > EXPRESSION > TOKEN
Coarser holes constrain finer holes. If you have a hole for a module, filling it with a module skeleton creates holes for each function. Filling a function hole creates holes for statements. And so on down to individual tokens.
This hierarchy enables progressive refinement. A human might specify a system-level hole ("we need an authentication service with these interfaces"). An AI might propose a module structure. A human might accept some functions and refine the intent for others. Each level of detail emerges from the level above, constrained by it.
CLaSH: Coordinated Logical and Semantic Holes
Different aspects of intent are expressible in different ways.
Some constraints are logical in the sense that they can be expressed in a type system or formal language. "This function returns a Promise<User>." "This value is non-null." These can be checked mechanically.
Other constraints are semantic in the sense that they require judgment or pattern recognition. "Use the existing authentication middleware." "Follow the error handling conventions established elsewhere." These are harder to formalize but still real constraints on what constitutes a valid solution.
Current tooling handles these separately. Type checkers know about types. LLMs know about patterns (sort of). But the type checker doesn't know that your architecture conventions exist, and the LLM doesn't always know that its output needs to satisfy the type checker.
Coordinated Logical and Semantic Holes (CLaSH) is the broader application of typed holes in this context: using holes not just for type-level reasoning, but as a unified coordination mechanism where both logical and semantic constraints can be expressed, composed, and satisfied together.
The Constraint Algebra
The mathematical foundation is a bounded meet-semilattice ⟨C, ⊓, ⊤, ⊥⟩:
- ⊤ (top) represents "no constraint" (everything allowed)
- ⊥ (bottom) represents "contradiction" (nothing allowed)
- ⊓ (meet) combines constraints via conjunction
The algebraic properties matter:
c ⊓ ⊤ = c (identity: unconstrained domains don't interfere)
c ⊓ ⊥ = ⊥ (annihilation: contradictions propagate)
c ⊓ c = c (idempotence: re-applying constraints is safe)
c₁ ⊓ c₂ = c₂ ⊓ c₁ (commutativity: order doesn't matter)
(c₁ ⊓ c₂) ⊓ c₃ = c₁ ⊓ (c₂ ⊓ c₃) (associativity: grouping doesn't matter)
These aren't abstract niceties. Identity means you can add new constraint domains without breaking existing ones. Annihilation means errors surface immediately rather than propagating silently. Idempotence enables caching. Commutativity and associativity enable parallel evaluation and incremental combination.
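A tiny Python model makes the laws checkable: take a constraint to be the set of values it allows, with ⊤ as "everything" and ⊓ as set intersection. This is an illustrative model, not Ananke's actual representation.

```python
TOP = None              # ⊤: no constraint (everything allowed)
BOTTOM = frozenset()    # ⊥: contradiction (nothing allowed)

def meet(c1, c2):
    """⊓: conjunction of constraints, modeled as set intersection."""
    if c1 is TOP:
        return c2
    if c2 is TOP:
        return c1
    return c1 & c2

a, b, c = frozenset({"x", "y"}), frozenset({"y", "z"}), frozenset({"y"})

assert meet(a, TOP) == a                            # identity
assert meet(a, BOTTOM) == BOTTOM                    # annihilation
assert meet(a, a) == a                              # idempotence
assert meet(a, b) == meet(b, a)                     # commutativity
assert meet(meet(a, b), c) == meet(a, meet(b, c))   # associativity
print(meet(a, b))                                   # frozenset({'y'})
```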
Both Types and Semantics Are Semilattices
Here's the key that makes CLaSH possible: both logical constraints (types) and semantic constraints have this same algebraic structure.
Types as a semilattice. This models types as having intersection (which languages like TypeScript support explicitly). Not all type systems admit clean greatest-lower-bounds, so this is a modeling choice rather than a universal property — but it captures the structure well enough to be useful. Consider a simple type system with subtyping:
⊤ = unknown (any value allowed)
⊥ = never (no value possible)
⊓ = intersection (values satisfying both types)
If you have a hole expecting string | number and you add the constraint number | boolean, the meet is number (the intersection). If you have string and number with no overlap, the meet is never (contradiction). The semilattice structure captures how type constraints compose.
(string | number) ⊓ (number | boolean) = number
string ⊓ number = never = ⊥
T ⊓ unknown = T
T ⊓ never = never
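These equations can be checked directly in a toy model where a union type is the set of its base types and meet is set intersection. The three-type universe is an assumption for illustration only.

```python
UNKNOWN = frozenset({"string", "number", "boolean"})   # ⊤ in this toy universe
NEVER = frozenset()                                    # ⊥

def meet(t1, t2):
    """Type meet: the values satisfying both types."""
    return t1 & t2

string_or_number = frozenset({"string", "number"})
number_or_boolean = frozenset({"number", "boolean"})

assert meet(string_or_number, number_or_boolean) == frozenset({"number"})
assert meet(frozenset({"string"}), frozenset({"number"})) == NEVER
assert meet(string_or_number, UNKNOWN) == string_or_number
assert meet(string_or_number, NEVER) == NEVER
```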
Semantics as a semilattice. Semantic constraints also form a semilattice, though the elements are different:
⊤ = "no semantic requirement" (any pattern allowed)
⊥ = "contradictory requirements" (no pattern satisfies)
⊓ = conjunction of requirements
If you have a constraint "use async/await style" and add "follow existing error handling conventions," the meet is their conjunction. If you have "use callbacks" and "use async/await," that's a contradiction (⊥) in a codebase with incompatible conventions.
Semantic constraints aren't limited to evaluation by LLM judgment alone. Many can be auto-formalized into SMT-checkable propositions: "this value is always positive," "this function is called before that one," "this field is never null after validation." Others remain softer, requiring pattern matching or heuristics. The key is tracking confidence or ambiguity: a constraint might be "definitely satisfied," "definitely violated," "satisfied or violated with some confidence," or "uncertain, needs human review." This spectrum from mechanical verification to guided judgment lets semantic constraints be handled more robustly than pure LLM assessment, while the underlying algebra remains the same.
Why this matters. Because both domains have the same algebraic structure, they can be combined in a product domain where the algebra is preserved. This isn't a hack or an approximation, but a system that allows constraints from different domains to compose correctly. The partial order c₁ ⊑ c₂ ("c₁ is at least as constraining as c₂") is defined as c₁ ⊓ c₂ = c₁. This gives us a precise way to talk about refinement: when you fill a hole, the new constraint must be at least as constraining as the hole's original constraint. Refinement monotonically narrows the solution space.
Constraint Domains as Functors
Each constraint domain (e.g., the domains I use in Ananke: syntax, types, imports, control flow, semantics) forms its own semilattice. But constraints don't exist in isolation; they depend on context. The same constraint means different things in different parts of a program.
We model each domain D as a functor from contexts to constraint semilattices:
D : Context → ConstraintSemilattice
This captures that a type constraint inside a generic function differs from the same constraint at module scope. The functor structure preserves the semilattice operations across context changes.
The Product Domain
The unified constraint system is the product of individual domains:
Ω(Γ) = Syntax(Γ) × Types(Γ) × Imports(Γ) × ControlFlow(Γ) × Semantics(Γ)
Meet operates component-wise: combining two unified constraints combines each domain's constraint separately. A unified constraint is ⊥ (contradiction) if any component is ⊥.
This is how logical and semantic constraints live in the same system. Type constraints occupy the Types component. Semantic constraints (patterns, conventions) occupy the Semantics component. They're different dimensions of the same space.
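A component-wise meet with ⊥-normalization can be sketched in a few lines of Python. The per-domain constraints here are stand-in strings; the point is the product structure, not the per-domain logic.

```python
TOP, BOTTOM = "⊤", "⊥"

def meet_component(c1, c2):
    """Meet within one domain (toy rule: unequal concrete constraints conflict)."""
    if BOTTOM in (c1, c2):
        return BOTTOM
    if c1 == TOP:
        return c2
    if c2 == TOP:
        return c1
    return c1 if c1 == c2 else BOTTOM

def meet(u1, u2):
    """Component order: (syntax, types, imports, controlflow, semantics)."""
    combined = tuple(meet_component(a, b) for a, b in zip(u1, u2))
    # Any ⊥ component makes the whole unified constraint ⊥.
    return tuple(BOTTOM for _ in combined) if BOTTOM in combined else combined

u1 = (TOP, "int", TOP, TOP, "async/await")
u2 = (TOP, "int", "import asyncio", TOP, TOP)
print(meet(u1, u2))   # ('⊤', 'int', 'import asyncio', '⊤', 'async/await')
```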
Cross-Domain Morphisms
Here's where coordination happens. Domains aren't independent; they influence each other. A type annotation implies certain imports. A control flow pattern implies certain type constraints.
These relationships are captured by constraint morphisms between domains:
φ : D₁(Γ) → D₂(Γ)
For the system to remain coherent, morphisms must be:
- Monotonic: c₁ ⊑ c₂ ⟹ φ(c₁) ⊑ φ(c₂) (tighter constraints map to tighter constraints)
- ⊥-preserving: φ(⊥) = ⊥ (contradictions propagate across domains)

Monotonicity is crucial. It means that when you refine a constraint in one domain, the induced constraints in other domains also refine (or stay the same). You never get a situation where tightening a type constraint loosens a semantic constraint. The system moves consistently toward a solution or toward contradiction, never sideways.
Concrete examples of cross-domain morphisms:
| Source → Target | What it captures |
|---|---|
| Types → Imports | Type annotations require certain imports |
| Imports → Types | Available types depend on what's imported |
| ControlFlow → Types | Unreachable code has no type constraints |
| Types → Semantics | Type constraints become SMT formulas for verification |
| Semantics → Types | Pattern requirements induce type expectations |
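As a sketch, the Types → Imports row might look like this in Python. The type-to-module table is invented for illustration; what matters is that tightening the type constraint (requiring more types) can only grow the induced import set, which is the monotonicity requirement, and that ⊥ maps to ⊥.

```python
BOTTOM = "⊥"

# Hypothetical mapping from type names to the modules that provide them.
TYPE_IMPORTS = {"Decimal": "decimal", "Path": "pathlib"}

def types_to_imports(type_constraint):
    """φ : Types → Imports, where a type constraint is a set of required types."""
    if type_constraint is BOTTOM:
        return BOTTOM          # ⊥-preserving: contradictions propagate
    return frozenset(TYPE_IMPORTS[t] for t in type_constraint if t in TYPE_IMPORTS)

small = types_to_imports(frozenset({"Decimal"}))
large = types_to_imports(frozenset({"Decimal", "Path"}))
assert small <= large          # monotonic: tighter input maps to tighter output
print(large)
```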
Unifying in CLaSH
With this structure, a hole carries a unified constraint from the product domain. It has type expectations (from Types), syntactic requirements (from Syntax), and semantic constraints (from Semantics). When an AI generates a fill, all components are checked. When a human adds a semantic constraint ("use the existing auth pattern"), it propagates via morphisms to constrain what types are valid.
The bidirectional information flow from typed holes combines with cross-domain morphisms: type information flows up and down the syntax tree, and constraint information flows across domains. The whole system converges toward a consistent solution or surfaces a contradiction.
This is what makes CLaSH more than just "typed holes plus LLMs." The constraint algebra provides a rigorous foundation for coordination. Logical and semantic constraints aren't handled by separate systems that might conflict. They're dimensions of a unified constraint that propagates coherently through holes and their relationships.
This is the direction explored in Ananke, which implements this constraint algebra across five domains and compiles unified constraints to guide generation.
Shaped Generation

Why shaped generation rather than constrained generation? Because there is a broader pattern beyond token masking or constraint application in autoregressive models, and I'm attempting to capture the larger space in a unified term. Maybe I'm just being ridiculous.
Given a hole with constraints, how do we generate a valid fill?
The naive approach: generate freely, then filter. Let the model produce whatever it wants, then reject invalid outputs. This works but is wasteful, and for complex constraints the rejection rate can be prohibitive.
A better approach: shape the generation process itself so that invalid outputs are never produced. Encode constraints into generation so that the valid space is the only space explored.
For autoregressive models (the dominant paradigm in use for code generation today), this means compiling constraints to token masks. At each step, determine which tokens would lead to valid continuations, and zero out the probability of everything else. The model can only sample from the valid space. Ananke-SGLang implements this approach, adding multi-domain constrained generation to SGLang's serving infrastructure. Constraints from syntax, types, imports, control flow, and semantics compile to token masks that fuse efficiently.
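The core move can be shown in a few lines of Python. The vocabulary, logits, and constraint predicate are all invented; in real systems the mask is compiled from grammars and constraint automata and fused into the sampling kernel.

```python
import math

vocab = ["return", "None", "0", "raise", "{"]
logits = [2.0, 1.0, 0.5, 1.5, 3.0]

def allowed(token):
    # Hypothetical compiled constraint: "{" would break syntactic validity here.
    return token != "{"

# Mask invalid tokens to -inf, then softmax: they get exactly zero probability.
masked = [l if allowed(t) else -math.inf for t, l in zip(vocab, logits)]
exps = [math.exp(l) for l in masked]        # exp(-inf) == 0.0
total = sum(exps)
probs = [e / total for e in exps]

assert probs[vocab.index("{")] == 0.0       # the invalid token can never be sampled
assert abs(sum(probs) - 1.0) < 1e-9
```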
For diffusion models and flow-matching, the principle is similar but the mechanism differs. Constraints become energy functions that bias the denoising trajectory, or velocity field modifications that make valid regions into attractors. ASTRAL is a research proposal exploring this direction: treating code generation as constrained search through valid program space via AST-aware discrete flow matching with energy-based constraint guidance. I am extremely bullish on diffusion language models and flow-matching models (in hybrid architectures) for code generation, as they natively support coordinated information across multiple dimensions. There's a lot of exciting work being done here. Check out DiffuCoder and Mercury for a good introduction.
The unifying insight across paradigms: generation is search through a constrained space. The constraints come from intent (logical and semantic, composed via CLaSH). The generation mechanism (autoregressive, diffusion, flow-matching) is just the search strategy. Different strategies have different tradeoffs, but the fundamental framing is the same: use the refinement and satisfaction of holes, along with constraint programming techniques, to effectively shape what gets generated.
The Execution Substrate
Holes, constraints, and shaped generation need a mechanism to allow them to compose properly. Recursive Language Model (RLM) architectures provide a foundation for integrating these techniques while delivering significant value on their own. The MIT RLM paper demonstrates that context-centric decomposition outperforms traditional approaches for long-context tasks.
The core idea: instead of processing all context in a single forward pass, externalize context as variables that can be manipulated programmatically. The model gets a REPL. It can examine, decompose, and recursively query context. It can fan out to other models or tools, and synthesize results appropriately.
This provides several capabilities:
- Long context handling: Externalization of context as variables dramatically raises the context window ceiling
- Deterministic tools: The REPL enables numerical analysis, state tracking, type checking, linting, and other deterministic operations
- Context transformation: Slice, parse, enrich, and restructure context programmatically
- Recursive decomposition: Fan out to the primary model (recursively) or to specialized models, synthesizing in one environment before responding
- Stateful memory: Plug in hypergraph-based state tracking for evolving agentic memory, causal trajectory, detailed task plans, and other persistent data

Not all of this is standard to RLMs, but these capabilities reflect directions I've taken in Loop.
The system can intelligently inject this data into the REPL, reflect and refine its own actions, learn tool usage or action patterns, and plan more effectively over time.
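A toy sketch of the context-as-variables idea in Python (contents and operation names invented): the model holds handles to context and requests slices through deterministic REPL operations, rather than ingesting everything in one pass.

```python
# Context lives in the REPL environment, not in the prompt.
context = {
    "spec": "Auth service specification...\n" * 1000,   # far too big for one pass
    "files": {"auth/middleware.py": "def authenticate(request): ..."},
}

def tool_slice(var, start, end):
    """Deterministic REPL operation: return a slice of a context variable."""
    return context[var][start:end]

def tool_grep(var, needle):
    """Deterministic REPL operation: find lines mentioning a term."""
    return [line for line in context[var].splitlines() if needle in line]

# The model examines and decomposes context programmatically:
head = tool_slice("spec", 0, 30)
hits = tool_grep("spec", "Auth")
print(len(head), len(hits))   # 30 1000
```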
For the patterns described earlier, this means:
- Holes can exist as REPL state, persisting across interactions and refinement cycles
- Type checkers, linters, and proof assistants can validate fills as deterministic tools
- Recursive decomposition maps naturally to progressive hole refinement (coarse holes spawn finer holes)
- Memory structures can track which holes spawned which sub-holes, maintaining coherence across long interactions
Toward Verification
There's a gap between "satisfies constraints" and "correct."
Type checking catches a class of errors. Constraint satisfaction catches more. But neither proves behavioral properties. "This function returns a sorted list" isn't expressible in most type systems.
Formal verification with proof assistants like Lean can close this gap. A specification becomes a theorem. An implementation becomes a proof. The proof assistant verifies that the proof is valid. DeepMind's AlphaProof achieved silver-medal performance on IMO 2024 by combining LLM generation with Lean verification. The formal language provides what informal reasoning cannot: certainty that the proof is correct. No hallucination risk in the verification step because Lean is deterministic.
The connection to typed holes: a hole with a formal specification is a proof obligation. Filling the hole means discharging the obligation. The LLM can propose fills; Lean verifies them. Generation remains probabilistic, but verification is deterministic.
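A minimal Lean 4 sketch of the idea (the definition and theorem are invented for illustration): the specification is a theorem, and `sorry` marks the unfilled hole, i.e., the outstanding proof obligation.

```lean
-- The derived implementation (the "fill" for a code-level hole).
def double (n : Nat) : Nat := n + n

-- Intent, stated as a theorem. `sorry` is the proof-level hole:
-- a proof obligation that generation must discharge and Lean must accept.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  sorry

-- A candidate fill an LLM might propose: `by unfold double; omega`.
-- If Lean accepts it, the obligation is discharged deterministically.
```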
Coupling RLM systems with a first-class Lean REPL enables the system to build formalizations and proofs to accompany specifications, plans, and even algorithmic models. It provides a domain in which to construct high-quality verifiers. The system can iterate between generation and verification, using verification failures to guide further generation.
I've been making significant progress on building Lean into my RLM systems as a first-class capability. I'm currently working on improving the agentic logic and interop, but the integration points are now concrete, allowing me to focus on efficacy and efficiency.
Model Orchestration
If an RLM can fan out to and orchestrate many models and tools, that provides an excellent way to take advantage of models optimized for specific tasks or domains. A general-purpose model for coordination and synthesis. Specialized models for code generation, proof search, constraint solving, and domain-specific tasks. Higher efficacy and efficiency through appropriate routing.
Specialized models often require pre/post training, fine tuning, distillation, quantization, merging, or a number of other operations, in conjunction with domain-specific data. Managing this varied workflow requires a systematic approach: programmable, repeatable, high-quality, effective, and efficient, yet modular enough to accommodate new techniques as the landscape continues to evolve.
This is the model build system problem. It's not the focus of this post, but it's a necessary piece of the larger vision. Future posts will explore it in more depth.
What Exists, What's Emerging, What's Needed
What exists and works:
- Constrained decoding for autoregressive models (Ananke, llguidance, Outlines)
- Typed holes improving LLM completion (Hazel research)
- RLM architectures for long context and tool integration (MIT RLM)
- Model and tool orchestration (NVIDIA ToolOrchestra)
What's emerging:
- CLaSH coordination between logical and semantic constraints
- Proof assistant integration for verification (Lean REPLs in RLM systems)
- Constraint-driven diffusion and flow-matching (ASTRAL)
What's needed:
- Tooling that makes these patterns accessible to working engineers
- Standards for semantic context (ChatLSP is a decent starting point)
- More research on how constraints interact across generation paradigms
Onward
This post sketches the landscape I'm currently exploring. Subsequent posts will go deeper into each of the key areas (and a few bits that didn't make the editing cut this time around).
Overall, I'm hoping that I can start to paint a clear picture of a valuable new way of doing things, and some elegant yet pragmatic ways of getting there. The goal is to provide new capabilities that integrate with existing workflows and systems, enabling people and machines to collaborate on verified software.
As always, the outcome is the point. AI and code generation are just tools.
Let's empower people to create the outcomes they want.