Working with Intent
On structured decisions, cognitive loops, and a medium that talks back.
Previous posts described some primitives: typed holes, CLaSH constraints, shaped generation. This post is about what happens when you use them, and why the result is closer to a tool for thought than a spec tool. Decisions get a medium with formal properties. The medium participates in the reasoning: it shows you what you implied, surfaces contradictions, and propagates consequences. The same artifact that captures your intent compiles to a structure that shapes code generation. I'm building this as RFLX, a reflexive development substrate for building verified software through human-machine collaboration on typed holes.
We have languages for code, tools for notes, but we lack the right tools to support the decisions that precede both. The tools-for-thought movement (Roam, Obsidian, Logseq) gives us better media for information: bidirectional links, graph views, associative retrieval. Organizing and reasoning are different activities, though. A graph of linked notes doesn't propagate consequences, detect contradictions, or show you what you haven't decided yet. Linus Lee puts it well: we think with notations, not just in them. The design of the notation determines the ceiling of the thinking.
Programming languages let you express systems precisely. Type systems verify properties of those expressions. Version control tracks how they change. For the thinking that precedes all of this (what should the system do? what constraints apply? what are the tradeoffs?) we have prose documents, ticket descriptions, and the inside of our heads. The gap between "what I want" and "what I can formally work with" is where most of the hard problems live.
Previous posts introduced typed holes as the primitive for structured unknowns, CLaSH as the constraint algebra that coordinates across logical and semantic domains, and shaped generation as the principle that constraints should shape what gets generated rather than filter it after the fact. Those posts were about the machinery. This one is about what the work feels like when decisions get a medium that computes with them.
Starting a Project
I'll use a hypothetical project called Weft as a running example: a music remix and mashup tool, native Apple app for iPhone, iPad, and Mac.
You could start by writing a paragraph:
Weft is a music remix, edit, and mashup tool, not for creating from scratch. Intuitive track and sample management, tempo/pitch adjustment, warping, effects. Import from Apple Music, local files, Soundcloud, YouTube, Spotify. Save as working projects and exported audio files.
The system structures this into an outline: five capability areas (Import, Manipulation, Effects, Project Management, Export), two cross-cutting concerns (Platform, Performance), children under each. Constraint chips extracted from your text attach to the relevant entries. "Remix, not from-scratch creation" as a semantic constraint on the root. "Native Apple, iPhone + iPad + Mac" as a hard constraint on Platform. Any decent spec tool can get this far.
What matters is what the system infers. By saying "remix, not creation from scratch," you committed to imported audio as the primary content pipeline. Every track starts as someone else's recording. Import is load-bearing in a way it wouldn't be for a GarageBand-style creation tool; the system marks it as structurally central, connected to the audio engine, the arrangement model, the UX. You might not have noticed this implication yet. The constraint algebra did. This is the kind of implication that surfaces in code review three weeks later as "wait, you built the import pipeline as a secondary feature?" I'd rather see it in the first five minutes.
By listing five import sources, you committed to five integration points with different APIs, auth models, and legal constraints. Apple Music gets a fact attached: "MusicKit, well-supported." Spotify gets a tension marker: "No audio extraction API. ToS restricts download for remix." You named these sources casually. The system shows you what each actually requires.
By saying "warp things so they fit together," you implied beat-matching, which implies BPM detection, which implies audio analysis. A suggested entry appears: "Audio Analysis (implied by warping)."
Three suggested decision points also appear, positioned where they belong, connected to what they'd affect. Target User ("Bedroom producer? DJ? Casual TikTok creator?") connects to 15 downstream entries: UX complexity, feature depth, audio quality defaults, export formats, performance budget. Audio Engine Architecture connects to Performance, Effects, Manipulation, and Platform. Arrangement Model ("Timeline? Clip grid? Loop-based?") connects to UX, Manipulation, and Project Management; it determines what the main screen looks like.
These are gaps shaped like the decisions they represent. Not questions in a wizard. Gaps with visible downstream connections, waiting for you to engage when you're ready. The difference between "the system asks you 20 questions" and "the system shows you 3 gaps" is the difference between a form and a map. I can navigate a map. A form makes me fill out row 7 before I've thought about row 3.
You fill Target User: "Prosumer bedroom producers. People who know what a DAW is but want something faster and more fun for remix work." Consequences cascade. UX complexity budget adjusts (mid-range: powerful but approachable). Audio quality defaults shift to 44.1kHz/16-bit with 24-bit option. Import source priorities reorder: Apple Music and local files become primary, SoundCloud secondary, YouTube and Spotify become stretch goals given the ToS tensions. Export formats narrow to WAV, AIFF, MP3, AAC. A suggested latency constraint appears on Performance: "< 10ms for real-time monitoring."
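The cascade above can be sketched as a small propagation step over a graph of entries. This is a toy model, not RFLX's actual data structures: the entry names and derivation rules below are hypothetical illustrations of how one decision fans out into derived constraints on its downstream connections.

```python
# Minimal sketch of consequence propagation. Entry names and the
# `rules` table are hypothetical, chosen to mirror the Weft walkthrough.

class Entry:
    def __init__(self, name):
        self.name = name
        self.constraints = []   # accumulated constraint chips
        self.downstream = []    # entries this decision affects

def propagate(entry, decision, rules):
    """Attach a decision to an entry, then derive consequences downstream."""
    entry.constraints.append(decision)
    changed = []
    for child in entry.downstream:
        for derived in rules.get((decision, child.name), []):
            child.constraints.append(derived)
            changed.append((child.name, derived))
    return changed

target_user = Entry("Target User")
perf = Entry("Performance")
export = Entry("Export")
target_user.downstream = [perf, export]

# Hypothetical derivation rules: (decision, downstream entry) -> consequences.
rules = {
    ("prosumer", "Performance"): ["latency < 10ms (suggested)"],
    ("prosumer", "Export"):      ["formats: WAV, AIFF, MP3, AAC"],
}

changes = propagate(target_user, "prosumer", rules)
# changes lists one derived constraint per affected downstream entry
```

The real system's rules come from the constraint algebra rather than a lookup table, but the shape of the interaction is the same: one decision in, a list of attributed consequences out.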
If that's wrong, you undo. Every mutation is a typed event, and undo operates on decisions, retracting both the decision and its consequences. Try "casual creators" instead: UX simplifies, but a tension appears between "warp things to fit together" (a prosumer feature) and a casual audience. The system surfaces the tension. You choose. Safety to think out loud. That's the property I care about most. Exploring a decision's implications without committing to it is what makes the event-sourced architecture worth its complexity.
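Decision-level undo falls out of the event-sourced design: if every derived consequence records which decision caused it, retracting the decision can retract its consequences in the same operation. A minimal sketch, with event shapes that are illustrative rather than RFLX's actual schema:

```python
# Sketch of decision-level undo over an event log. Each derived event
# records the index of the decision that caused it; undo removes both.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    kind: str                   # "decide" or "derive"
    entry: str                  # which outline entry it touched
    payload: str
    cause: Optional[int] = None # index of the causing decision event

log: list[Event] = []

def decide(entry, payload):
    log.append(Event("decide", entry, payload))
    return len(log) - 1

def derive(entry, payload, cause):
    log.append(Event("derive", entry, payload, cause))

def undo_decision(decision_idx):
    """Retract a decision and everything it caused, not just a text edit."""
    global log
    log = [e for i, e in enumerate(log)
           if i != decision_idx and e.cause != decision_idx]

d = decide("Target User", "prosumer")
derive("Performance", "latency < 10ms", cause=d)
derive("Export", "WAV/AIFF/MP3/AAC", cause=d)
undo_decision(d)
# the decision and both propagated consequences are gone together
```

A production version would keep the retracted events for provenance rather than deleting them, but the point stands: undo operates on the causal unit, not on characters.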
The Structure Talks Back
Something happened in the walkthrough above that's easy to miss if you focus on the product mechanics. You expressed intent. The structure showed you implications you hadn't considered. You made a decision. The structure responded with consequences across fifteen entries. You evaluated those consequences. Some were wrong, so you retracted and tried again. The structure reconfigured around a different answer and surfaced a new tension. Engelbart's 1962 framework argued that the means by which people manipulate symbols directly shapes their ability to think. Better symbol-manipulation tools don't just speed up existing thinking; they make new thinking possible. He was describing the general case. The specific case here: when decisions get a symbol system with formal properties, you can reason about tradeoffs and consequences that were previously trapped in your head.
That's a cognitive loop. Express, see, decide, watch, revise. Each step informed by the last. The structure does computational work that would otherwise happen in your head (or mostly wouldn't happen, because tracing the implications of a single decision across fifteen interconnected entries is beyond unaided cognition for most of us).
A document is passive; it holds what you put in it. A spreadsheet is active; it maintains relationships, propagates changes, detects errors. The structured decision environment is more like the spreadsheet than the document. It computes with your decisions. When the structure responds to a decision with consequences you didn't anticipate, that's what Schön called a "reflective conversation with the situation." The situation talks back. You adjust. It talks back again. This tight loop, where the medium participates in the reasoning rather than passively recording it, is what I think distinguishes a tool for thought from a tool for organization.
And the undo-and-retry cycle carries more cognitive weight than it might seem. When you retract "prosumer" and try "casual," you're not adjusting an output. You're questioning a governing assumption (who is this for?) and watching how the entire structure reconfigures around a different answer. The constraint structure makes your assumptions visible, inspectable, and retractable. You can see which decisions were load-bearing and which were incidental. The productive friction matters. Most tools are designed to minimize friction, for good reason. But surfacing contradictions, showing consequences, making unknowns visible: that friction is what triggers genuine reflection. The system should be smooth where decisions are clear and rough where they have implications you haven't considered. Dewey called this "felt difficulty": the precondition for reflective thought. The depth to which a sense of the problem sinks determines the quality of the thinking that follows. Tension markers and propagation events are manufactured felt difficulties.
Going Deeper
Weeks into the Weft project. You expand Effects and drill into Per-Track Effects. The detail panel shows a semantic constraint ("apply effects to all or some of a song"), a type constraint derived from the Audio Engine decision (AudioEffect -> ProcessedBuffer), and a suggested latency constraint propagated from the Performance budget. Three constraint domains covered. Two empty.
You add a hard constraint: "real-time processing, < 5ms per effect." The chip solidifies. A highlight settles onto the Performance entry higher in the tree, where a derived constraint landed. You didn't put it there. The algebra connected your per-effect requirement to the system-wide performance budget.
You invoke Fill. Code generates under the constraint set. Each constraint chip shows whether the generated code satisfies it. Type, import, and latency constraints pass. The semantic constraint scores lower: the generated code uses a slightly different routing model than what you described, satisfying the letter of the constraint but not quite the spirit. You accept anyway, and the system proposes fill-driven strengthening: the fill used a specific callback-based pattern, and that pattern becomes a candidate constraint on Master Effects and Effect Presets. You adopt two of the three. The specification got stronger from accepting code.
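The per-chip verdicts can be pictured as running each constraint's checker against the generated fill: hard constraints answer pass/fail, semantic constraints answer with a score. The checkers below are toy predicates standing in for real type checking and semantic comparison, and all names are hypothetical.

```python
# Sketch of checking a fill against its constraint chips. Hard constraints
# return booleans; semantic constraints return a similarity score.

def check_fill(code, chips):
    """Return a per-chip result for a generated fill."""
    return {name: checker(code) for name, checker in chips.items()}

fill = "func apply(effect: AudioEffect) -> ProcessedBuffer { ... }"

chips = {
    "type: AudioEffect -> ProcessedBuffer":
        lambda c: "ProcessedBuffer" in c,   # hard: substring as a toy type check
    "latency: < 5ms per effect":
        lambda c: True,                     # assume measured by a profiler
    "semantic: routing model as described":
        lambda c: 0.71,                     # scored, not boolean (stubbed)
}

report = check_fill(fill, chips)
# the semantic chip scores below a clean pass; accepting it is your call
```

The interesting design choice is that a low semantic score is information, not a rejection: you can accept the fill and let the system propose what it learned as candidate constraints elsewhere.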
The Hole Thing: Specification and Generation Target
The intent structure you build is the same artifact that shapes code generation. When you fill a hole with enough constraints, the system uses exactly those constraints (type constraints, platform constraints, latency budgets, import requirements) to shape what the model produces. CLaSH compilation, Ω(Γ) = Syntax(Γ) × Types(Γ) × Imports(Γ) × ControlFlow(Γ) × Semantics(Γ), compiles to token masks (AR models), energy functions (diffusion/flow-matching), or both. The human sees constraint chips gaining substance. The machine sees a tightening search space. Neither side needs to know about the other's representation.
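For the autoregressive case, the compilation target is concrete: a constraint becomes a boolean mask over the vocabulary, and masked tokens get their logits driven to negative infinity so the model cannot emit them. A minimal sketch with a toy vocabulary and a hypothetical constraint:

```python
# Illustrative sketch of compiling a constraint to a token mask for an
# AR model. The vocabulary and the admitted symbols are toy stand-ins.

import math

VOCAB = ["applyEffect", "processBuffer", "deleteFile", "renderFrame"]

def compile_mask(allowed):
    """A constraint compiles to a boolean mask over the vocabulary."""
    return [tok in allowed for tok in VOCAB]

def apply_mask(logits, mask):
    """Masked tokens get -inf: the model cannot emit them at this position."""
    return [l if ok else -math.inf for l, ok in zip(logits, mask)]

# Suppose the type constraint AudioEffect -> ProcessedBuffer admits only
# the first two symbols at this hole (hypothetical).
mask = compile_mask({"applyEffect", "processBuffer"})
logits = [1.2, 0.8, 2.5, 0.3]
masked = apply_mask(logits, mask)
# "deleteFile" had the highest raw logit but is now unreachable
```

Real vocabularies are subword tokens rather than whole identifiers, which is what makes mask compilation genuinely hard; the sketch shows only the shape of the mechanism.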
Constrained Generation described how constraints compile to token masks. Hole in the Puzzle described how CLaSH unifies logical and semantic constraints. Energy, Structure, and Shaped Generation described how the same constraints become energy functions for non-autoregressive models. Working with intent is the experiential consequence: you're building a specification and a generation target simultaneously, because they are the same thing. Adding a type constraint to clarify what you want simultaneously narrows what the model can produce. Adding a semantic constraint ("use the existing authentication middleware pattern") fires cross-domain morphisms to constrain types, which constrain imports, which constrain the token mask.
The spec doesn't get stale because it governs generation directly. The generation doesn't drift because the spec is the constraint set. And when generated code comes back, it can teach the specification something. Accept a fill that uses a specific callback pattern, and the system proposes that pattern as a constraint on sibling entries. The fill strengthened the specification. Information flows both ways. This is the deepest departure from spec-driven development. In that paradigm, the spec is a document and the code is a separate artifact, and the two drift apart over time because nothing enforces their correspondence. Here, there is one artifact viewed from two angles. The spec angle shows you what you've decided. The generation angle compiles those decisions into guidance. Same data structure, different projections.
Different People, Same Structure
A product manager adds "users must share directly to social media" as a constraint on Export. An engineer, working two levels deeper, sees the effect: a suggested child entry (Social Media Export Formats) with derived constraints for video containers, codecs, and bitrate targets. The algebra translated the PM's sentence into engineering terms. The PM sees the engineer's work as increasing specificity; the export area is gaining substance. No type constraints visible from the product view. Just progress.
They work in the same structure. The constraint algebra doesn't care about provenance (though it is tracked and managed). It cares about consistency. A PM's semantic constraint has the same algebraic standing as an engineer's type constraint. The meet operation combines them. If they contradict, the contradiction surfaces immediately, not three weeks later when someone reads both the PRD and the technical design doc and notices the conflict.
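The meet operation and immediate contradiction surfacing can be sketched on the simplest possible domain, intervals, where the meet is intersection and the empty interval is bottom. This is a toy lattice, assuming the PM's requirement has already been translated into the same domain as the engineer's:

```python
# Toy meet operation on interval constraints (latency budgets in ms).
# None plays the role of bottom: an unsatisfiable combination.

def meet(a, b):
    """Combine two interval constraints; None means contradiction."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

pm = (0.0, 16.0)    # PM intent, translated: smooth playback => <= 16ms
eng = (0.0, 5.0)    # engineer: < 5ms per effect

combined = meet(pm, eng)                   # consistent, and tighter
conflict = meet((0.0, 5.0), (10.0, 20.0))  # bottom: surfaced immediately
```

The real algebra operates over richer domains (types, imports, semantics) connected by morphisms, but the standing of the two constraints is identical: meet either tightens or bottoms out, regardless of who contributed which operand.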
When a contradiction does surface, the system does more than flag it. The contradiction itself is a gap in the structure: a resolution that needs to be filled. The system can show what's in tension (the PM's export requirement versus the engineer's codec constraint, say), which downstream entries are blocked by the conflict, and what resolution strategies are available (relax one constraint, narrow the other, or split into two entries with different scopes). The contradiction becomes a first-class decision point with its own downstream connections, not a red flag that someone has to triage in a meeting. Progressive formalization makes multi-persona collaboration possible. The same entry carries natural language intent ("handles authentication"), a structured template with typed inputs and outputs, and optionally a Lean 4 theorem. The PM works at the first level, the engineer at the second. Each sees the rendering appropriate to their work. The constraint algebra operates on all levels identically.
The lines between these roles are blurring, too. A PM using AI can now prototype implementation-level constraints. An engineer can think in product terms and express intent at the semantic level. A domain expert can add constraints in their own vocabulary and watch the algebra translate them into engineering terms. The shared structure accommodates this because it doesn't enforce roles. It facilitates consistency. As AI continues to dissolve the boundaries between "product thinking" and "engineering thinking," a medium that makes both legible in the same artifact becomes more valuable over time.
Current coordination tools (Jira, Linear, GitHub Issues) organize activity: who is working on what, what text changed. They don't organize intent: what are we trying to achieve, and did we achieve it? The typed hole carries both. A code diff tells you what text changed. A fill tells you what intent was satisfied, under what constraints, with what verification evidence. I think fills can eventually replace PRs as the review unit, but "eventually" is doing a lot of work in that sentence, and the path from here to there is a work in progress with a lot of unknowns.
When the World Changes
Your collaborator pushes code. They implemented the local file import pipeline outside the intent structure.
The Import section updates. Local File Import gains a marker: external change detected. The type constraint chip shows a divergence. You specified ImportedTrack; they implemented Result<ImportedTrack, ImportError>. You were working on effects. You see the marker but don't stop. When you're ready, you look at the affected entry. Three options: update your constraint to match the code, flag the divergence, or absorb the richer type and let error-handling suggestions flow to the other import entries.
You absorb. Result<T, E> becomes your constraint. Error handling suggestions appear on the sibling entries. Your collaborator taught the specification something by writing code. This requires a bidirectional index between code symbols and constraint chips, which in turn requires the scope graph and tree-sitter infrastructure that Homer already builds.
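The divergence detection that drives this can be pictured as a comparison over the symbol-to-chip index: for each symbol the spec constrains, check whether the type observed in the pushed code still matches. A sketch with hypothetical names, assuming the index already exists:

```python
# Sketch of spec/code divergence detection over a symbol-to-constraint
# index. Symbol and type names are hypothetical, mirroring the Weft example.

spec_index = {
    "importLocalFile": "ImportedTrack",   # the type you specified
}
observed = {
    "importLocalFile": "Result<ImportedTrack, ImportError>",  # their code
}

def divergences(spec, code):
    """Symbols whose observed type no longer matches the constrained type."""
    return {sym: (spec[sym], code[sym])
            for sym in spec
            if sym in code and spec[sym] != code[sym]}

diff = divergences(spec_index, observed)
# one divergence, ready to be updated, flagged, or absorbed
```

Absorbing the richer type is just writing the observed side back into the index and letting the usual propagation carry error-handling suggestions to the sibling entries.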
Code changes can update the specification. Specification changes shape generation. That's the design goal: an environment and workflow unified by one coherent set of primitives.
Temporal Depth
Every mutation in the system is a typed, attributed, timestamped event. This isn't just an implementation detail; it's a design commitment that gives the intent structure a temporal dimension.
Six months in. A new engineer opens the project and engages the timeline. The outline animates through its history: entries appearing, constraints accumulating, the project gaining specificity over time. The target user decision and its downstream effects. The terms of service tension and its resolution. The audio engine fork and the rationale for Core Audio over AVAudioEngine. They filter to architectural decisions and see the fork points: the alternatives that were considered, the constraints that mattered, who decided, why. All in the same medium where the decisions were made.
For human users, this is onboarding that doesn't require a wiki. For agents, it's structured context that survives session boundaries. For auditing, it's a complete provenance chain from intent through constraint through fill through verification. For day-to-day work, it's undo that operates on decisions rather than text edits, retracting both a decision and its propagated consequences. Git log tells you what text changed. The intent timeline tells you what decisions were made, what they affected, and why they were revised. These are different questions with different answers. Both are useful; only one is typically available.
The open questions are about surface, not value. How should temporal navigation be exposed across different interfaces (CLI, IDE, native app)? What views are most useful for which tasks? How does an agent consume decision history without exhausting its context window? The event-sourced architecture supports all of this; the experience design is where the research is.
Onward
The tools-for-thought lineage has a consistent thread across sixty years: the medium shapes the reasoning. What's been missing is a medium for decisions. Not a document format. Not a note-taking tool. A medium where decisions have visible structure, computable consequences, and verifiable properties, the way mathematical notation gives quantitative reasoning those properties.
RFLX is my attempt to build that medium. The system should drive formalization itself: inferring structured constraints from natural language, proposing type annotations from accepted code, generating Lean specifications when the constraint set is rich enough. Nobody should have to write Lean to benefit from verification any more than you have to write SQL to benefit from a database. The constraint algebra, the typed holes, the formal verification: none of this should be visible unless you want it to be. A PM expressing intent in plain English and an engineer refining type constraints should both be working in the same medium, seeing it at the resolution that's useful to them. The algebra is infrastructure, not interface.
I'm close to having enough of the core workflow built to put this in front of people and test the hypotheses: that the reflective conversation is real, that the productive friction helps rather than hinders, that visible governing assumptions change how teams make decisions. That's the next milestone, and I'm looking forward to it. The right user-facing terminology is still unresolved. The codebase uses "hole" internally, but the Weft scenario above drifts between "entry" and "gap." Neither might be right for people who haven't read the typed holes literature. Naming things, as always, is one of the hard problems.