ASDLC.io
Article Compendium

Alphabetic Listing for Programmatic Access

Generated: 2026-01-19

This is an alphabetically sorted compilation of all ASDLC articles (Concepts, Patterns, Practices) on a single page. This resource is optimized for bulk download, scraping, or feeding to LLMs for comprehensive analysis.

Unlike the Field Manual, which is organized for readability and follows a curated structure, this compendium is intentionally unstructured—a raw alphabetic listing for programmatic consumption.

Table of Contents

Concepts

Patterns

Practices

Concepts (A-Z)

Agentic SDLC

Framework for industrializing software development where agents serve as the logistics layer while humans design, govern, and optimize the flow.

Status: Live | Last Updated: 2026-01-01

Definition

The Agentic Software Development Life Cycle (ASDLC) is a framework for industrializing software engineering. It represents the shift from craft-based development (individual artisans, manual tooling, implicit knowledge) to industrial-scale production (standardized processes, agent orchestration, deterministic protocols).

“Agentic architecture is the conveyor belt for knowledge work.” — Ville Takanen

ASDLC is not about “AI coding assistants” that make developers 10% faster. It’s about building the software factory—systems where agents serve as the architecture of labor while humans design, govern, and optimize the flow.

The Industrial Thesis

Agents do not replace humans; they industrialize execution.

Just as robotic arms automate welding without replacing manufacturing expertise, agents automate high-friction parts of knowledge work (logistics, syntax, verification) while humans focus on intent, architecture, and governance.

In this model:

The Cybernetic Model

ASDLC operates at L3 Conditional Autonomy—a “Fighter Jet” model where the Agent acts as the Pilot executing maneuvers, and the Human acts as the Instructor-in-the-Cockpit.

Key Insight: Compute is cheap, but novelty and correctness are expensive. Agents naturally drift toward the “average” solution (Regression to the Mean). The Instructor’s role is not to write code, but to define failure boundaries (Determinism) and inject strategic intent (Steering) that guides agents out of mediocrity.

The Cybernetic Loop

The lifecycle replaces the linear CI/CD pipeline with a high-frequency feedback loop:

Mission Definition: The Instructor defines the “Objective Packet” (Intent + Constraints). This is the core of Context Engineering.

Generation (The Maneuver): The Agent autonomously maps context—often using the Model Context Protocol (MCP) to fetch live data—and executes the task.

Verification (The Sim): Automated Gates check for technical correctness (deterministic), while the Agent’s Constitution steers semantic intent (probabilistic).

Course Correction (HITL): The Instructor intervenes on technically correct but “generic” solutions to enforce architectural novelty.

Strategic Pillars

Factory Architecture (Orchestration)

Projects structured with agents as connective tissue, moving from monolithic context windows to discrete, specialized stations (Planning, Spec-Definition, Implementation, Review).

Standardized Parts (Determinism)

Schema-First Development where agents fulfill contracts, not guesses. AGENTS.md, specs/, and strict linting serve as the “jigs” and “molds” that constrain agent output.

Quality Control (Governance)

Automated, rigorous inspection through Probabilistic Unit Tests and Human-in-the-Loop (HITL) gates. Trust the process, not just the output.

ASDLC Usage

Full project vision: /docs/vision.md

Applied in: Specs, AGENTS.md Specification, Context Gates, Model Routing

Behavior-Driven Development

A collaborative specification methodology that defines system behavior in natural language scenarios, bridging business intent and machine-verifiable acceptance criteria.

Status: Live | Last Updated: 2026-01-13

Definition

Behavior-Driven Development (BDD) is a collaborative specification methodology that defines system behavior in natural language scenarios. It synthesizes Test-Driven Development (TDD) and Acceptance Test-Driven Development (ATDD), emphasizing the “Five Whys” principle: every user story should trace to a business outcome.

The key evolution from testing to BDD is the shift from “test” to “specification.” Tests verify correctness; specifications define expected behavior. In agentic workflows, this distinction matters because agents need to understand what behavior is expected, not just what code to write.

Key Characteristics

From Tests to Specifications of Behavior

| Aspect | Unit Testing (TDD) | Behavior-Driven Development |
| --- | --- | --- |
| Primary Focus | Correctness of code at unit level | System behavior from user perspective |
| Language | Code-based (Python, Java, etc.) | Natural language (Gherkin) |
| Stakeholders | Developers | Developers, QA, Business Analysts, POs |
| Signal | Pass/Fail on logic | Alignment with business objectives |
| Agent Role | Minimal (code generation) | Central (agent interprets and executes behavior) |

The Three Roles in BDD

BDD emphasizes collaboration between three perspectives:

  1. Business — Defines the “what” and “why” (business value, user outcomes)
  2. Development — Defines the “how” (implementation approach)
  3. Quality — Defines the “proof” (verification criteria)

In agentic development, the AI agent often handles Development while Business and Quality remain human-defined. BDD provides the structured handoff format.

BDD in the Probabilistic Era

Traditional BDD was designed for deterministic systems: given specific inputs, expect specific outputs. Agentic systems are probabilistic—LLM outputs vary based on context, temperature, and emergent behavior.

BDD adapts to this by:

ASDLC Usage

BDD’s value in agentic development is semantic anchoring. When an agent is given a Gherkin scenario, it receives a “specification of behavior” that:

This is why BDD scenarios belong in Specs, not just test suites. They’re not just verification artifacts—they’re functional blueprints that guide agent reasoning.

Implementation via the Spec Pattern:

| BDD Component | Spec Implementation |
| --- | --- |
| Feature description | Spec Context section |
| Business rules | Blueprint constraints |
| Acceptance scenarios | Contract section (Gherkin scenarios) |

Applied in:

Context Engineering

Context Engineering is the practice of structuring information to optimize LLM comprehension and output quality.

Status: Live | Last Updated: 2026-01-12

Definition

Context Engineering is the systematic approach to designing and structuring the input context provided to Large Language Models (LLMs) to maximize their effectiveness, accuracy, and reliability in generating outputs.

The practice emerged from the recognition that LLMs operate on explicit information only—they cannot intuit missing business logic or infer unstated constraints. Context Engineering addresses this by making tacit knowledge explicit, machine-readable, and version-controlled.

While ASDLC focuses on software development, Context Engineering is domain-agnostic:

Anywhere agents operate, context is the constraint that turns raw intelligence into specific value.

Martin Fowler observes: “As I listen to people who are serious with AI-assisted programming, the crucial thing I hear is managing context.”

Anthropic’s research confirms this. Engineers cite the cold start problem as the biggest blocker:

“There is a lot of intrinsic information that I just have about how my team’s code base works that Claude will not have by default… I could spend time trying to iterate on the perfect prompt [but] I’m just going to go and do it myself.”

Context Engineering solves cold start by making tacit knowledge explicit, machine-readable, and version-controlled so agents can act on it without prompt iteration.

Key Characteristics

The Requirements Gap

“Prompt Engineering” is often a misnomer. It is simply Requirements Engineering applied to a non-human entity that cannot intuit missing business logic. Human developers ask clarifying questions when requirements are vague (“What happens if the payment fails?”). AI models build something based on probability. Errors generally surface only when the system breaks in production.

Core Attributes

  1. Version Controlled: Context exists as a software asset that lives in the repo, is diffed in PRs, and is subject to peer review.
  2. Standardized: Formatted to be readable by any agent (Cursor, Windsurf, Devin, GitHub Copilot).
  3. Iterative: Continuously refined based on agent failure modes and tacit information discovered by Human-in-the-loop (HITL) workflows.
  4. Schema-First: Data structures defined before requesting content generation to ensure type safety and validation.
  5. Hierarchical: Information organized by importance—critical instructions first, references second, examples last.

ASDLC Usage

In ASDLC, context is treated as version-controlled code, not ephemeral prompts.

Context vs Guardrails:

A distinction exists between Guardrails (Safety) and Context (Utility). Currently, many AGENTS.md files contain defensive instructions like “Do not delete files outside this directory” or “Do not output raw secrets.” This is likely a transitional state. OpenAI, Anthropic, Google, and platform wrappers are racing to bake these safety constraints directly into the inference layer. Soon, telling an agent “Don’t leak API keys” will be as redundant as telling a compiler “Optimize for speed.”

Relationship to Patterns:

Applied in:

[!NOTE] Research Validation (InfiAgent, 2026): File-centric state management outperforms compressed long-context prompts. Replacing persistent file state with accumulated conversation history dropped task completion from 80/80 to 27.7/80 average, even with Claude 4.5 Sonnet. This validates treating context as a reconstructed view of authoritative file state, not as conversation memory.

Coverage Metric

Coverage measures task completion reliability: the proportion of required work units an agent successfully completes in a long-horizon task.

Status: Draft | Last Updated: 2026-01-10

Definition

Coverage measures task completion reliability: the proportion of required work units an agent successfully completes in a long-horizon task.

Unlike quality metrics (correctness, style, performance), coverage answers: “Did it finish the job?”

Key Characteristics

Formula

Coverage = (Completed Units / Total Required Units) × 100

Where “units” are task-appropriate:

Why Coverage Matters

Quality metrics assume the agent attempted the work. But long-horizon agents often fail silently:

Coverage catches these failures that quality metrics miss.

Measurement

Report three values across multiple runs:

High variance (large gap between max and min) indicates unreliable architecture, even if max is perfect.
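
A minimal TypeScript sketch of this reporting, assuming the three reported values are max, mean, and min and that each run records how many required units it completed (the run structure here is illustrative):

```typescript
interface RunResult {
  completedUnits: number;     // units the agent actually finished in this run
  totalRequiredUnits: number; // units the task required
}

/** Coverage for a single run: (Completed Units / Total Required Units) × 100 */
function coverage(run: RunResult): number {
  return (run.completedUnits / run.totalRequiredUnits) * 100;
}

/** Report max, mean, and min coverage across multiple runs of the same task. */
function coverageReport(runs: RunResult[]): { max: number; mean: number; min: number } {
  const values = runs.map(coverage);
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  return { max: Math.max(...values), mean, min: Math.min(...values) };
}

// Example: three runs of a 40-file migration task.
const report = coverageReport([
  { completedUnits: 40, totalRequiredUnits: 40 },
  { completedUnits: 31, totalRequiredUnits: 40 },
  { completedUnits: 22, totalRequiredUnits: 40 },
]);
// A large gap between report.max and report.min signals unreliable architecture,
// even if report.max is 100.
```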

ASDLC Usage

Coverage is particularly relevant for:

Consider adding coverage assertions to Quality Gates for batch operations.

Event Modeling

A system blueprinting method that centers on events as the primary source of truth, serving as a rigorous bridge between visual design and technical implementation.

Status: Experimental | Last Updated: 2026-01-01

Definition

Event Modeling is a method for designing information systems by mapping what happens over time. It creates a linear blueprint that serves as the single source of truth for Product, Design, and Engineering.

Unlike static diagrams (like ERDs or UML) that focus on structure, Event Modeling focuses on the narrative of the system. It visualizes the system as a film strip, showing exactly how a user’s action impacts the system state and what information is displayed back to them.

Core Components

An Event Model is composed of four distinct elements:

Why It Matters for AI

In modern software development, ambiguity is the enemy. While human engineers can infer intent from a loose visual mockup, AI models require explicit instructions.

Event Modeling forces implicit business rules to become explicit. By defining the exact data payload of every Command and the resulting state change of every Event, we provide AI agents with a deterministic roadmap. This ensures the generated code handles edge cases and data consistency correctly, rather than just “looking right” on the frontend.

Relationship to Requirements

Event Modeling acts as a bridge between Visual Design (what it looks like) and Technical Architecture (how it works).

It does not replace functional requirements; rather, it validates them. A feature is only considered “defined” when there is a complete path mapped from the user’s view, through the command, to the stored event, and back to the view. This “closed loop” guarantees that every pixel on the screen is backed by real data.

Gherkin

A structured, domain-specific language using Given-When-Then syntax to define behavioral specifications that are both human-readable and machine-actionable.

Status: Live | Last Updated: 2026-01-13

Definition

Gherkin is a structured, domain-specific language using Given-When-Then syntax to define behavioral specifications in plain text. While Behavior-Driven Development provides the methodology, Gherkin provides the concrete syntax.

Gherkin’s effectiveness for LLM agents stems from its properties: human-readable without technical jargon, machine-parseable with predictable structure, and aligned between technical and non-technical stakeholders. Each keyword defines a phase of reasoning that prevents agents from conflating setup, action, and verification into an undifferentiated blob.

The Given-When-Then Structure

Gherkin scenarios follow a consistent three-part structure:

Feature: User Authentication
  As a registered user
  I want to log into the system
  So that I can access my personalized dashboard

  Scenario: Successful login with valid credentials
    Given a registered user with email "user@example.com"
    And the user has password "SecurePass123"
    When the user submits login credentials
    Then the user should be redirected to the dashboard
    And a session token should be created

Keyword Semantics

| Keyword | Traditional BDD | Agentic Translation |
| --- | --- | --- |
| Given | Preconditions or initial state | Context setting, memory retrieval, environment setup |
| When | The trigger event or user action | Task execution, tool invocation, decision step |
| Then | The observable outcome | Verification criteria, alignment check, evidence-of-done |
| And/But | Additional conditions within a step | Logical constraints, secondary validation parameters |
| Feature | High-level description of functionality | Functional blueprint, overall agentic goal |
| Background | Steps common to all scenarios | Pre-test fixtures, global environment variables |

ASDLC Usage

Gherkin isn’t just a testing syntax—it’s a semantic constraint language for agent behavior.

When an agent reads a Gherkin scenario:

This partitioning prevents “context bleed” where agents conflate setup, action, and verification.

In Specs: The Spec Contract section uses Gherkin scenarios:

## Contract

### Scenarios

#### Happy Path
Given a valid API key
When the user requests /api/notifications
Then the response returns within 100ms
And the payload contains the user's notifications

Applied in:

Guardrails

Why we deprecated the term 'Guardrails' in favor of strict separation between deterministic Context Gates and probabilistic Agent Constitutions.

Status: Deprecated | Last Updated: 2026-01-01

⚠️ Deprecated: This concept has been superseded by Context Gates and Agent Constitution.

The Ambiguity Problem

In the broader AI industry, “Guardrails” has become a “suitcase word”—a single term packed with too many conflicting meanings. It conflates architectural firewalls (hard rules) with prompt engineering (soft influence).

This ambiguity leads to fragile systems where engineers try to fix logic errors with prompt tuning (which is unreliable) or restrict creativity with rigid code blocks (which is stifling).

Standard Definitions

Broadly, industry implementations of “Guardrails” typically fall into two buckets:

  1. Input/Output Filtering: Deterministic systems that intercept and block messages based on policy (e.g., NVIDIA NeMo).
  2. Behavioral Constraint: Probabilistic techniques (prompting/tuning) to prevent the model from deviating from its persona.

The ASDLC Interpretation

To resolve this ambiguity, we have deprecated “Guardrails” in favor of strictly separating the concept into two distinct mechanisms: The Brakes and The Driver.

1. Context Gates (The Brakes)

These are deterministic validation layers. Just as car brakes function regardless of what the driver “thinks,” Gates trigger regardless of the LLM’s intent.

2. Agent Constitution (The Driver)

These are probabilistic steering instructions. They are the training and rules the “driver” (LLM) carries in its head to make good decisions.

Comparison of Controls

| Feature | Context Gates | Agent Constitution |
| --- | --- | --- |
| Nature | Deterministic (Binary) | Probabilistic (Semantic) |
| Location | External (Firewall/Code) | Internal (Context Window) |
| Goal | Correctness (Prevent errors) | Alignment (Steer intent) |
| Failure Mode | Exception / Rejection | Hallucination / Bad Style |
| Analogy | The Brakes | The Driver’s Training |
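
As a hedged illustration of the split, here is a TypeScript sketch with made-up checks and rules standing in for real project policy:

```typescript
// Made-up checks and rules; real gates and constitutions are project-specific.
interface GateResult {
  passed: boolean;
  violations: string[];
}

// Context Gate ("the brakes"): deterministic and binary, enforced outside the model.
// It triggers regardless of what the LLM "intended".
function runContextGate(diff: { touchedFiles: string[]; testsPassed: boolean }): GateResult {
  const violations: string[] = [];
  if (!diff.testsPassed) violations.push("Test suite failed");
  if (diff.touchedFiles.some((file) => file.startsWith("secrets/"))) {
    violations.push("Modified a protected path: secrets/");
  }
  return { passed: violations.length === 0, violations };
}

// Agent Constitution ("the driver's training"): probabilistic steering that lives
// inside the context window. It influences output but cannot guarantee it.
const constitution = `
Prefer small, composable functions.
Flag any new dependency for human review instead of adding it silently.
`;

// The gate blocks bad changes after generation; the constitution shapes
// generation itself. They fail differently: rejection vs. bad style.
const verdict = runContextGate({ touchedFiles: ["src/auth.ts"], testsPassed: true });
console.log(verdict.passed, constitution.trim());
```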

Superseding Concepts

This concept has been superseded by:

See also:

Levels of Autonomy

SAE-inspired taxonomy for AI agent autonomy in software development, from L1 (assistive) to L5 (full), standardized at L3 conditional autonomy.

Status: Live | Last Updated: 2026-01-09

Definition

The Levels of Autonomy scale categorizes AI systems based on their operational independence in software development contexts. Inspired by the SAE J3016 automotive standard, it provides a shared vocabulary for discussing human oversight requirements.

The scale identifies where the Context Gate (the boundary of human oversight) must be placed for each level. Under this taxonomy, autonomy is not a measure of intelligence—it is a measure of operational risk and required human involvement.

The Scale

| Level | Designation | Description | Human Role | Failure Mode |
| --- | --- | --- | --- | --- |
| L1 | Assistive | Autocomplete, Chatbots. Zero state retention. | Driver. Hands on wheel 100% of time. | Distraction / Minor Syntax Errors |
| L2 | Task-Based | “Fix this function.” Single-file context. | Reviewer. Checks output before commit. | Logic bugs within a single file. |
| L3 | Conditional | “Implement this feature.” Multi-file orchestration. | Instructor. Defines constraints & intervenes on “drift.” | Regression to the Mean (Mediocrity). |
| L4 | High | “Manage this backlog.” Self-directed planning. | Auditor. Post-hoc analysis. | Silent Failure. Strategic drift over time. |
| L5 | Full | “Run this company.” | Consumer. Passive beneficiary. | Existential alignment drift. |

Analogy: The Self-Driving Standard (SAE)

The software autonomy scale maps directly to SAE J3016, the automotive standard for autonomous vehicles. This clarifies “Human-in-the-Loop” requirements using familiar terminology.

| ASDLC Level | SAE Equivalent | The “Steering Wheel” Metaphor |
| --- | --- | --- |
| L1 | L1 (Driver Assist) | Hands On, Feet On. AI nudges the wheel (Lane Keep) or gas (Cruise), but Human drives. |
| L2 | L2 (Partial) | Hands On (mostly). AI handles steering and speed in bursts, but Human monitors constantly. |
| L3 | L3 (Conditional) | Hands Off, Eyes On. AI executes the maneuver (The Drive). Human is the Instructor ready to grab the wheel immediately. |
| L4 | L4 (High) | Mind Off. Sleeping in the back seat within a geo-fenced area. Dangerous if the “fence” (Context) breaks. |
| L5 | L5 (Full) | No Steering Wheel. The vehicle has no manual controls. |

ASDLC Usage

ASDLC standardizes practices for Level 3 (Conditional Autonomy) in software engineering. While the industry frequently promotes L5 as the ultimate goal, this perspective is often counterproductive given current tooling maturity. L3 is established as the sensible default.

[!WARNING] Level 4 Autonomy Risks

At L4, agents operate for days without human intervention but lack the strategic foresight needed to maintain system integrity. This results in Silent Drift—the codebase continues to function technically but gradually deteriorates into an unmanageable state.

Mitigation strategies exist (Advanced Context Gates, architectural health monitoring), but these solutions require further validation.

[!NOTE] Empirical Support for L3

Anthropic’s 2025 internal study of 132 engineers validates L3 as the practical ceiling:

  • Engineers fully delegate only 0-20% of work
  • Average 4.1 human turns per Claude Code session
  • High-level design and “taste” decisions remain exclusively human-owned
  • The “paradox of supervision”—effective oversight requires skills that may atrophy with AI use

Applied in:

Mermaid

A text-based diagramming language that renders flowcharts, sequences, and architectures from markdown, enabling version-controlled visual specifications.

Status: Live | Last Updated: 2026-01-13

Definition

Mermaid is a text-based diagramming language that renders flowcharts, sequence diagrams, and architecture visualizations from markdown-style code blocks. In agentic development, Mermaid serves as the specification language for processes, workflows, and system relationships.

Where Gherkin specifies behavior and YAML specifies structure, Mermaid specifies process—how components interact, how data flows, and how state transitions occur.

Key Characteristics

Text-Based Diagrams

Mermaid diagrams are defined in plain text, making them:

flowchart LR
    A[Input] --> B[Process]
    B --> C[Output]

Diagram Types

| Type | Use Case | ASDLC Application |
| --- | --- | --- |
| Flowchart | Process flows, decision trees | Feature Assembly, Context Gates |
| Sequence | API interactions, message flows | Service contracts, Integration specs |
| State | State machines, lifecycle | Component state, Workflow phases |
| Class | Object relationships | Domain models, Architecture |
| ER | Entity relationships | Data models, Schema design |
| Gantt | Timeline, scheduling | Roadmaps, Sprint planning |

Subgraphs for Grouping

Subgraphs partition complex diagrams into logical regions:

flowchart LR
    subgraph Input
        A[Source]
    end
    
    subgraph Processing
        B[Transform]
        C[Validate]
        B --> C
    end
    
    A --> B
    C --> D[Output]

ASDLC Usage

Mermaid serves as the process specification language in ASDLC, completing the specification triad:

| Language | Specifies | Example |
| --- | --- | --- |
| Gherkin | Behavior | Given/When/Then scenarios |
| YAML | Structure | Schemas, configuration |
| Mermaid | Process | Flowcharts, sequences |

Why Mermaid for Specs:

Text-based diagrams solve a critical problem in agentic development: visual documentation that agents can read, modify, and version-control. Unlike image-based diagrams that become stale context, Mermaid diagrams are:

Relationship to Patterns:

Anti-Patterns

| Anti-Pattern | Description |
| --- | --- |
| Box Soup | Too many nodes without grouping |
| Arrow Spaghetti | Excessive cross-connections |
| No Labels | Edges without descriptive text |
| Static Screenshots | Images instead of text diagrams |

[!TIP] Key practices: Group with subgraphs, label edges, use flowchart LR for process flows, limit to <15 nodes per diagram.

Model Context Protocol (MCP)

Open standard for connecting AI agents to data sources and tools, enabling real-time 'just-in-time' context vs. stale vector databases.

Status: Draft | Last Updated: 2025-11-25

The Model Context Protocol (MCP) is an open standard that functions as a universal connector between AI assistants and external systems. It standardizes how AI models interact with data repositories and business tools, effectively replacing fragmented, custom integrations with a single, unified protocol.

The “USB-C” for Artificial Intelligence

Think of MCP as a USB-C port for AI applications.

How It Works

Technologically, MCP operates on a Client-Host-Server architecture:

From Static to “Just-in-Time” RAG

While MCP is a critical enabler for Retrieval-Augmented Generation (RAG), it represents a fundamental shift in how agents access knowledge.

| Feature | Traditional RAG | MCP (Dynamic RAG) |
| --- | --- | --- |
| Data Source | Pre-indexed Vector Databases | Live “Resources” & “Tools” |
| Freshness | Snapshots (Can become stale) | Real-time (Source of Truth) |
| Mechanism | Semantic Search | Direct Query / Function Execution |

By allowing the model to query a live SQL database or read the current state of a git repository at the exact moment of inference, MCP enables “Just-in-Time” intelligence. This removes the reliance on stale data dumps and allows agents to act on the absolute latest state of the world.
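
A schematic TypeScript sketch of this contrast follows; the interfaces and URIs below are illustrative stand-ins, not the actual MCP SDK or wire protocol:

```typescript
// Illustrative interfaces only -- not the real MCP SDK.
interface VectorIndex {
  search(query: string): Promise<string[]>; // returns pre-indexed snapshots
}

interface LiveResource {
  read(uri: string): Promise<string>; // reads current state at call time
}

interface LiveTool {
  call(name: string, args: Record<string, unknown>): Promise<string>;
}

// Traditional RAG: the agent reasons over whatever was indexed last.
async function staleContext(index: VectorIndex, question: string): Promise<string[]> {
  return index.search(question); // freshness limited by the last indexing run
}

// MCP-style "just-in-time" retrieval: the agent queries the source of truth
// at the moment of inference.
async function liveContext(repo: LiveResource, db: LiveTool): Promise<string[]> {
  const branchState = await repo.read("git://current-branch/status");
  const openOrders = await db.call("sql.query", {
    statement: "SELECT count(*) FROM orders WHERE status = 'open'",
  });
  return [branchState, openOrders];
}
```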

OODA Loop

The Observe-Orient-Decide-Act decision cycle—a strategic model from military combat adapted for autonomous agent behavior in software development.

Status: Live | Last Updated: 2026-01-13

Definition

The OODA Loop—Observe, Orient, Decide, Act—is a strategic decision-making cycle originally developed by U.S. Air Force Colonel John Boyd for aerial combat. Boyd’s insight: the combatant who cycles through these phases faster than their opponent gains decisive advantage. The key isn’t raw speed—it’s tempo relative to environmental change.

Boyd’s less-quoted but crucial insight: Orient is everything. The Orient phase is where mental models, context, and prior experience shape how observations become decisions. A faster but poorly-oriented loop loses to a slower but well-oriented one.

In agentic software development, OODA provides the cognitive model for how autonomous agents should behave: continuously cycling through observation, interpretation, planning, and execution.

The Four Phases

  1. Observe — Gather information about the current state of the environment
  2. Orient — Interpret observations through mental models, context, and constraints
  3. Decide — Formulate a specific plan for action based on orientation
  4. Act — Execute the plan, producing changes that feed new observations

The loop is continuous. Each Act produces new state, triggering new Observe, and the cycle repeats.
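
A minimal sketch of the explicit cycle in TypeScript, with placeholder helpers standing in for real observation, planning, and verification tooling:

```typescript
interface Observation { failingTests: string[]; errorLogs: string[] }
interface Plan { steps: string[] }

// Placeholder helpers; real implementations would call test runners, an LLM, and VCS tooling.
async function observe(): Promise<Observation> {
  return { failingTests: ["auth.spec.ts"], errorLogs: [] };              // e.g. parse test-runner output
}
function orient(obs: Observation, context: string): string {
  return `Failing: ${obs.failingTests.join(", ")} | Constraints: ${context}`; // interpret against Specs/constraints
}
async function decide(orientation: string): Promise<Plan> {
  return { steps: [`Address: ${orientation}`] };                          // agent states a plan before acting
}
async function act(plan: Plan): Promise<void> {
  console.log("Executing:", plan.steps);                                  // write code, run tests, commit
}

// Each Act produces new state, which feeds the next Observe.
async function oodaLoop(context: string, maxCycles = 10): Promise<void> {
  for (let cycle = 0; cycle < maxCycles; cycle++) {
    const obs = await observe();
    if (obs.failingTests.length === 0 && obs.errorLogs.length === 0) return; // environment is stable
    const orientation = orient(obs, context); // the engineered Orient phase decides if fast cycling helps
    const plan = await decide(orientation);
    await act(plan);
  }
}
```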

Key Characteristics

Tempo, Not Raw Speed

The strategic value of OODA isn’t speed—it’s cycling faster than the environment changes. In software development, the “environment” is the codebase, requirements, and constraints. An agent that can cycle through OODA before context rot sets in converges on correct solutions.

Orient as the Critical Phase

For AI agents, Orient is the context window. The quality of orientation depends on:

This is why Context Engineering isn’t optional overhead. It’s engineering the Orient phase, which determines whether fast cycling produces progress or noise.

OODA vs. Single-Shot Interactions

Standard LLM interactions are Observe-Act: user provides input, model produces output. No explicit Orient or Decide phase. The model’s “orientation” is implicit in training and whatever context happens to be present.

Agentic workflows make OODA explicit:

| Phase | Single-Shot LLM | Agentic Workflow |
| --- | --- | --- |
| Observe | User prompt | Instrumented: read files, run tests, check logs |
| Orient | Implicit (training + context) | Engineered: Specs, Constitution, Context Gates |
| Decide | Implicit | Explicit: agent states plan before acting |
| Act | Generate response | Verified: external tools confirm success/failure |

This explicit structure enables debugging. When an agent fails, you can diagnose which phase broke down:

ASDLC Usage

In ASDLC, OODA explains why cyclic workflows outperform linear pipelines:

| OODA Phase | Agent Behavior | ASDLC Component |
| --- | --- | --- |
| Observe | Read codebase state, error logs, test results | File state, test output |
| Orient | Interpret against context and constraints | Context Gates, AGENTS.md |
| Decide | Formulate implementation plan | PBI decomposition |
| Act | Write code, run tests, commit | Micro-commits |

The Learning Loop is OODA with an explicit “Crystallize” step that improves future Orient phases. Where OODA cycles continuously, Learning Loop captures discoveries into machine-readable context for subsequent agent sessions.

Applied in:

Anti-Patterns

| Anti-Pattern | Description | Failure Mode |
| --- | --- | --- |
| Observe-Act | Skipping Orient/Decide. Classic vibe coding. | Works for simple tasks; fails at scale; no learning |
| Orient Paralysis | Over-engineering context, never acting | Analysis paralysis; no forward progress |
| Stale Orient | Not updating mental model when observations change | Context rot; agent operates on outdated assumptions |
| Observe Blindness | Not instrumenting observation of relevant state | Agent misses critical information (failed tests, error logs) |
| Act Without Verify | Not confirming action results before next cycle | Cascading errors; false confidence |

Product Requirement Prompt (PRP)

A structured methodology combining PRD, codebase context, and agent runbook—the minimum spec for production-ready AI code.

Status: Experimental | Last Updated: 2025-01-05

Definition

A Product Requirement Prompt (PRP) is a structured methodology that answers the question: “What’s the minimum viable specification an AI coding agent needs to plausibly ship production-ready code in one pass?”

As creator Rasmus Widing defines it: “A PRP is PRD + curated codebase intelligence + agent runbook.”

Unlike traditional PRDs (which exclude implementation details) or simple prompts (which lack structure), PRPs occupy the middle ground—a complete context packet that gives an agent everything it needs to execute autonomously within bounded scope.

The methodology emerged from practical engineering work in 2024 and has since become the foundation for agentic engineering training.

Key Characteristics

PRPs are built on three core principles:

  1. Plan before you prompt — Structure thinking before invoking AI
  2. Context is everything — Comprehensive documentation enables quality output
  3. Scope to what the model can reliably do in one pass — Bounded execution units

A complete PRP includes the following components:

| Component | Purpose |
| --- | --- |
| Goal | What needs building |
| Why | Business value and impact justification |
| Success Criteria | States that indicate completion (not activities) |
| Health Metrics | Non-regression constraints (what must not degrade) |
| Strategic Context | Trade-offs & priorities (from Product Vision) |
| All Needed Context | Documentation references, file paths, code snippets |
| Implementation Blueprint | Task breakdown and pseudocode |
| Validation Loop | Multi-level testing (syntax, unit, integration) |

Key Differentiators from Traditional PRDs

ASDLC Usage

PRP components map directly to ASDLC concepts—a case of convergent evolution in agentic development practices.

| PRP Component | ASDLC Equivalent |
| --- | --- |
| Goal | The Spec — Blueprint |
| Why | Product Thinking |
| Success Criteria | Context Gates |
| Health Metrics | The Spec — Non-Functional Reqs / Constraints |
| Strategic Context | Product Vision — Runtime Injection |
| All Needed Context | Context Engineering |
| Implementation Blueprint | The PBI |
| Validation Loop | Context Gates — Quality Gates |

In ASDLC terms, a PRP is equivalent to The Spec + The PBI + curated Context Engineering—bundled into a single artifact optimized for agent consumption.

ASDLC separates these concerns for reuse: multiple PBIs reference the same Spec, and context is curated per-task rather than duplicated. For simpler projects or rapid prototyping, the PRP’s unified format may be more practical. The methodologies are complementary—PRPs can be thought of as “collapsed ASDLC artifacts” for single-pass execution.

Applied in:

See also:

Product Thinking

The practice of engineers thinking about user outcomes, business context, and the 'why' before the 'how'—the core human skill in the AI era.

Status: Experimental | Last Updated: 2025-01-05

Definition

Product Thinking is the practice of engineers understanding and prioritizing user outcomes, business context, and the reasoning behind technical work (“why”) before focusing on implementation details (“how”).

Rather than waiting for fully-specified requirements and executing tasks mechanically, product-thinking engineers actively engage with the problem space. They ask:

This mindset originated in product management but has become essential for modern engineering teams, especially as AI increasingly handles implementation while humans must provide strategic judgment.

Key Characteristics

Outcome Orientation: Product-thinking engineers measure success by user and business outcomes, not just task completion. They question whether closing a ticket actually moved the product forward.

Context Awareness: They understand the broader system: user workflows, business constraints, competitive landscape, and technical debt. Code decisions are made with this context, not in isolation.

Tradeoff Evaluation: Every technical decision involves tradeoffs (speed vs maintainability, generality vs simplicity, build vs buy). Product-thinking engineers explicitly identify and evaluate these tradeoffs rather than defaulting to “best practice.”

Ownership Mindset: They take responsibility for outcomes, not just implementations. If a feature ships but users don’t adopt it, a product-thinking engineer investigates why, even if the code “worked as specified.”

Risk Recognition: They can look at technically correct code and identify product risks: “This will confuse users,” “This locks us into a vendor,” “This creates a support burden.” These risks are invisible to AI.

The AI Era Shift

Matt Watson (5x Founder/CTO, author of Product Driven) argues that vibe coders outperform average engineers not because of superior coding skill, but because they think about the product:

“A lot of engineers? They’re just waiting for requirements. That’s usually a leadership problem. For years, we rewarded engineers for staying in their lane, closing tickets, and not rocking the boat. Then we act surprised when they don’t think like owners.”

The traditional model:

  1. Product Manager writes requirements
  2. Engineer implements requirements
  3. Success = code matches spec

Why this fails in the AI era:

The new competitive advantage:

Watson’s conclusion: “Product thinking isn’t a bonus skill anymore. In an AI world, it’s the job.”

The Leadership Problem

Product thinking doesn’t emerge by accident. Watson identifies the structural cause:

Anti-patterns that kill product thinking:

What builds product thinking:

If every technical decision must flow through a product manager or architect, the organization has created a dependency on human bottlenecks that AI cannot solve.

Applications

Pre-AI Era: Product thinking was a differentiator for senior engineers and those in “full-stack” or startup environments. Most engineers could succeed by executing well-defined requirements.

AI Era: Product thinking becomes the baseline. As AI handles implementation, the human contribution shifts entirely to:

  1. Defining the problem worth solving
  2. Evaluating whether AI-generated solutions actually solve it
  3. Recognizing risks and tradeoffs the model cannot see

Where product thinking is essential:

ASDLC Usage

In ASDLC, product thinking is why Specs exist. The Spec is not bureaucratic overhead—it’s the forcing function that makes product thinking explicit and sharable.

The connection:

When an engineer writes a Spec, they’re forced to answer:

If they can’t answer these questions, they don’t understand the product problem yet. Vibe coding without this foundation produces code that works but solves the wrong problem.

The ASDLC position:

This is the “Instructor-in-the-Cockpit” model: the pilot (AI) flies the plane, but the instructor (human) decides where to fly and evaluates whether the flight is safe.

Applied in:

Best Practices

For Individual Engineers:

  1. Before writing code, write the “why” in plain English
  2. Question requirements that don’t explain user impact
  3. Propose alternatives when you see tradeoff mismatches
  4. Treat AI-generated code skeptically: Does it solve the right problem?

For Engineering Leaders:

  1. Share business context, even when it feels like “too much detail”
  2. Reward engineers who challenge bad requirements, not just those who ship fast
  3. Make “why” documentation non-optional (use Specs or equivalent)
  4. Measure outcomes (user adoption, retention, error rates) not just velocity (story points)

For Organizations:

  1. Flatten decision-making: trust engineers to own tradeoffs in their domain
  2. Train product thinking explicitly (it’s not intuitive for engineers trained to “just code”)
  3. Create feedback loops: engineers see how their code impacts users
  4. Recognize that AI scales implementation, not judgment—invest in the latter

Anti-Patterns

“Just Build It” Culture: Engineers discouraged from asking “why” or proposing alternatives. Leads to technically correct code that solves the wrong problem.

Context Hoarding: Product managers or architects hold all context and dole out tasks. Creates dependency bottleneck and prevents engineers from exercising judgment.

Velocity Worship: Success measured by tickets closed, not problems solved. Optimizes for speed of wrong solutions.

“Stay In Your Lane” Enforcement: Engineers punished for thinking beyond their assigned component. Prevents system-level thinking required for good product decisions.

See also:

Spec-Driven Development

Methodology that defines specifications before implementation, treating specs as living authorities that code must fulfill.

Status: Live | Last Updated: 2026-01-18

Definition

Spec-Driven Development (SDD) is an umbrella term for methodologies that define specifications before implementation. The core inversion: instead of code serving as the source of documentation, the spec becomes the authority that code must fulfill.

SDD emerged as a response to documentation decay in software projects. Traditional approaches treated specs as planning artifacts that diverged from reality post-implementation. Modern SDD treats specs as living documents co-located with code.

Contrast: For the anti-pattern SDD addresses, see Vibe Coding.

Key Characteristics

Living Documentation

Specs are not “fire and forget” planning artifacts. They reside in the repository alongside code and evolve with every change to the feature. This addresses the classic problem of documentation decay.

Iterative Refinement

Kent Beck critiques SDD implementations that assume “you aren’t going to learn anything during implementation.” This is a valid concern—specs must evolve during implementation, not block it. The spec captures learnings so future sessions can act on them.

Determinism Over Vibes

Nick Tune argues that orchestration logic should be “mechanical based on simple rules” (code) rather than probabilistic (LLMs). Specs define the rigid boundaries; code enforces the workflow; LLMs handle only the implementation tasks where flexibility is required.

Visual Designs Are Not Specs

[!WARNING] The Figma Trap A beautiful mockup is not a specification; it is a suggestion. Mockups typically demonstrate the “happy path” but hide the edge cases, error states, and data consistency rules where production bugs live.

Never treat a visual design as a complete technical requirement.

ASDLC Usage

ASDLC implements Spec-Driven Development through:

See also:

The 4D Framework (Anthropic)

A cognitive model codifying four essential competencies—Delegation, Description, Discernment, and Diligence—for effective generative AI use.

Status: Live | Last Updated: 2026-01-13

Definition

The 4D Framework is a cognitive model for human-AI collaboration developed by Anthropic in partnership with Dr. Joseph Feller and Rick Dakan as part of the AI Fluency curriculum.

The framework codifies four essential competencies for leveraging generative AI effectively and responsibly:

  1. Delegation — The Strategy
  2. Description — The Prompt
  3. Discernment — The Review
  4. Diligence — The Liability

Unlike process models (e.g., Agile or Double Diamond) that dictate workflow timing, the 4D Framework specifies how to interact with AI systems. It positions the human not merely as a “prompter,” but as an Editor-in-Chief, accountable for strategic direction and risk management.

The Four Dimensions

Delegation (The Strategy)

Before engaging with the tool, the human operator must determine what, if anything, should be assigned to the AI. This is a strategic decision between Automation (offloading repetitive tasks) and Augmentation (leveraging AI as a thought partner).

Core Question: “Is this task ‘boilerplate’ with well-defined rules (High Delegation), or does it demand nuanced judgment, deep context, or ethical considerations (Low Delegation)?”

Description (The Prompt)

AI output quality is directly proportional to input quality. “Description” transcends prompt engineering hacks by emphasizing Context Transfer—delivering explicit goals, constraints, and data structures required for the task.

Core Question: “Have I specified the constraints, interface definitions, and success criteria needed for this task?”

Discernment (The Review)

This marks the transition from Creator to Editor. The human must rigorously assess AI output for accuracy, hallucinations, bias, and overall quality. Failing to apply discernment is a leading cause of “AI Technical Debt.”

Core Question: “If I authored this output, would it meet code review standards? Does it introduce fictitious libraries or violate design tokens?”

Diligence (The Liability)

The human user retains full accountability for outcomes. Diligence acknowledges that while AI accelerates execution, it never removes user responsibility for security, copyright, or ethical compliance.

Core Question: “Am I exposing PII in the context window? Am I deploying unvetted code to production?”

Key Characteristics

The Editor-in-Chief Mental Model

The 4D Framework repositions the human from “prompt writer” to “editorial director.” Just as a newspaper editor doesn’t write every article but maintains accountability for what gets published, the AI-fluent professional maintains responsibility for all AI-generated outputs.

Continuous Cycle

These four dimensions are not sequential steps but concurrent concerns. Every AI interaction requires simultaneous attention to all four:

Anti-Patterns

| Anti-Pattern | Description |
| --- | --- |
| Over-Delegation | Assigning strategic decisions or ethically sensitive tasks to AI |
| Vague Description | Using natural language prompts without context, constraints, or examples |
| Blind Acceptance | Copy-pasting AI output without verification |
| Liability Denial | Assuming AI-generated content is inherently trustworthy or legally defensible |

ASDLC Usage

Applied in: AGENTS.md Specification, Context Engineering, Context Gates

The 4D dimensions map to ASDLC constructs: Delegation → agent autonomy levels, Description → context engineering, Discernment → context gates, Diligence → guardrail protocols.

The Learning Loop

The iterative cycle between exploratory implementation and spec refinement, balancing vibe coding velocity with captured learnings.

Status: Live | Last Updated: 2026-01-12

Definition

The Learning Loop is the iterative cycle between exploratory implementation and constraint crystallization. It acknowledges that understanding emerges through building, while ensuring that understanding is captured for future agent sessions.

Kent Beck critiques spec-driven approaches that assume “you aren’t going to learn anything during implementation.” He’s right—discovery is essential. But pure vibe coding loses those discoveries. The next agent session starts from zero, re-discovering (or missing) the same constraints.

The Learning Loop preserves discoveries as machine-readable context, enabling compounding understanding across sessions.

The Cycle

  1. Explore — Vibe code to discover edge cases, performance characteristics, or API behaviors
  2. Learn — Identify constraints that weren’t obvious from requirements
  3. Crystallize — Update the Spec with discovered constraints
  4. Verify — Gate future implementations against the updated Spec
  5. Repeat

Each iteration builds on the last. The spec grows smarter, and agents inherit the learnings of every previous session.

OODA Foundation

The Learning Loop is an application of the OODA Loop to software development:

| Learning Loop Phase | OODA Equivalent |
| --- | --- |
| Explore | Observe + Act (gather information through building) |
| Learn | Orient (interpret what was discovered) |
| Crystallize | Decide (commit learnings to persistent format) |
| Verify | Observe (confirm crystallized constraints via gates) |

The key insight: in software development, Orient and Observe are interleaved. You often can’t observe relevant constraints until you’ve built something that reveals them. The Learning Loop makes this explicit by treating Explore as a legitimate phase rather than a deviation from the plan.

Key Characteristics

Not Waterfall

The Learning Loop explicitly rejects the waterfall assumption that all constraints can be known upfront. Specs are scaffolding that evolve, not stone tablets.

Not Pure Vibe Coding

The Learning Loop also rejects the vibe coding assumption that documentation is optional. Undocumented learnings are lost learnings—the next agent (or human) will repeat the same mistakes.

Machine-Readable Capture

Learnings must be captured in formats agents can consume: schemas, constraints in YAML, acceptance criteria in markdown. Natural language is acceptable but structured data is preferred.

“The real capability—our ability to respond to change—comes not from how fast we can produce code, but from how deeply we understand the system we are shaping.” — Unmesh Joshi

Automation: The Ralph Loop

The Learning Loop describes an iterative cycle that typically involves human judgment at each phase. The Ralph Loop automates this cycle for tasks with machine-verifiable completion criteria:

| Learning Loop Phase | Ralph Loop Implementation |
| --- | --- |
| Explore | Agent implements based on PBI/Spec |
| Learn | Agent reads error logs, test failures, build output |
| Crystallize | Agent updates progress.txt; commits to Git |
| Verify | External tools (Jest, tsc, Docker) confirm success |

When verification fails, Ralph automatically re-enters Explore with the learned context. The loop continues until external verification passes or iteration limit is reached.

Key difference: The Learning Loop expects human judgment in the Learn and Crystallize phases. The Ralph Loop requires that “learning” be expressible as observable state (error logs, test results) and “crystallization” be automatic (Git commits, progress files).

Ralph Loops work best when success criteria are machine-verifiable (tests pass, builds complete). For tasks requiring human judgment—ambiguous requirements, architectural decisions, product direction—the Learning Loop remains the appropriate model.
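
A minimal sketch of such a loop in TypeScript, assuming a hypothetical runAgent step and an external verify step (test runner, compiler, or build):

```typescript
interface Verification { passed: boolean; logs: string }

// runAgent: asks the agent to implement/fix against the PBI plus accumulated feedback.
// verify: runs external tooling (tests, compiler, build); the agent never grades itself.
async function ralphLoop(
  runAgent: (feedback: string) => Promise<void>,
  verify: () => Promise<Verification>,
  maxIterations = 5,
): Promise<boolean> {
  let feedback = ""; // starts empty; later iterations carry observed error logs forward
  for (let i = 0; i < maxIterations; i++) {
    await runAgent(feedback);       // Explore: implement based on PBI/Spec plus feedback
    const result = await verify();  // Verify: external tools confirm success or failure
    if (result.passed) return true; // machine-verifiable completion is the stop condition
    feedback = result.logs;         // Learn: re-enter Explore with the observed state
  }
  return false;                     // iteration limit reached; escalate to a human
}
```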

ASDLC Usage

In ASDLC, the Learning Loop connects several core concepts:

Applied in:

Anti-Patterns

| Anti-Pattern | Description |
| --- | --- |
| Waterfall Specs | Writing exhaustive specs before any implementation, assuming no learning will occur |
| Ephemeral Vibe Coding | Generating code without ever crystallizing learnings into specs |
| Spec-as-Paperwork | Updating specs for compliance rather than genuine constraint capture |
| Post-Hoc Documentation | Writing specs after implementation is complete, losing the iterative benefit |

Vibe Coding

Natural language code generation without formal specs—powerful for prototyping, problematic for production systems.

Status: Experimental | Last Updated: 2025-01-05

Definition

Vibe Coding is the practice of generating code directly from natural language prompts without formal specifications, schemas, or contracts. Coined by Andrej Karpathy, the term describes an AI-assisted development mode where engineers describe desired functionality conversationally (“make this faster,” “add a login button”), and the LLM produces implementation code.

This approach represents a fundamental shift: instead of writing specifications that constrain implementation, developers describe intent and trust the model to infer the details. The result is rapid iteration—code appears almost as fast as you can articulate what you want.

While vibe coding accelerates prototyping and exploration, it inverts traditional software engineering rigor: the specification emerges after the code, if at all.

The Seduction of Speed

The productivity gains from vibe coding are undeniable:

This velocity is seductive. When a feature that previously took three days can be scaffolded in thirty minutes, the economic pressure to adopt vibe coding becomes overwhelming.

The feedback loop is immediate: describe the behavior, see the code, run it, iterate. For throwaway scripts, MVPs, and rapid exploration, this workflow is transformative.

The Failure Modes

The velocity advantage of vibe coding collapses when code must be maintained, extended, or integrated into production systems:

Technical Debt Accumulation

Forrester Research predicts that by 2026, 75% of technology leaders will face moderate-to-severe technical debt directly attributable to AI-generated code. The mechanism is straightforward: code generated from vague prompts encodes vague assumptions.

When specifications exist only in the prompt history (or the engineer’s head), future maintainers inherit code without contracts. They must reverse-engineer intent from implementation—the exact problem formal specifications solve.

Copy-Paste Culture

2024 marked the first year in industry history where copy-pasted code exceeded refactored code. This is a direct symptom of vibe coding: when generating fresh code is faster than understanding existing code, engineers default to regeneration over refactoring.

The result is systemic duplication. The same logic appears in fifteen places with fifteen slightly different implementations, none validated against a shared contract.

Silent Drift

LLMs are probabilistic. When generating code from vibes, they make assumptions:

These assumptions are never documented. The code passes tests (if tests exist), but violates implicit architectural contracts. Over time, the system drifts toward inconsistency—different modules make different assumptions about the same concepts.

Boris Cherny (Principal Engineer, Anthropic; creator of Claude Code) warns: “You want maintainable code sometimes. You want to be very thoughtful about every line sometimes.”

“Speed is seductive. Maintainability is survival.”
— Boris Cherny, The Peterman Podcast (December 2025)

Vibe Coded Into a Corner

Anthropic’s internal research found that engineers who spend more time on Claude-assisted tasks often do so because they “vibe code themselves into a corner”—generating code without specs until debugging and cleanup overhead exceeds the initial velocity gains.

“When producing output is so easy and fast, it gets harder and harder to actually take the time to learn something.” — Anthropic engineer

This creates a debt spiral: vibe coding is fast until it isn’t, and by then the context needed to fix issues was never documented.

Regression to the Mean

Without deterministic constraints, LLMs trend toward generic solutions. Vibe coding produces code that works but lacks the specific optimizations, domain constraints, and architectural decisions that distinguish production systems from prototypes.

The model doesn’t know that “user IDs must never be logged” or “this cache must invalidate within 100ms.” These constraints exist in specifications, not prompts.

Applications

Vibe coding is particularly effective in specific contexts:

Rapid Prototyping: When validating product hypotheses, speed of iteration outweighs code quality. Vibe coding enables designers and product managers to generate functional prototypes without deep programming knowledge.

Throwaway Scripts: One-off data migrations, analysis scripts, and temporary tooling benefit from vibe coding’s velocity. Since the code has no maintenance burden, formal specifications are unnecessary overhead.

Learning and Exploration: When experimenting with new APIs, frameworks, or architectural patterns, vibe coding provides immediate feedback. The goal is understanding, not production-ready code.

Greenfield MVPs: Early-stage startups building minimum viable products often prioritize speed-to-market over maintainability. Vibe coding accelerates this phase, though technical debt must be managed during the transition to production.

ASDLC Usage

In ASDLC, vibe coding is recognized as a legitimate operational mode for bounded contexts (exploration, prototyping, throwaway code). However, for production systems, ASDLC mandates a transition to deterministic development.

The ASDLC position:

Applied in:

See also:

YAML

A human-readable data serialization language that serves as the structured specification format for configuration, schemas, and file structures in agentic workflows.

Status: Live | Last Updated: 2026-01-13

Definition

YAML (YAML Ain’t Markup Language) is a human-readable data serialization language designed for configuration files, data exchange, and structured documentation. In agentic development, YAML serves as the specification language for data structures, schemas, and file organization.

Where Gherkin specifies behavior (Given-When-Then), YAML specifies structure (keys, values, hierarchies). Both are human-readable formats that bridge the gap between human intent and machine execution.

Key Characteristics

Human-Readable Structure

YAML’s indentation-based syntax mirrors how humans naturally organize hierarchical information:

notification:
  channels:
    - websocket
    - email
    - sms
  constraints:
    latency_ms: 100
    retry_count: 3
  fallback:
    enabled: true
    order: [websocket, email, sms]

Schema-First Design

YAML enables schema-first development where data structures are defined before implementation:

# Schema definition in spec
user:
  id: string (UUID)
  email: string (email format)
  roles: array of enum [admin, user, guest]
  created_at: datetime (ISO 8601)

Agents can validate implementations against these schemas, catching type mismatches and missing fields before runtime.
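
As one hedged illustration, the same schema could be mirrored as a runtime validator in TypeScript (using the zod library here; the mapping from the YAML annotations is an assumption of this sketch):

```typescript
import { z } from "zod";

// Runtime validator mirroring the YAML schema above (illustrative mapping).
const UserSchema = z.object({
  id: z.string().uuid(),             // string (UUID)
  email: z.string().email(),         // string (email format)
  roles: z.array(z.enum(["admin", "user", "guest"])),
  created_at: z.string().datetime(), // datetime (ISO 8601)
});

type User = z.infer<typeof UserSchema>;

// An agent (or a gate) can reject generated payloads before runtime use.
const result = UserSchema.safeParse({
  id: "not-a-uuid",
  email: "user@example.com",
  roles: ["admin"],
  created_at: "2026-01-13T10:00:00Z",
});

if (!result.success) {
  console.error(result.error.issues); // e.g. the id field fails the UUID check
}
```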

Configuration as Code

YAML configurations live in version control alongside code, enabling:

ASDLC Usage

YAML serves as the data structure specification language in ASDLC, completing the specification triad alongside Gherkin (behavior) and Mermaid (process).

In Specs: All ASDLC articles use YAML frontmatter for structured metadata. The Spec pattern leverages YAML for schema definitions that agents validate against.

In AGENTS.md: The AGENTS.md Specification uses YAML for structured directives—project context, constraints, and preferred patterns.

Applied in:

Patterns (A-Z)

Adversarial Code Review

Consensus verification pattern using a secondary Critic Agent to review Builder Agent output against the Spec.

Status: Experimental | Last Updated: 2026-01-09

Definition

Adversarial Code Review is a verification pattern where a distinct AI session—the Critic Agent—reviews code produced by the Builder Agent against the Spec before human review.

This extends the Critic (Hostile Agent) pattern from the design phase into the implementation phase, creating a verification checkpoint that breaks the “echo chamber” where a model validates its own output.

The Builder Agent (optimized for speed and syntax) generates code. The Critic Agent (optimized for reasoning and logic) attempts to reject it based on spec violations.

The Problem: Self-Validation Ineffectiveness

LLMs are probabilistic text generators trained to be helpful. When asked “Check your work,” a model that just generated code will often:

Hallucinate correctness — Confidently affirm that buggy logic is correct because it matches the plausible pattern in training data.

Double down on errors — Explain why the bug is actually a feature, reinforcing the original mistake.

Share context blindness — Miss gaps because it operates within the same context window and reasoning path that produced the original output.

If the same computational session writes and reviews code, the “review” provides minimal independent validation.

The Solution: Separated Roles

To create effective verification, separate the generation and critique roles:

The Builder — Optimizes for implementation throughput (e.g., Gemini 3 Flash, Claude Haiku 4.5). Generates code from the PBI and Spec.

The Critic — Optimizes for logical consistency and constraint satisfaction (e.g., Gemini 3 Deep Think, DeepSeek V3.2). Validates code against Spec contracts without rewriting.

The Critic does not generate alternative implementations. It acts as a gatekeeper, producing either PASS or a list of spec violations that must be addressed.

The Workflow

1. Build Phase

The Builder Agent implements the PBI according to the Spec.

Output: Code changes, implementation notes.

Example: “Updated auth.ts to support OAuth login flow.”

2. Context Swap (Fresh Eyes)

Critical: Start a new AI session or chat thread for critique. This clears conversation drift and forces the Critic to evaluate only the artifacts (Spec + Diff), not the Builder’s reasoning process.

If using the same model, close the current chat and open a fresh session. If using Model Routing, switch to a High Reasoning model.

3. Critique Phase

Feed the Spec and the code diff to the Critic Agent with adversarial framing:

System Prompt:

You are a rigorous Code Reviewer validating implementation against contracts.

Input:
- Spec: specs/auth-system.md
- Code Changes: src/auth.ts (diff)

Task:
Compare the code strictly against the Spec's Blueprint (constraints) and Contract (quality criteria).

Identify:
1. Spec violations (missing requirements, violated constraints)
2. Security issues (injection vulnerabilities, auth bypasses)
3. Edge cases not handled (error paths, race conditions)
4. Anti-patterns explicitly forbidden in the Spec

Output Format:
- PASS (if no violations)
- For each violation, provide:
  1. Violation Description (what contract was broken)
  2. Impact Analysis (why this matters: performance, security, maintainability)
  3. Remediation Path (ordered list of fixes, prefer standard patterns, escalate if needed)
  4. Test Requirements (what tests would prevent regression)

This transforms critique from "reject" to "here's how to fix it."

4. Verdict

If PASS: Code moves to human Acceptance Gate (L3 review for strategic fit).

If FAIL: Violations are fed back to Builder as a new task: “Address these spec violations before proceeding.”

This creates a Context Gate between code generation and human review.
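The loop can be sketched as follows; the builder and critic callbacks stand in for whatever agent tooling you use, and the three-round cap is an assumption:

// Hypothetical wrappers around your Builder and Critic sessions; names and shapes are illustrative.
type Builder = (task: string, spec: string, feedback: string[]) => Promise<string>; // returns a diff
type Critic = (spec: string, diff: string) => Promise<{ verdict: "PASS" | "FAIL"; violations: string[] }>;

async function adversarialReview(
  builder: Builder,
  critic: Critic,
  spec: string,
  task: string,
  maxRounds = 3
): Promise<string> {
  let feedback: string[] = [];
  for (let round = 0; round < maxRounds; round++) {
    // Builder works from the PBI/Spec plus any violations from the previous round.
    const diff = await builder(task, spec, feedback);
    // Critic runs in a fresh session and sees only the artifacts (Spec + diff).
    const review = await critic(spec, diff);
    if (review.verdict === "PASS") return diff; // hand off to the human Acceptance Gate
    feedback = review.violations;               // FAIL: violations become the Builder's next task
  }
  throw new Error("Critic never passed the diff; escalate to a human reviewer");
}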

Relationship to Context Gates

Adversarial Code Review implements a Review Gate as defined in Context Gates:

Quality Gates (deterministic) — Verify syntax, compilation, linting, test passage.

Review Gates (probabilistic, adversarial) — Verify semantic correctness, spec compliance, architectural consistency. This is where Adversarial Code Review operates.

Acceptance Gates (subjective, HITL) — Verify strategic fit and product vision alignment.

The Critic sits between automated tooling and human review, catching issues that compilers miss but that don’t require human strategic judgment.

Integration with Model Routing

Use Model Routing to assign models by capability profile:

| Role | Model Profile | Rationale |
| --- | --- | --- |
| Builder | High Throughput | Fast code generation with strong syntax knowledge |
| Critic | High Reasoning | Deep logic evaluation, constraint satisfaction, edge case discovery |

This leverages the strengths of each model class: speed for generation, reasoning depth for validation.

Strategic Value

Reduces L3 Cognitive Load — Human reviewers focus on “Is this the right product?” rather than catching spec deviations or missing error handling.

Catches Regression to Mediocrity — Coding models gravitate toward average solutions. The Critic enforces novelty and architectural intent from the Spec.

Enforces Spec Quality — If the Critic can’t determine whether code is correct, the Spec is ambiguous. This surfaces specification gaps.

Prevents Silent Failures — The Critic catches implementation shortcuts (skipped validation, missing edge cases) that pass tests but violate contracts.

Validated in Practice

Case Study: Claudio Lassala (January 2026)

A production implementation validated this pattern’s effectiveness:

Context: A user story required filtering audit logs by date range. The Builder Agent implemented the requirement, tests passed, and the code compiled without errors.

Issue Detected: The Critic Agent identified a silent performance violation:

// Implementation passed all Quality Gates but violated architectural constraint
var logs = await repository.LoadAll(); // Loads entire table into memory
return logs.Where(log => log.Date > startDate); // Filters in-memory

Critic Output:

VIOLATION: Performance - Data Access Pattern

Spec requires database-level filtering for datasets exceeding 1k records.
Implementation loads full table then filters in-memory.

Impact: Works with small data; fails at scale (10k+ records cause memory issues)

Remediation Path:
1. Push filtering to database: repository.FindWhere(x => x.Date > startDate)
2. If ORM doesn't support this filter, use raw SQL
3. Add performance test with 10k+ mock records to prevent regression
4. Document the constraint in the repository interface

Key Learnings:

  1. Silent Performance Risks — Code that passes all tests can still violate architectural constraints. The Critic caught the LoadAll().Filter() anti-pattern before production.

  2. Iterative Refinement — The Critic initially flagged “missing E2E tests,” which were actually present but structured differently. The team updated the Critic’s instructions to recognize the project’s test architecture, demonstrating the pattern’s adaptability.

  3. Tone Calibration — Using “Approve with suggestions” framing prevented blocking valid code while surfacing genuine risks. The Critic didn’t reject the PR—it flagged optimization opportunities with clear remediation paths.

This validates the pattern’s core thesis: adversarial review catches architectural violations that pass deterministic checks but violate semantic contracts.

Example: The Silent Performance Bug

Spec Contract: “All database retries must use exponential backoff to prevent thundering herd during outages.”

Builder Output: Clean code with a simple retry loop using fixed 1-second delays. Tests pass.

// src/db.ts
async function queryWithRetry(sql: string) {
  for (let i = 0; i < 5; i++) {
    try {
      return await db.query(sql);
    } catch (err) {
      await sleep(1000); // Fixed delay
    }
  }
}

Critic Response:

VIOLATION: src/db.ts Line 45

Spec requires exponential backoff. Implementation uses constant sleep(1000).

Impact: During database outages, this will cause thundering herd problems
as all clients retry simultaneously.

Required: Implement delay = baseDelay * (2 ** attemptNumber)

Without the Critic, a human skimming the PR might miss the constant delay. The automated tests wouldn’t catch it (the code works). The Critic, reading against the contract, identifies the violation.
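A compliant version might look like the following sketch. It reuses the db and sleep helpers assumed in the Builder's snippet; the base delay and jitter are illustrative choices, and only the exponential growth is mandated by the contract:

// src/db.ts (sketch): same retry loop, now with exponential backoff per the Spec.
async function queryWithRetry(sql: string, maxAttempts = 5, baseDelayMs = 250) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await db.query(sql);
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // retries exhausted: surface the error
      const delay = baseDelayMs * 2 ** attempt;   // 250ms, 500ms, 1s, 2s, ...
      await sleep(delay + Math.random() * 100);   // small jitter de-synchronizes clients
    }
  }
}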

Implementation Constraints

Not Automated (Yet) — As of December 2025, this requires manual orchestration. Engineers must manually switch sessions/models and feed context to the Critic.

Context Window Limits — Large diffs may exceed even Massive Context models. Use Context Gates filtering to provide only changed files + relevant Spec sections.

Critic Needs Clear Contracts — The Critic can only enforce what’s documented in the Spec. Vague specs produce vague critiques.

Model Capability Variance — Not all “reasoning” models perform equally at code review. Validate your model’s performance on representative examples.

Relationship to Agent Constitution

The Agent Constitution defines behavioral directives for agents. For Adversarial Code Review:

Builder Constitution: “Implement the Spec’s contracts. Prioritize clarity and correctness over cleverness.”

Critic Constitution: “You are skeptical. Your job is to reject code that violates the Spec, even if it ‘works.’ Favor false positives over false negatives.”

This frames the Critic’s role as adversarial by design—it’s explicitly told to be rigorous and skeptical, counterbalancing the Builder’s helpfulness bias.

Future Automation Potential

This pattern is currently manual but has clear automation paths:

CI/CD Integration — Run Critic automatically on PR creation, posting violations as review comments.

IDE Integration — Real-time critique as code is written, similar to linting but spec-aware.

Multi-Agent Orchestration — Automated handoff between Builder and Critic until PASS is achieved.

Programmatic Orchestration (Workflow as Code)

To scale this pattern, move from manual prompt-pasting to code-based orchestration (e.g., using the Claude Code SDK).

Convention-Based Loading: Store reviewer agent prompts in a standard directory (e.g., .claude/agents/) and load them dynamically:

// Load the specific reviewer agent prompt from the conventional directory
import fs from 'node:fs/promises';

const reviewerPrompt = await fs.readFile(`.claude/agents/${agentName}.md`, 'utf-8');

// Spawn the Critic subagent via the SDK, constraining output to a structured review schema
const reviewResult = await claude.query({
  prompt: reviewerPrompt,
  context: { spec, diff },
  outputFormat: { type: 'json_schema', schema: ReviewSchema }
});

This allows you to treat Critic Agents as standardized, version-controlled functions in your build pipeline.

As agent orchestration tooling matures, this pattern may move from Experimental to Standard.

See also:

Agent Constitution

Persistent, high-level directives that shape agent behavior and decision-making before action.

Status: Live | Last Updated: 2026-01-19

Definition

An Agent Constitution is a set of high-level principles or “Prime Directives” injected into an agent’s system prompt to align its intent and behavior with system goals.

The concept originates from Anthropic’s Constitutional AI research, which proposed training models to be “Helpful, Honest, and Harmless” (HHH) using a written constitution rather than human labels alone. In the ASDLC, we adapt this alignment technique to System Prompt Engineering—using the Constitution to define the “Superego” of our coding agents.

The Problem: Infinite Flexibility

Without a Constitution, an Agent is purely probabilistic. It will optimize for being “helpful” to the immediate prompt user, often sacrificing long-term system integrity.

If a prompt says “Implement this fast,” a helpful agent might skip tests. A Constitutional Agent would refuse: “I cannot skip tests because Principle #3 forbids merging unverified code.”

The Solution: Proactive Behavioral Alignment

The Constitution shapes agent behavior before action occurs—unlike reactive mechanisms (tests, gates) that catch problems after the fact.

The Driver Training Analogy

To understand the difference between a Constitution and other control mechanisms, consider the analogy of driving a car: the Constitution is the driver training that shapes decisions before the driver ever acts, while gates and tests are the guardrails that catch a bad maneuver after it has begun.

The “Orient” Phase

In the OODA Loop (Observe-Orient-Decide-Act), the Constitution lives squarely in the Orient phase.

When an agent Observes the world (reads code, sees a user request), the Constitution acts as a filter for how it interprets those observations.

Taxonomy: Steering vs. Hard Constraints

It is critical to distinguish what the Constitution can enforce (Steering) from what it must rely on external systems to enforce (Hard).

Steering Constraints (Soft)

These live in the System Prompt or AGENTS.md. They influence the model’s reasoning, tone, and risk preference.

Hard Constraints (Orchestration)

These live in the Runtime Environment (Hooks, API limits, Docker containers). They physically prevent the agent from taking restricted actions.

The Agent Constitution is primarily about Steering Constraints that govern behavior, while Context Gates and Workflow as Code implement the Hard Constraints.

Anatomy of a Constitution

Research into effective system prompts suggests a constitution should have four distinct components:

1. Identity (The Persona)

Who is the agent? This prunes the search space of the model (e.g., “You are a Senior Rust Engineer” vs “You are a poetic assistant”).

2. The Mission (Objectives)

What is the agent trying to achieve?

3. The Boundaries (Negative Constraints)

What must the agent never do? These are “Soft Gates”—instructions to avoid bad paths before hitting the hard Context Gates.

4. The Process (Step-by-Step)

How should the agent think? This enforces Chain-of-Thought reasoning.

Constitution vs. Spec

A common failure mode is mixing functional requirements with behavioral guidelines. Separation is critical:

| Feature | Agent Constitution | The Spec |
| --- | --- | --- |
| Scope | Global / Persona-wide | Local / Task-specific |
| Lifespan | Persistent (Project Lifecycle) | Ephemeral (Feature Lifecycle) |
| Content | Values, Style, Ethics, Safety | Logic, Data Structures, Routes |
| Example | "Prioritize Type Safety over Brevity." | "User id must be a UUID." |

Self-Correction Loop

One of the most powerful applications of a Constitution is the Critique-and-Refine loop (derived from Anthropic’s Supervised Learning phase):

  1. Draft: Agent generates a response to the user’s task.
  2. Critique: Agent (or a separate Critic agent) compares the draft against the Constitution.
  3. Refine: Agent rewrites the draft to address the critique.

This allows the agent to fix violations (e.g., “I used any type, but the Constitution forbids it”) before the user ever sees the code.
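The loop can be sketched with a single generic completion function; complete here is a hypothetical stand-in for whatever LLM call your stack provides, and the prompts are illustrative:

// `complete` is a hypothetical stand-in for your LLM call; prompts and round count are illustrative.
type Complete = (system: string, user: string) => Promise<string>;

async function constitutionalDraft(complete: Complete, constitution: string, task: string, rounds = 2): Promise<string> {
  // 1. Draft: generate a first response under the Constitution.
  let draft = await complete(constitution, task);
  for (let i = 0; i < rounds; i++) {
    // 2. Critique: compare the draft against the Constitution only.
    const critique = await complete(
      "You are a Critic. List every way the draft violates the constitution. Reply NONE if it is clean.",
      `Constitution:\n${constitution}\n\nDraft:\n${draft}`
    );
    if (critique.trim() === "NONE") break;
    // 3. Refine: rewrite the draft to address the critique.
    draft = await complete(
      constitution,
      `Task:\n${task}\n\nPrevious draft:\n${draft}\n\nFix these violations:\n${critique}`
    );
  }
  return draft;
}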

Persona-Specific Constitutions

Defining different Constitutions for different roles enables Adversarial Code Review.

1. The Builder (Optimist)

“Your goal is to be helpful and productive. Write code that solves the user’s problem. If the spec is slightly vague, make a reasonable guess to keep momentum going. Prioritize clean, readable implementation.”

2. The Critic (Pessimist)

“Your goal is to be a skeptical gatekeeper. Assume the code is broken or insecure until proven otherwise. Do not be helpful; be accurate. If the spec is vague, reject the code and demand clarification. Prioritize correctness and edge-case handling.”

By running the same prompt through these two different Constitutions, you generate a dialectic process that uncovers issues a single “neutral” agent would miss.

Implementation

1. Documentation

The industry standard for documenting your Agent Constitution is AGENTS.md. This file lives in your repository root and serves as the source of truth for your agents.

2. Injection

Inject the Constitution into the System Prompt of your LLM interaction.
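A minimal sketch, assuming the constitution lives in AGENTS.md and an OpenAI-style chat client; the model name and task prompt are illustrative:

// Sketch: inject AGENTS.md as the system prompt. Model choice and SDK are assumptions.
import { readFile } from "node:fs/promises";
import OpenAI from "openai";

const constitution = await readFile("AGENTS.md", "utf-8");
const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-5", // illustrative; use whatever your Model Routing policy selects
  messages: [
    { role: "system", content: constitution }, // persistent directives
    { role: "user", content: "Implement PBI-203: add SMS fallback to notifications." },
  ],
});
console.log(response.choices[0].message.content);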

3. Tuning

Constitutions must be tuned. If they are too strict, the agent becomes paralyzed, refusing to code because “it might be insecure” or halting on every minor ambiguity. If too loose, the agent reverts to default helpfulness and ignores the principles whenever they are inconvenient.

The “Be Good” Trap: Avoid vague directives like “Write good code”; they give the agent no concrete behavior to enforce or refuse.

Relationship to Other Patterns

Constitutional Review — The pattern for using a Critic agent to review code specifically against the Agent Constitution.

Context Gates — The deterministic checks that back up the probabilistic Constitution. Hard Constraints implemented via orchestration.

Adversarial Code Review — Uses persona-specific Constitutions (Builder vs Critic) to create dialectic review processes.

The Spec — Defines task-specific requirements, while the Constitution defines global behavioral guidelines.

AGENTS.md Specification — The practice for documenting and maintaining your Agent Constitution.

Workflow as Code — Implements Hard Constraints programmatically, complementing the Constitution’s Steering Constraints.

See also:

Agentic Double Diamond

Transforming the classic design thinking framework into a computational pipeline.

Status: Draft | Last Updated: 2025-12-13


Summary

The Agentic Double Diamond transforms the classic design thinking framework from a workshop-based activity into a computational pipeline. Instead of producing static artifacts (PDFs, slide decks, sticky notes) for human interpretation, this pattern uses agents to ingest raw data and output structured, machine-readable “Context Feeds” (Vectors, JSON, Gherkin) that drive downstream Coder and QA agents.

The Context

In a traditional SDLC, the “Double Diamond” (Discover, Define, Develop, Deliver) is often a bottleneck of unstructured data.

  1. Lossy Handoffs: Insights from the Discover phase are summarized into PowerPoint decks, losing the raw fidelity needed for edge-case testing.

  2. Static Deliverables: Deliver produces Figma files or flat specs. An AI Coding Agent cannot “look” at a Figma file and understand the intent behind a hover state or a complex validation rule without explicit text description.

  3. The “Gap of Silence”: Once design is handed off, the “User Voice” is silent until UAT.

In the Agentic SDLC, we treat Design not as drawing screens, but as Context Engineering. The goal is to build the “Truth” that the build agents will execute.

The Pattern

The Cybernetic Double Diamond reimagines the two diamonds as Context Furnaces:

Phase 1: Discover (The Sensor Network)

Traditional: User interviews, market research, sticky notes on a wall. Agentic: Massive automated ingestion and pattern matching.

The Workflow: Instead of a manual research sprint, we deploy a Sensor Network of Harvester Agents.

Output Artifact: research_vectors.json

A vector store containing weighted pain points, frequency analysis, and raw user quotes linked by semantic relevance.

Phase 2: Define (The Simulator)

Traditional: Static Personas (PDFs), Journey Maps. Agentic: Active User Simulators and Living Requirements.

The Workflow: We use the data from Phase 1 to fine-tune Synthetic User Agents (see Agent Personas).

Output Artifact: persona_definition.yaml & problem_graph.json

A serialized definition of the user that QA agents can later use to “test” the software, and a knowledge graph linking business goals to user pain points.

Phase 3: Develop (The Generative Studio)

Traditional: Manual Wireframing, Prototyping. Agentic: Multi-modal generation and adversarial simulation.

The Workflow:

Output Artifact: design_tokens.json & behavioral_prototype.js

Design-as-Code. Figma designs are instantly converted to JSON tokens and React component scaffolds.

Phase 4: Deliver (The Context Feed)

Traditional: Handoff meetings, Jira tickets. Agentic: Compilation of the “Blueprints” for the Agentic SDLC.

The Workflow: This phase is purely about Packaging. The goal is to create a “Feature Manifest” that the Coder Agents can consume without hallucination.

Output Artifact: feature_manifest.zip

A package containing the “Truth” for the build agents:

  1. requirements.md (The narrative)

  2. acceptance_criteria.feature (The test logic)

  3. mockup_context.json (The visual specs)

Artifact Example: The Feature Manifest

When the “Deliver” phase is complete, the Design Agent commits a manifest to the repository. This triggers the Coder Agent.

manifests/feature-one-click-checkout/requirements.md

# Feature: One-Click Checkout
## Insight Source
- Linked to Insight ID: #INS-882 (Users abandon cart due to form fatigue)
- Priority Score: 9.2 (Calculated by Impact/Effort Agent)

## Synthetic User Validation
- Persona "Sarah" Acceptance Rate: 95%
- Persona "Mike" Acceptance Rate: 88% (Concern: "Where is the receipt?")

manifests/feature-one-click-checkout/acceptance_criteria.feature

Feature: One Click Checkout
  Scenario: User has stored payment
    GIVEN user_id IS "valid"
    AND payment_method IS "stored"
    WHEN button "Buy Now" is_clicked
    THEN system MUST process_transaction WITHIN 2000ms
    AND system MUST NOT show "Confirmation Modal"

Benefits

  1. Zero Translation Loss: The “Spec” is code before the code is written.

  2. Adversarial Resilience: Designs are “tested” by Synthetic Users before development begins.

  3. Living Context: The logic is traceable back to the raw research vector (e.g., “Why is this button red?” -> “Because 400 support tickets complained about visibility”).

Constitutional Review

Verification pattern that validates implementation against both functional requirements (Spec) and architectural values (Constitution).

Status: Experimental | Last Updated: 2026-01-09

Definition

Constitutional Review is a verification pattern that validates code against two distinct contracts:

  1. The Spec (functional requirements) — Does it do what was asked?
  2. The Constitution (architectural values) — Does it do it the right way?

This pattern extends Adversarial Code Review by adding a second validation layer. Code can pass all tests and satisfy the Spec’s functional requirements while still violating the project’s architectural principles documented in the Agent Constitution.

The Problem: Technically Correct But Architecturally Wrong

Standard verification catches functional bugs:

But code can pass all these checks and still violate architectural constraints:

Example: The Performance Violation

// Spec requirement: "Filter audit logs by date range"
async function getAuditLogs(startDate: Date) {
  const logs = await db.auditLogs.findAll(); // ❌ Loads entire table
  return logs.filter(log => log.date > startDate); // ❌ Filters in memory
}

Quality Gates: ✅ Tests pass (small dataset)
Spec Compliance: ✅ Returns filtered logs
Constitutional Review: ❌ Violates “push filtering to database layer”

The code is functionally correct but architecturally unsound. It works fine with 100 records but fails catastrophically at 10,000+.
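A constitutionally sound version pushes the predicate into the data layer. The repository interface below is a hypothetical sketch, not a specific ORM API:

// Sketch: the repository translates the predicate to SQL (WHERE date > $1),
// so only matching rows ever leave the database.
interface AuditLog { id: string; date: Date; action: string; }
interface AuditLogRepository {
  findWhere(filter: { dateAfter: Date }): Promise<AuditLog[]>; // hypothetical method
}

async function getAuditLogs(repo: AuditLogRepository, startDate: Date): Promise<AuditLog[]> {
  // Satisfies the Constitution's "push filtering to the database layer" principle.
  return repo.findWhere({ dateAfter: startDate });
}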

The Solution: Dual-Contract Validation

Constitutional Review solves this by validating against two sources of truth:

Traditional Review (Functional)

Constitutional Review (Architectural)

The Critic Agent validates against BOTH contracts:

  1. Functional correctness (from the Spec)
  2. Architectural consistency (from the Constitution)

Anatomy

Constitutional Review consists of three key components:

The Dual-Contract Input

Spec Contract — Defines functional requirements, API contracts, and data schemas. Answers “what should it do?”

Constitution Contract — Defines architectural patterns, performance constraints, and security rules. Answers “how should it work?”

Both contracts are fed to the Critic Agent for validation.

The Critic Agent

A secondary AI session (ideally using a reasoning-optimized model) that:

This extends the Adversarial Code Review Critic with constitutional awareness.

The Violation Report

When constitutional violations are detected, the Critic produces:

  1. Violation Description — What constitutional principle was violated
  2. Impact Analysis — Why this matters at scale (performance, security, maintainability)
  3. Remediation Path — Ordered steps to fix (prefer standard patterns, escalate if needed)
  4. Test Requirements — What tests would prevent regression

This transforms review from rejection to guidance.

Relationship to Other Patterns

Adversarial Code Review — The base pattern that Constitutional Review extends. Adds the Constitution as a second validation contract.

Agent Constitution — The source of architectural truth. Defines the “driver training” that shapes initial behavior; Constitutional Review verifies the training was followed.

The Spec — The source of functional truth. Constitutional Review validates against both Spec and Constitution.

Context Gates — Constitutional Review implements a specialized Review Gate that validates architectural consistency.

Feedback Loop: Constitution shapes behavior → Constitutional Review catches violations → Violations inform Constitution updates (if principles aren’t clear enough).

Integration with Context Gates

Constitutional Review implements a specialized Review Gate that sits between Quality Gates and Acceptance Gates:

| Gate Type | Question | Validated By |
| --- | --- | --- |
| Quality Gates | Does it compile and pass tests? | Toolchain (deterministic) |
| Spec Review Gate | Does it implement requirements? | Critic Agent (probabilistic) |
| Constitutional Review Gate | Does it follow principles? | Critic Agent (probabilistic) |
| Acceptance Gate | Is it the right solution? | Human (subjective) |

The Constitutional Review Gate catches architectural violations that pass functional verification.

Strategic Value

Catches “Regression to Mediocrity” — LLMs are trained on average code from the internet. Without constitutional constraints, they gravitate toward common but suboptimal patterns.

Enforces Institutional Knowledge — Architectural decisions (performance patterns, security rules, error handling strategies) are documented once in the Constitution and verified on every implementation.

Surfaces Specification Gaps — If the Critic can’t determine whether code violates constitutional principles, the Constitution needs clarification. This improves the entire system.

Reduces L3 Review Burden — Human reviewers focus on strategic fit (“Is this the right feature?”) rather than catching architectural violations (“Why are you loading the entire table?”).

Prevents Silent Failures — Code that “works” but violates architectural principles (like the LoadAll().Filter() anti-pattern) is caught before production.

Validated in Practice

Case Study: Claudio Lassala (January 2026)

A production implementation caught a constitutional violation that passed all other gates:

Context: User story required filtering audit logs by date range. Builder Agent implemented the requirement, tests passed, code compiled without errors.

Code Behavior:

Gate Results:

Critic Output: Provided specific remediation path:

  1. Push filter to database query layer
  2. If ORM doesn’t support pattern, use raw SQL
  3. Add performance test with 10k+ records
  4. Document constraint in repository interface

Impact: Silent performance bug caught before production. The code worked perfectly in development (small dataset) but would have failed catastrophically at scale.

See full case study in Adversarial Code Review.

Implementing Practice

For step-by-step implementation guidance, see:

See also:

Context Gates

Architectural checkpoints that filter input context and validate output artifacts between phases of work to prevent cognitive overload and ensure system integrity.

Status: Experimental | Last Updated: 2026-01-18

Definition

Context Gates are architectural checkpoints that sit between phases of agentic work. They serve a dual mandate: filtering the input context to prevent cognitive overload, and validating the output artifacts to ensure system integrity.

Unlike “Guardrails,” which conflate prompt engineering with hard constraints, Context Gates are distinct, structural barriers that enforce contracts between agent sessions and phases.

The Problem: Context Pollution and Unvalidated Outputs

Without architectural checkpoints, agentic systems suffer from two critical failures:

Context Pollution — Agents accumulate massive conversation histories (observations, tool outputs, internal monologues, errors). When transitioning between sessions or tasks, feeding the entire context creates cognitive overload. Signal-to-noise ratio drops, and agents lose focus on the current objective—Nick Tune describes this as the agent becoming “tipsy wobbling from side-to-side.”

Unvalidated Outputs — Code that passes automated tests can still violate semantic contracts (spec requirements, architectural constraints, security policies). Without probabilistic validation layers, implementation shortcuts and silent failures slip through to production.

Why Existing Approaches Fail:

The Solution: Dual-Mandate Checkpoint Architecture

Context Gates solve this by creating two distinct checkpoint types:

Input Gates — Filter and compress context entering an agent session, ensuring only relevant information is presented. This prevents cognitive overload and maintains task focus.

Output Gates — Validate artifacts leaving an agent session through three tiers of verification: deterministic checks, probabilistic review, and human acceptance.

The key insight: Context must be controlled at the boundaries, not throughout execution. Agents work freely within their session, but transitions enforce strict contracts.

Anatomy

Context Gates consist of two primary structures, each with distinct sub-components:

Input Gates

Input Gates control what context enters an agent session.

Summary Gates (Cross-Session Transfer)

When transitioning work between agent sessions, Summary Gates compress conversation history into essential state.

Examples:

Context Filtering (Within-Session)

During multi-step tasks within a single session, Context Filtering determines what historical information is relevant to the current sub-task.

Output Gates

Output Gates validate artifacts before they progress to the next phase. Three tiers enforce different types of correctness:

Quality Gates (Deterministic)

Binary, automated checks enforced by the toolchain.

Examples:

Review Gates (Probabilistic, Adversarial)

LLM-assisted validation of semantic correctness and contract compliance.

Examples:

Output Format: When violations are detected, Review Gates provide actionable feedback:

  1. Violation Description — What contract was broken
  2. Impact Analysis — Why this matters (performance, security, maintainability)
  3. Remediation Path — Ordered list of fixes (prefer standard patterns, escalate if needed)
  4. Test Requirements — What tests would prevent regression

This transforms Review Gates from “reject” mechanisms into “guide to resolution” checkpoints.
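One way to make that feedback machine-consumable is to constrain the Critic to a typed verdict. The field names below mirror the four-part format above but are an illustrative sketch, not a mandated schema:

// Sketch of a structured Review Gate verdict.
interface ReviewViolation {
  description: string;        // 1. what contract was broken
  impact: string;             // 2. why it matters (performance, security, maintainability)
  remediation: string[];      // 3. ordered list of fixes
  testRequirements: string[]; // 4. tests that would prevent regression
}

interface ReviewGateResult {
  verdict: "PASS" | "FAIL";
  violations: ReviewViolation[]; // empty when the verdict is PASS
}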

Acceptance Gates (Human-in-the-Loop)

Subjective checks requiring human strategic judgment.

Examples:

Workflow Enforcement (Denial Gates)

Mechanisms that actively block agents from bypassing the defined process.

Examples:

Gate Taxonomy

| Feature | Summary Gates (Input) | Context Filtering (Input) | Quality Gates (Output) | Review Gates (Output) | Acceptance Gates (Output) |
| --- | --- | --- | --- | --- | --- |
| Function | Session handoff | Within-session filtering | Code validity | Spec compliance | Strategic fit |
| Goal | Clean session transfer | Maintain focus | Prevent broken code | Enforce contracts | Prevent bad product |
| Mechanism | LLM Summarization | Semantic Search | Compilers / Tests | LLM Critique | Human Review |
| Nature | Compression | Filtering | Deterministic | Probabilistic | Subjective |
| Outcome | Condensed context | Clean context window | Valid compilation | Spec compliance | Approved release |

Relationship to Other Patterns

Adversarial Code Review — Implements the Review Gate tier of Output Gates. Uses a Critic Agent to validate code against the Spec’s contracts.

Constitutional Review — Extends Review Gates by validating against both the Spec (functional) and the Agent Constitution (architectural values).

Model Routing — Works with Context Gates to assign appropriate model capabilities to different gate types (throughput models for generation, reasoning models for Review Gates).

The Spec — Provides the contract that Review Gates validate against.

Agent Constitution — Provides architectural constraints that Constitutional Review validates against.

Ralph Loop — Applies Context Gates at iteration boundaries, using context rotation and progress files to prevent cognitive overload across autonomous loops.

Feature Assembly — The practice that uses all three Output Gates (Quality, Review, Acceptance) in the verification pipeline.

Workflow as Code — The practice for implementing gate enforcement programmatically rather than via prompt instructions.

Strategic Value

Prevents Context Overload — Agents receive only relevant information, maintaining task focus and reducing token usage.

Catches Semantic Violations — Review Gates detect contract violations that pass deterministic checks (performance anti-patterns, security gaps, missing edge cases).

Reduces Human Review Burden — Quality and Review Gates filter out obvious errors, letting humans focus on strategic fit rather than technical correctness.

Enforces Architectural Consistency — Constitutional Review (via Review Gates) ensures code follows project principles, not just internet-average patterns.

Creates Clear Contracts — Each gate type has explicit pass/fail criteria, making verification deterministic where possible and explicit where probabilistic.

See also:

Experience Modeling

A foundational phase of the Agentic Software Development Life Cycle (ASDLC) focused on creating the Experience Model—an organized design system that agents must follow.

Status: Proposed | Last Updated: 2025-12-02

Definition

Experience Modeling is a foundational phase of the Agentic Software Development Life Cycle (ASDLC). During this phase, we do not focus on building features; instead, we create the Experience Model—an organized design system that agents must follow. Just as we model data schemas for the backend, we also need to model the Experience Schema for the frontend. The Design System serves as the queryable model that the LLM uses to orchestrate the user interface.

Context Gates

An explicit context gate is implemented between the Experience Modeling and Feature Assembly phases. This methodology will significantly reduce Design Drift, which is the gradual divergence of a product’s actual codebase from its intended design specifications caused by the accumulation of micro-inconsistencies generated by AI.

%% caption: Context Gating for Design System Integrity
flowchart LR
  A[[...]] --> |CONTEXT| C
  C[EXPERIENCE MODELING] --> D
  D{GATE} --> E
  E[FEATURE ASSEMBLY]
    E --> |DEFECT/REQUIREMENT SIGNAL| C
    E --> |RELEASE| G
  G[[...]]
Context Gating for Design System Integrity

Quality

The quality gate is considered satisfied only when the Design System successfully compiles into a standalone, testable artifact. This artifact can vary from a complete enterprise Storybook to a single .astro or .html reference sheet, as long as it is generated through a custom build process rather than being maintained manually.

A quality gate between Experience Modeling and Feature Assembly might verify the following:

  1. Token Strictness: The build pipeline fails if any “raw” values (such as hex codes or magic numbers) are detected by the linter, thereby enforcing the semantic token architecture.
  2. Schema Parity: The automated documentation (llms.txt) must strictly match the exported component type signatures.
  3. Build Success: The visual artifact must build in isolation. If the reference sheet or catalog cannot be generated, the Experience Model is deemed broken and unfit for agent consumption.

Acceptance

Type: Human-in-the-Loop

Verification is conducted on a live, interactive artifact, ensuring that components are not just static images but functional units. The ‘System Architect’ or ‘Design Technologist’ validates the Behavioral Contract of the system.

The reviewer confirms that experience elements function as expected: buttons manage interaction states (hover, focus, disabled), inputs correctly handle data entry, and layout containers adapt to spatial constraints.

Recommendations

To ensure consistent results, the Experience Model should be set to Read-Only during the Feature Assembly phase. Feature Agents utilize the design system without making any modifications. While building a feature, it is strictly prohibited for an agent to alter the core component definitions to accommodate a specific use case.

We recommend implementing the “Read-Only” state using one of two patterns, depending on the size of your project.

Pattern A: Hard Isolation

Context: Large Teams, Enterprise, or Production Systems.

In this approach, the Design System is treated like a third-party library, similar to React or Tailwind. It resides in a separate repository and is built and published to a package registry (such as NPM or NuGet).

Why This Works: The Feature Agent cannot alter the component source code because that code is not present in the project it is working on. It only interacts with the compiled exports.

| Ecosystem | Artifact | Registry | Feature Agent PoV |
| --- | --- | --- | --- |
| TypeScript (Web) | NPM Package (Compiled) | NPM / GitHub Packages | `import { Button } from '@org/design-system';` |
| Python (Data/AI) | Wheel (.whl) (Compiled Lib) | Private PyPI / Artifactory | `from org_core.schemas import UserIntent` |
| Unity (Games) | UPM Package | Unity Scoped Registry | `using Org.Mechanics.Input;` |

Pattern B: Toolchain Enforcement

Context: Monorepos, Rapid Iteration, or Single-Team Projects

In this approach, the Design System coexists within the same repository as the application code but is safeguarded by mechanical write barriers. We do not rely on the Agent’s willingness to follow instructions (for example, the prompts found in agents.md). Instead, we use the version control system and build pipeline to automatically reject unauthorized modifications.

Why This Works: The method shifts the enforcement from Prompt Space (probabilistic) to Commit Space (deterministic). If a Feature Agent attempts to modify the design system files while building a feature, the pre-commit hooks or CI pipeline will trigger a hard failure, preventing the code from entering the history.
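As an illustrative sketch of such a write barrier, a small script run from a pre-commit hook (for example via Husky) can reject staged changes under a protected path; the path and the wiring are assumptions:

// scripts/check-design-system.ts (sketch): fail the commit if protected files are staged.
import { execSync } from "node:child_process";

const PROTECTED_PREFIX = "src/design-system/"; // illustrative protected path

const staged = execSync("git diff --cached --name-only", { encoding: "utf-8" })
  .split("\n")
  .filter(Boolean);

const violations = staged.filter((file) => file.startsWith(PROTECTED_PREFIX));

if (violations.length > 0) {
  console.error("Design System is read-only during Feature Assembly. Blocked files:");
  violations.forEach((f) => console.error(`  - ${f}`));
  process.exit(1); // deterministic rejection in Commit Space
}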

A Layered Defense?

While mechanical barriers provide a hard guarantee, we enhance efficiency by combining them with Agent Constitution (e.g., .cursorrules, agents.md).

The Gate Mechanism (Toolchain): This prevents corruption in case the Agent makes a mistake.

The Agent Constitution (Context): This helps prevent the Agent from making mistakes in the first place.

This combination reduces waste cycles generating code that will ultimately be rejected by the compiler or linter. However, it is important to note that the context rules are never the sole line of defense.

| Ecosystem | Enforcement Mechanism | Implementation Pattern |
| --- | --- | --- |
| TypeScript (Web) | Husky / Lint-Staged | A pre-commit hook scans staged files. If src/design-system/** is modified, the hook fails and blocks the commit. |
| Python (Data/AI) | Pre-Commit Framework | A local hook (.pre-commit-config.yaml) validates that src/core/ remains read-only for standard feature branches. |
| Unity (Games) | Asset Post-Processors | An OnPreprocessAsset script in the Editor instantly reverts changes to the /_Core folder if the Editor is not in “Architect Mode”. |

External Attention

External Attention offloads document processing to isolated sub-agents, returning only extracted answers to the main agent's context window.

Status: Draft | Last Updated: 2026-01-10

Definition

External Attention is an architectural pattern for offloading document processing to isolated sub-agents rather than injecting documents into the main agent’s context window. The sub-agent queries the document and returns only the extracted answer.

This pattern addresses a fundamental tension in agentic systems: agents often need information from large documents (PDFs, codebases, research papers), but loading those documents directly into context degrades performance on the primary task.

The Problem: Context Bloat from Large Documents

When agents need information from large documents, the naive approach loads the document into context. This creates:

The Solution: Query, Don’t Load

Instead of:

Context = [Task Instructions] + [Full Document] + [Recent Actions]

Use:

Context = [Task Instructions] + [Query Result] + [Recent Actions]

Where Query Result comes from a specialized sub-agent that:

  1. Receives the document + specific question
  2. Extracts only the relevant answer
  3. Returns a bounded response to the main agent

The key insight: isolation preserves focus. The main agent’s context remains clean while the sub-agent handles the messy work of document comprehension.

Anatomy

External Attention consists of four components:

Document Ingestion Tool

A tool interface that accepts a document reference and a query. The main agent sees only the tool signature, not the document contents.

answer = answer_from_pdf(
    document="research-paper.pdf",
    query="What is the reported accuracy on benchmark X?"
)

Sub-Agent Context

An isolated context window where the full document is loaded alongside the query. This context is invisible to the main agent—it exists only for the duration of the tool call.

Query Processor

The sub-agent logic that:

Bounded Response Contract

The interface guaranteeing that only the extracted answer (not the full document) returns to the main agent. This is the critical boundary that prevents context pollution.
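A sketch of that boundary is shown below; readPdfText and the isolated completion call are hypothetical helpers, and the point is that only the bounded answer string crosses back to the caller:

// Hypothetical helpers: readPdfText extracts document text, and the completion callback
// runs inside a fresh sub-agent context that is discarded after the call.
type CompleteInIsolation = (system: string, user: string) => Promise<string>;

async function answerFromPdf(
  complete: CompleteInIsolation,
  readPdfText: (path: string) => Promise<string>,
  documentPath: string,
  query: string
): Promise<string> {
  const fullText = await readPdfText(documentPath); // loaded only inside the sub-agent context
  const answer = await complete(
    "You answer questions about the provided document. Reply with the extracted answer only.",
    `Document:\n${fullText}\n\nQuestion: ${query}`
  );
  return answer.slice(0, 2000); // bounded response: the full document never reaches the main agent
}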

Relationship to Other Patterns

Model Routing — External Attention is a form of model routing where document processing routes to a specialized “reader” agent.

Context Gates — The tool boundary acts as a Context Gate, filtering document contents to only relevant extractions.

Levels of Autonomy — The document-processing sub-agent is an L1 Atomic Agent with a single responsibility.

Practice: External Document Processing — Implementation guidance TBD.

When to Use

When Not to Use

Industry Validation

The InfiAgent framework (Yu et al., 2026) demonstrates this pattern at scale: an 80-paper literature review task where the main agent maintains bounded context by delegating all document reading to answer_from_pdf tools. The approach enabled 80/80 paper coverage where baseline agents failed.

Model Routing

Strategic assignment of LLM models to SDLC phases based on reasoning capability versus execution speed.

Status: Experimental | Last Updated: 2026-01-13

Definition

Model Routing is the strategic assignment of different Large Language Models (LLMs) to different phases of the software development lifecycle based on their capability profile.

Different computational tasks have different performance characteristics. Model Routing matches model capabilities to task requirements: reasoning depth during design phases and speed with large context windows during implementation phases.

This is a tool selection strategy, not a delegation strategy. Engineers remain accountable for output quality while selecting the appropriate computational tool for each phase.

The Problem: Single-Model Inefficiency

Using one model for all phases creates a mismatch between computational capability and task requirements.

High-speed models struggle with architectural decisions requiring deep constraint satisfaction. Reasoning models are too slow for high-volume implementation tasks. Models with massive context windows are expensive when you only need to process small, focused changes.

Each model class optimizes for different performance characteristics. Using the wrong one wastes either quality (insufficient reasoning) or resources (excessive capability for simple tasks).

The Solution: Capability-Based Assignment

We categorize models into three capability profiles aligned with Agentic SDLC phases:

| Capability Profile | Optimization | Primary Use Cases | Model Examples |
| --- | --- | --- | --- |
| High Reasoning | Deep logic, high latency, “System 2” thinking | Writing Specs, architectural decisions, logic debugging, security analysis | Gemini 3 Deep Think, DeepSeek V3.2, OpenAI o3-pro |
| High Throughput | Speed, low latency, real-time execution | Code generation, refactoring, unit tests, UI implementation | Gemini 3 Flash, Llama 4 Scout, Claude Haiku 4.5 |
| Massive Context | Repository-scale context (500k-5M tokens) | Documentation analysis, codebase navigation, legacy system understanding | Gemini 3 Pro (5M tokens), Claude 4.5 Sonnet (500k), GPT-5 (RAG-native) |

Model examples current as of December 27, 2025. The LLM landscape evolves rapidly—validate capabilities and availability before implementation.

Relationship to Levels of Autonomy

Levels of Autonomy define human oversight requirements. Model Routing complements this by matching computational capability to task characteristics:

This ensures that the computational tool’s capability profile matches the task’s computational requirements and the degree of human verification needed.

See also:

Product Vision

A structured vision document that transmits product taste and point-of-view to agents, preventing convergence toward generic outputs.

Status: Live | Last Updated: 2026-01-13

Definition

A Product Vision is a structured artifact that captures the taste, personality, and point-of-view that makes a product this product rather than generic software. It transmits product intuition to agents who otherwise default to bland, safe, interchangeable outputs.

Traditional vision documents are written for humans—investors, executives, new hires. In ASDLC, the Product Vision is structured for agent consumption, providing the context needed to make opinionated decisions aligned with product identity.

The Problem: Vibe Convergence

Agents trained on the entire internet converge toward the mean. Ask for a landing page, you get the same hero section everyone else gets. Ask for onboarding, you get the same three-step wizard. Ask for error copy, you get “Oops! Something went wrong.”

This isn’t a bug in the model. It’s the model doing exactly what it’s trained to do: produce the statistically average response. The average is safe. The average is forgettable.

The symptoms:

The Agent Constitution tells agents how to behave. The Spec tells agents what to build. Neither tells agents who we are.

The Solution: Structured Taste Transmission

The Product Vision bridges this gap by making product identity explicit and agent-consumable. Rather than hoping agents infer taste from scattered references, the vision provides a structured context packet that shapes output quality.

The key insight: agents don’t need complete documentation—they need curated opinions. A Product Vision isn’t comprehensive; it’s opinionated. It tells agents which tradeoffs to make when specs are ambiguous.

Anatomy

A Product Vision consists of five components, each serving a distinct purpose in shaping agent output.

1. The Actual Humans

Not “users” or “customers”—real people with context, constraints, and taste of their own. This gives agents a person to design for, not an abstraction.

When choosing between “simple onboarding wizard” and “power-user defaults with optional setup,” agents need basis for judgment. Abstract personas don’t provide this; descriptions of actual humans do.

2. Point of View

Opinions. Actual stances on tradeoffs that reasonable people might disagree with.

These aren’t requirements—they’re taste. They tell agents which direction to lean when specs are ambiguous:

3. Taste References

Concrete examples of products that feel right, and products that don’t. Agents can reference these patterns directly: “Make this feel more like Linear’s approach to lists, less like Jira’s.”

References provide calibration. Instead of describing “clean” in abstract terms, point to products that embody it—and products that don’t.

4. Voice and Language

How the product speaks. Not brand guidelines—actual examples of tone.

This includes:

5. Decision Heuristics

When agents face ambiguous choices, what should they optimize for? These are tie-breakers—the rules that resolve conflicts between equally valid approaches.

Placement in Context Hierarchy

Product Vision sits between the Constitution and the Specs:

| Tier | Artifact | Purpose |
| --- | --- | --- |
| Constitution | AGENTS.md | How agents behave (rules, constraints) |
| Vision | VISION.md or inline | Who the product is (taste, voice, POV) |
| Specs | /plans/*.md | What to build (contracts, criteria) |
| Reference | /docs/ | Full documentation, API specs, guides |

The Constitution shapes behavior. The Vision shapes judgment. The Specs shape output.

Not every project needs a separate VISION.md. For smaller products or early-stage teams, the vision can live as a preamble in AGENTS.md. For complex products with detailed voice guidelines and taste references, a separate file prevents crowding out operational context.

See Product Vision Authoring for guidance on the inline vs. separate decision, templates, and maintenance practices.

Validated in Practice

Industry Validation

Marty Cagan (Silicon Valley Product Group) In the AI era, Cagan argues that product vision is more critical than ever. As AI lowers the cost of building features, differentiation shifts from “ability to ship” to “ability to solve value risks.” Without a strong vision, AI teams build “features that work” rather than “products that matter.”

“It will be easier to build features, but harder to build the right features.” — Marty Cagan

Lenny Rachitsky (Product Sense) Rachitsky defines “product sense” as the ability to consistently craft products with intended impact. VISION.md is essentially codified product sense—explicitly documenting the intuition that senior PMs use to steer teams, so that agents (who lack intuition) can simulate it.

The Scientific Basis: Countering Regression to the Mean

LLMs are probabilistic engines trained to predict the most likely next token. By definition, “most likely” means “most average.”

Without external constraint, an agent will always drift toward the mean (Regression to the Mean). A Product Vision acts as a forcing function, artificially skewing the probability distribution toward specific, non-average choices (e.g., “playful” over “professional,” “dense” over “simple”).

Anti-Patterns

The Generic Vision

“User-centric design. Quality and reliability. Innovation and creativity.”

This says nothing. Every company claims these values. A Product Vision without opinions is just corporate filler that agents will (correctly) ignore.

The Aspirational Vision

Describing the product you wish you had, not the product you’re building. If your vision says “minimal and focused” but your product has 47 settings screens, agents will be confused by the contradiction.

The Ignored Vision

Creating the document once and never referencing it in specs or prompts. The artifact exists but agents never see it in context.

The Aesthetic-Only Vision

All visual preferences, no product opinion. “We like blue and sans-serif fonts” isn’t vision—it’s a style guide. Vision captures judgment, not just appearance.

Relationship to Other Patterns

Agent Constitution — The Constitution defines behavioral rules (what agents must/must not do). The Vision defines taste (what agents should prefer when rules don’t dictate). Constitution is constraints; Vision is guidance.

The Spec — Specs define feature contracts. The Vision influences how those contracts are fulfilled. Specs reference Vision for design rationale: “Per VISION.md: ‘Settings are failure; good defaults are success.’”

Context Engineering — The Vision is a structured context asset. It follows Context Engineering principles: curated, opinionated, agent-optimized.

Product Vision Authoring — Step-by-step guide for creating and maintaining a Product Vision, including templates, inline vs. separate file decisions, and diagnostic guidance.

AGENTS.md Specification — Defines the file format for agent constitutions, including how to incorporate vision as a preamble or reference.

Living Specs — Specs can reference vision for design rationale. The “same-commit rule” applies: if vision changes, affected specs should acknowledge the shift.

Agent Personas — Different personas may need different vision depth. A copywriting agent needs full voice guidance; a database migration agent needs minimal product context.

See also:

Ralph Loop

Persistence pattern enabling autonomous agent iteration until external verification passes, treating failure as feedback rather than termination.

Status: Live | Last Updated: 2026-01-12

Definition

The Ralph Loop—named by Geoffrey Huntley after the persistently confused but undeterred Simpsons character Ralph Wiggum—is a persistence pattern that turns AI coding agents into autonomous, self-correcting workers.

The pattern operationalizes the OODA Loop for terminal-based agents and automates the Learning Loop with machine-verifiable completion criteria. It enables sustained L3-L4 autonomy—“AFK coding” where the developer initiates and returns to find committed changes.

flowchart LR
    subgraph Input
        PBI["PBI / Spec"]
    end
    
    subgraph "Human-in-the-Loop (L1-L2)"
        DEV["Dev + Copilot"]
        E2E["E2E Tests"]
        DEV --> E2E
    end
    
    subgraph "Ralph Loop (L3-L4)"
        AGENT["Agent Iteration"]
        VERIFY["External Verification"]
        AGENT --> VERIFY
        VERIFY -->|"Fail"| AGENT
    end
    
    subgraph Output
        REVIEW["Adversarial Review"]
        MERGE["Merge"]
        REVIEW --> MERGE
    end
    
    PBI --> DEV
    PBI --> AGENT
    E2E --> REVIEW
    VERIFY -->|"Pass"| REVIEW

Both lanes start from the same well-structured PBI/Spec and converge at Adversarial Review. The Ralph Loop lane operates autonomously, with human oversight at review boundaries rather than every iteration.

The Problem: Human-in-the-Loop Bottleneck

Traditional AI-assisted development creates a productivity ceiling: the human reviews every output before proceeding. This makes the human the slow component in an otherwise high-speed system.

The naive solution—trusting the agent’s self-assessment—fails because LLMs confidently approve their own broken code. Research demonstrates that self-correction is only reliable with objective external feedback. Without it, the agent becomes a “mimicry engine” that hallucinates success.

| Aspect | Traditional AI Interaction | Failure Mode |
| --- | --- | --- |
| Execution Model | Single-pass (one-shot) | Limited by human availability |
| Failure Response | Process termination or manual re-prompt | Blocks on human attention |
| Verification | Human review of every output | Human becomes bottleneck |

The Solution: External Verification Loop

The Ralph Loop inverts the quality control model: instead of treating LLM failures as terminal states requiring human intervention, it engineers failure as diagnostic data. The agent iterates until external verification (not self-assessment) confirms success.

Core insight: Define the “finish line” through machine-verifiable tests, then let the agent iterate toward that finish line autonomously. Iteration beats perfection.

| Aspect | Traditional AI | Ralph Loop |
| --- | --- | --- |
| Execution Model | Single-pass | Continuous multi-cycle |
| Failure Response | Manual re-prompt | Automatic feedback injection |
| Persistence Layer | Context window | File system + Git history |
| Verification | Human review | External tooling (Docker, Jest, tsc) |
| Objective | Immediate correctness | Eventual convergence |

Anatomy

1. Stop Hooks and Exit Interception

The agent attempts to exit when it believes it’s done. A Stop hook intercepts the exit and evaluates current state against success criteria. If the agent hasn’t produced a specific “completion promise” (e.g., <promise>DONE</promise>), the hook blocks exit and re-injects the original prompt.

This creates a self-referential loop: the agent confronts its previous work, analyzes why the task remains incomplete, and attempts a new approach.
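The outer shell of the loop can be sketched as follows; runAgent, the verification callback, and the promise token are assumptions standing in for your agent CLI and external test tooling:

// Sketch of a Ralph-style outer loop with an iteration cap.
type RunAgent = (prompt: string) => Promise<string>; // returns the agent's final message
type RunVerification = () => Promise<boolean>;       // e.g. docker build + test suite

async function ralphLoop(runAgent: RunAgent, verify: RunVerification, prompt: string, maxIterations = 50) {
  for (let i = 0; i < maxIterations; i++) {
    const output = await runAgent(prompt);
    const claimedDone = output.includes("<promise>DONE</promise>"); // completion promise (illustrative)
    const verified = await verify();                                 // external judge, not self-assessment
    if (claimedDone && verified) return;                             // exit only when both agree
    // Otherwise block the exit and run another iteration against the same prompt;
    // the codebase and Git history carry state forward, not the conversation.
  }
  throw new Error("Iteration cap reached without passing external verification");
}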

2. External Verification (Generator/Judge Separation)

The agent is not considered finished when it believes it’s done—only when external verification confirms success:

| Evaluation Type | Agent Logic | External Tooling |
| --- | --- | --- |
| Self-Assessment | “I believe this is correct” | None (Subjective) |
| External Verification | “I will run docker build” | Docker Engine (Objective) |
| Exit Decision | LLM decides to stop | System stops because tests pass |

This is the architectural enforcement of Generator/Judge separation from Adversarial Code Review, but mechanized.

3. Git as Persistent Memory

Context windows rot, but Git history persists. Each iteration commits changes, so subsequent iterations “see” modifications from previous attempts. The codebase becomes the source of truth, not the conversation.

Git also enables easy rollback if an iteration degrades quality.

4. Context Rotation and Progress Files

Context rot: Accumulation of error logs and irrelevant history degrades LLM reasoning.

Solution: At 60-80% context capacity, trigger forced rotation to fresh context. Essential state carries over via structured progress files.

This is the functional equivalent of free() for LLM memory—applied Context Engineering.
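A minimal sketch of what such a progress file might carry; the field names are illustrative, not a prescribed format:

// Illustrative shape for a progress file written between iterations.
interface ProgressFile {
  objective: string;   // the original task, restated
  completed: string[]; // work finished in previous iterations
  remaining: string[]; // known next steps
  decisions: string[]; // choices already made, so fresh contexts do not relitigate them
  blockers: string[];  // failing tests or errors still to be resolved
}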

5. Convergence Through Iteration

The probability of successful completion P(C) is a function of iterations n:

P(C) = 1 - (1 - p_success)^n

As n increases (often up to 50 iterations), the probability of resolving even complex bugs approaches 1.
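For illustration with assumed numbers: if each iteration independently succeeds with p_success = 0.3, ten iterations yield P(C) = 1 - 0.7^10 ≈ 0.97.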

OODA Loop Mapping

The Ralph Loop is OODA mechanized:

| OODA Phase | Ralph Loop Implementation |
| --- | --- |
| Observe | Read codebase state, error logs, failed builds |
| Orient | Marshal context, interpret errors, read progress file |
| Decide | Formulate specific plan for next iteration |
| Act | Modify files, run tests, commit changes |

The cycle repeats until external verification passes.

Relationship to Other Patterns

Context Gates — Context rotation + progress files = state filtering between iterations. Ralph Loops are Context Gates applied to the iteration boundary.

Adversarial Code Review — Ralph architecturally enforces Generator/Judge separation. External tooling is the “Judge” that prevents self-assessment failure.

The Spec — Completion promises require machine-verifiable success criteria. Well-structured Specs with Gherkin scenarios are ideal Ralph inputs.

Workflow as Code — The practice for implementing Ralph Loops using typed step abstractions rather than prompt-based orchestration. Provides deterministic control flow with the agent invoked only for probabilistic tasks.

Anti-Patterns

| Anti-Pattern | Description | Failure Mode |
| --- | --- | --- |
| Vague Prompts | “Improve this codebase” without specific criteria | Divergence; endless superficial changes |
| No External Verification | Relying on agent self-assessment | Self-Assessment Trap; hallucinates success |
| No Iteration Caps | Running without max iterations limit | Infinite loops; runaway API costs |
| No Sandbox Isolation | Agent has access to sensitive host files | Security breach; SSH keys, cookies exposed |
| No Context Rotation | Letting context window fill without rotation | Context rot; degraded reasoning |
| No Progress Files | Fresh iterations re-discover completed work | Wasted tokens; repeated mistakes |

Guardrails

| Risk | Mitigation |
| --- | --- |
| Infinite Looping | Hard iteration caps (20-50 iterations) |
| Context Rot | Periodic rotation at 60-80% capacity |
| Security Breach | Sandbox isolation (Docker, WSL) |
| Token Waste | Exact completion promise requirements |
| Logic Drift | Frequent Git commits each iteration |
| Cost Overrun | API cost tracking per session |

See also:

Specs

Living documents that serve as the permanent source of truth for features, solving the context amnesia problem in agentic development.

Status: Live | Last Updated: 2026-01-13

Definition

A Spec is the permanent source of truth for a feature. It defines how the system works (Design) and how we know it works (Quality).

Unlike traditional tech specs or PRDs that are “fire and forget,” specs are living documents. They reside in the repository alongside the code and evolve with every change to the feature.

The Problem: Context Amnesia

Agents do not have long-term memory. They cannot recall Jira tickets from six months ago or Slack conversations about architectural decisions. When an agent is tasked with modifying a feature, it needs immediate access to:

Without specs, agents reverse-engineer intent from code comments and commit messages—a process prone to hallucination and architectural drift.

Traditional documentation fails because:

Specs solve this by making documentation a first-class citizen in the codebase, subject to the same version control and review processes as the code itself.

State vs Delta

This is the core distinction that makes agentic development work at scale.

| Dimension | The Spec | The PBI |
| :--- | :--- | :--- |
| Purpose | Define the State (how it works) | Define the Delta (what changes) |
| Lifespan | Permanent (lives with the code) | Transient (closed after merge) |
| Scope | Feature-level rules | Task-level instructions |
| Audience | Architects, Agents (Reference) | Agents, Developers (Execution) |

The Spec defines the current state of the system:

The PBI defines the change:

The PBI references the Spec for context and updates the Spec when it changes contracts.

Why Separation Matters

Sprint 1: PBI-101 "Build notification system"
  → Creates /plans/notifications/spec.md
  → Spec defines: "Deliver within 100ms via WebSocket"

Sprint 3: PBI-203 "Add SMS fallback"
  → Updates spec.md with new transport rules
  → PBI-203 is closed, but the spec persists

Sprint 8: PBI-420 "Refactor notification queue"
  → Agent reads spec.md, sees all rules still apply
  → Refactoring preserves all documented contracts

Without this separation, the agent in Sprint 8 has no visibility into decisions made in Sprint 1.

The Assembly Model

Specs serve as the context source for Feature Assembly. Multiple PBIs reference the same spec, and the spec’s contracts are verified at quality gates.

flowchart LR
  A[/spec.md/]

  B[\pbi-101.md\]
  C[\pbi-203.md\]
  D[\pbi-420.md\]

  B1[[FEATURE ASSEMBLY]]
  C1[[FEATURE ASSEMBLY]]
  D1[[FEATURE ASSEMBLY]]

  E{GATE}

  F[[MIGRATION]]

  A --> B
  A --> C
  A --> D

  B --> B1
  C --> C1
  D --> D1

  B1 --> E
  C1 --> E
  D1 --> E

  A --> |Context|E

  E --> F

Anatomy

Every spec consists of two parts:

Blueprint (Design)

Defines implementation constraints that prevent agents from hallucinating invalid architectures.

Contract (Quality)

Defines verification rules that exist independently of any specific task.

The Contract section implements Behavior-Driven Development principles: scenarios define what behavior is expected without dictating how to implement it. This allows agents to interpret intent dynamically while providing clear verification criteria.
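For example, a minimal Gherkin-style scenario (illustrative, reusing the 100ms WebSocket contract from the notification example above):

Scenario: Deliver in-app notification
  Given a user with an active WebSocket connection
  When a notification is created for that user
  Then the notification is delivered within 100ms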

For detailed structure, examples, and templates, see the Living Specs Practice Guide.

Relationship to Other Patterns

The PBI — PBIs are the transient execution units (Delta) that reference specs for context. When a PBI changes contracts, it updates the spec in the same commit.

Feature Assembly — Specs define the acceptance criteria verified during assembly. The diagram above shows this flow.

Experience Modeling — Experience models capture user journeys; specs capture the technical contracts that implement those journeys.

Context Engineering — Specs are structured context assets optimized for agent consumption, with predictable sections (Blueprint, Contract) for efficient extraction.

Behavior-Driven Development — BDD provides the methodology for the Contract section. Gherkin scenarios serve as “specifications of behavior” that guide agent reasoning and define acceptance criteria.

Iterative Spec Refinement

Kent Beck critiques spec-driven approaches that assume “you aren’t going to learn anything during implementation.” This is valid—specs are not waterfall artifacts.

The refinement cycle:

  1. Initial Spec — Capture known constraints (API contracts, quality targets, anti-patterns)
  2. Implementation Discovery — Agent or human encounters edge cases, performance issues, or missing requirements
  3. Spec Update — New constraints committed alongside the code that revealed them
  4. Verification — Gate validates implementation against updated spec
  5. Repeat

This is the Learning Loop applied to specs: the spec doesn’t prevent learning—it captures learnings so agents can act on them in future sessions.

“Large Language Models give us great leverage—but they only work if we focus on learning and understanding.” — Unmesh Joshi, via Martin Fowler

Industry Validation

The Spec pattern has emerged independently across the industry under different names. Notably, Rasmus Widing’s Product Requirement Prompt (PRP) methodology defines the same structure: Goal + Why + Success Criteria + Context + Implementation Blueprint + Validation Loop.

His core principles—“Plan before you prompt,” “Context is everything,” “Scope to what the model can reliably do”—mirror ASDLC’s Spec-Driven Development philosophy.

See Product Requirement Prompts for the full mapping and Industry Alignment for convergent frameworks.

See also:

The PBI

A transient execution unit that defines the delta (change) while pointing to permanent context (The Spec), optimized for agent consumption.

Status: Live | Last Updated: 2026-01-13

Definition

The Product Backlog Item (PBI) is the unit of execution in the ASDLC. While The Spec defines the State (how the system works), the PBI defines the Delta (the specific change to be made).

In an AI-native workflow, the PBI transforms from a “User Story” (negotiable conversation) into a Prompt (strict directive). The AI has flexibility in how code is written, but the PBI enforces strict boundaries on what is delivered.

The Problem: Ambiguous Work Items

Traditional user stories (“As a user, I want…”) are designed for human negotiation. They assume ongoing dialogue, implicit context, and shared understanding built over time.

Agents don’t negotiate. They execute. A vague story becomes a hallucinated implementation.

What fails without structured PBIs:

The Solution: Pointer, Not Container

The PBI acts as a pointer to permanent context, not a container for the full design. It defines the delta while referencing The Spec for the state.

| Dimension | The Spec | The PBI |
| :--- | :--- | :--- |
| Purpose | Define the State (how it works) | Define the Delta (what changes) |
| Lifespan | Permanent (lives with the code) | Transient (closed after merge) |
| Scope | Feature-level rules | Task-level instructions |
| Audience | Architects, Agents (Reference) | Agents, Developers (Execution) |

Anatomy

An effective PBI consists of four parts:

1. The Directive

What to do, with explicit scope boundaries. Not a request—a constrained instruction.

2. The Context Pointer

Reference to the permanent spec. Prevents the PBI from becoming a stale copy of design decisions that live elsewhere.

3. The Verification Pointer

Link to success criteria defined in the spec’s Contract section. The agent knows exactly what “done” looks like.

4. The Refinement Rule

Protocol for when reality diverges from the spec. Does the agent stop? Update the spec? Flag for human review?

Bounded Agency

Because AI is probabilistic, it requires freedom to explore the “How” (implementation details, syntax choices). However, to prevent hallucination, we bound this freedom with non-negotiable constraints.

Negotiable (The Path): Code structure, variable naming, internal logic flow, refactoring approaches.

Non-Negotiable (The Guardrails): Steps defined in the PBI, outcome metrics in the Spec, documented anti-patterns, architectural boundaries.

The PBI is not a request for conversation—it’s a constrained optimization problem.

Atomicity & Concurrency

In swarm execution (multiple agents working in parallel), each PBI must be:

Atomic: The PBI delivers a complete, working increment. No partial states. If the agent stops mid-task, either the full change lands or nothing does.

Self-Testable: Verification criteria must be executable without other pending PBIs completing first. If PBI-102 requires PBI-101’s code to test, PBI-102 is not self-testable.

Isolated: Changes target distinct files/modules. Two concurrent PBIs modifying the same file create merge conflicts and non-deterministic outcomes.

Dependency Declaration

When a PBI requires another to complete first, the dependency is declared explicitly in the PBI structure—not discovered at merge time.
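For example, a minimal dependency block inside a PBI (the same structure appears in the PBI Authoring template later in this compendium):

## Dependencies
- Blocked by: PBI-101 (creates the base schema)
- Must merge before: PBI-103 (extends this endpoint)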

Relationship to Other Patterns

The Spec — The permanent source of truth that PBIs reference. The Spec defines state; the PBI defines delta.

PBI Authoring — The practice for writing effective PBIs, including templates and lifecycle.

See also:

Practices (A-Z)

Agent Personas

A guide on how to add multiple personas to an AGENTS.md file, with examples.

Status: Live | Last Updated: 2026-01-13

Definition

Defining clear personas for your agents is crucial for ensuring they understand their role, trigger constraints, and goals. This guide demonstrates how to structure multiple personas within your AGENTS.md file.

Personas are a context engineering practice—they scope agent work by defining boundaries and focus, not by role-playing. When combined with Model Routing, personas can also specify which computational tool (LLM) to use for each type of work.

For the full specification of the AGENTS.md file, see the AGENTS.md Specification.

When to Use

Use this practice when:

Skip this practice when:

How to Add Multiple Personas

You can define multiple personas by specifying triggers, goals, and guidelines for each. This allows different agents (or the same agent in different contexts) to adopt specific behaviors suited for the task at hand.

Example: Our Internal Personas

Below are the personas we use, serving as a template for your own AGENTS.md.

### 1.1. Lead Developer / Astro Architect (@Lead)
**Trigger:** When asked about system design, specs, or planning.
* **Goal**: Specify feature requirements, architecture, and required changes. Analyze the project state and plan next steps.
* **Guidelines**
  - **Schema Design:** When creating new content types, immediately define the Zod schema in `src/content/config.ts`.
  - **Routing:** Use Astro's file-based routing. For dynamic docs, use `[...slug].astro` and `getStaticPaths()`.
  - **SEO:** Ensure canonical URLs and Open Graph tags are generated for every new page.
  - **Dev Performance:** Focus on tangible, deliverable outcomes.
  - **Spec-driven development:** Always produce clear, concise specifications before handing off to implementation agents.
  - **Planned iterations:** Break down large tasks into manageable PBIs with clear acceptance criteria.

### 1.2. Designer / User Experience Lead (@Designer)
**Trigger:** When asked about UI/UX, design systems, or visual consistency.
* **Goal**: Ensure the design system can be effectively utilized by agents and humans alike.
* **Guidelines**
  - **Design Tokens:** Tokens must be set in `src/styles/tokens.css`. No hardcoded colors or fonts.
  - **Component Consistency:** All components must adhere to the design system documented in `src/pages/resources/design-system.astro`. 
  - **Accessibility:** Ensure all components meet WCAG 2.1 AA standards.
  - **Documentation:** Update the Design System page with any new components or styles introduced.
  - **Experience Modeling Allowed:** Design system components are protected by a commit rule; use the [EM] tag to override it.
  
### 1.3. Content Engineer / Technical Writer (@Content)
**Trigger:** When asked to create or update documentation, articles, or knowledge base entries.
* **Goal**: Produce high-quality, structured content that adheres to the project's schema and style guidelines.
* **Guidelines**
  - **Content Structure:** Follow the established folder structure in `src/content/` for concepts
  
### 1.4. Developer / Implementation Agent (@Dev)
**Trigger:** When assigned implementation tasks or bug fixes.
* **Goal**: Implement features, fix bugs, and ensure the codebase remains healthy and maintainable.
* **Guidelines**
  - **Expect PBIs:** Always work from a defined Product Backlog Item (PBI) with clear acceptance criteria, if available.
  - **Type Safety:** Use TypeScript strictly. No `any` types allowed.
  - **Component Imports:** Explicitly import all components used in `.astro` files.
  - **Testing:** Ensure all changes pass `pnpm check` and `pnpm lint`.
  - **Document progress:** Update the relevant PBI in `docs/backlog/` with status and notes after completing tasks.

Model Routing and Personas

Personas define what work to do and how to scope it. Model Routing is a separate practice that defines which computational tool to use.

Current State (December 2025)

AI-assisted IDEs (Cursor, Windsurf, Claude Code) do not automatically select models based on persona definitions. Model selection is manual.

Best Practice: Keep Them Separate

Don’t add model profiles to AGENTS.md; it adds noise to the context window without providing automation value.

Instead:

  1. Keep personas focused on triggers, goals, and guidelines
  2. Use Model Routing separately - Manually select models based on the task characteristics
  3. Reference the pattern when deciding which model to use

Matching Personas to Model Profiles

When you invoke a persona, choose your model based on the work type:

| Persona Type | Typical Work | Recommended Profile |
| :--- | :--- | :--- |
| Lead / Architect | System design, specs, architectural decisions | High Reasoning |
| Developer / Implementation | Code generation, refactoring, tests | High Throughput |
| Documentation Analyst | Legacy code analysis, comprehensive docs | Massive Context |

The workflow:

  1. Identify the persona needed for your task
  2. Select the appropriate model manually in your IDE
  3. Invoke the persona with your prompt

This keeps AGENTS.md lean and focused on scoping agent work, while model selection remains a deliberate engineering decision.

AGENTS.md Specification

The definitive guide to the AGENTS.md file, including philosophy, anatomy, and implementation strategy.

Status: Live | Last Updated: 2026-01-13

DEFINITION

AGENTS.md is an open format for guiding coding agents, acting as a “README for agents.” It provides a dedicated, predictable place for context and instructions—such as build steps, tests, and conventions—that help AI coding agents work effectively on a project.

We align with the agents.md specification, treating this file as the authoritative source of truth for agentic behavior within the ASDLC.

When to Use

Use this practice when:

Skip this practice when:

CORE PHILOSOPHY

1. A README for Agents

Just as README.md is for humans, AGENTS.md is for agents. It complements existing documentation by containing the detailed context—build commands, strict style guides, and test instructions—that agents need but might clutter a human-facing README.

2. Context is Code

In the ASDLC, we treat AGENTS.md with the same rigor as production software:

3. The Context Anchor (Long-Term Memory)

AGENTS.md solves the “Context Amnesia” problem. Agents are stateless—each new session starts with blank context. Without grounding, the agent reverts to generic training weights, forgetting project-specific patterns and lessons learned.

The AGENTS.md file acts as persistent “standing orders” for the agent across different sessions. By documenting your research tools, coding styles, architectural decisions, and accumulated lessons here, you prevent session-to-session drift.

This transforms AGENTS.md from a simple configuration file into the project’s institutional memory for AI collaboration.

Format Philosophy

The structures in this specification (YAML maps, XML standards, tiered boundaries) are optimized for large teams and complex codebases. For smaller projects:

The goal is signal density, not format compliance. Overly rigid specs create adoption friction. Let teams scale complexity to their needs.

TOOL-SPECIFIC CONSIDERATIONS

Different AI coding tools look for different filenames. While AGENTS.md is the emerging standard, some tools require specific naming:

| Tool | Expected Filename | Notes |
| :--- | :--- | :--- |
| Cursor | `.cursorrules` | Also reads AGENTS.md |
| Windsurf | `.windsurfrules` | Also reads AGENTS.md |
| Claude Code | `CLAUDE.md` | Does not read AGENTS.md; case-sensitive |
| Codex | `AGENTS.md` | Native support |
| Zed | `.rules` | Priority-based; reads AGENTS.md at lower priority |
| VS Code / Copilot | `AGENTS.md` | Requires `chat.useAgentsMdFile` setting enabled |

Zed Priority Order

Zed uses the first matching file from this list:

  1. .rules
  2. .cursorrules
  3. .windsurfrules
  4. .clinerules
  5. .github/copilot-instructions.md
  6. AGENT.md
  7. AGENTS.md
  8. CLAUDE.md
  9. GEMINI.md

VS Code Configuration

VS Code requires explicit opt-in for AGENTS.md support:
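For example, a minimal workspace configuration (assumed location: `.vscode/settings.json`; the setting name is taken from the table above):

{
  "chat.useAgentsMdFile": true
}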

Recommendation

Create a symlink to support Claude Code without duplicating content:

ln -s AGENTS.md CLAUDE.md

This ensures Claude Code users get the same guidance while maintaining a single source of truth. Note that Claude Code also supports CLAUDE.local.md for personal preferences that shouldn’t be version-controlled.

ECOSYSTEM TOOLS

As AGENTS.md adoption grows, tools emerge to bridge compatibility gaps between different coding assistants and enforce standards across heterogeneous environments.

Ruler

Ruler is a meta-tool that synthesizes agent instructions from multiple sources (AGENTS.md, .cursorrules, project conventions) and injects them into coding assistants that may not natively support the AGENTS.md standard.

Key capabilities:

Use case: Teams using multiple coding assistants (e.g., some developers on Cursor, others on Claude Code) can maintain a single source of truth in AGENTS.md while Ruler handles distribution to tool-specific formats.

This demonstrates ecosystem maturity: when third-party tools emerge to solve interoperability problems, the standard has achieved meaningful adoption.

ASDLC IMPLEMENTATION STRATEGY

While the agents.md standard provides the format, the ASDLC recommends a structured implementation to ensure reliability. We present our AGENTS.md format not just as a list of tips, but as a segmented database of rules. This is one valid implementation strategy, particularly suited for rigorous engineering environments.

1. Identity Anchoring (The Persona)

Establishes the specific expertise required to prune the model’s search space. Without this, the model reverts to the “average” developer found in its training data. For detailed examples of defining multiple personas, see Agent Personas.

Bad: “You are a coding assistant.”

Good: “You are a Principal Systems Engineer specializing in Go 1.22, gRPC, and high-throughput concurrency patterns. You favor composition over inheritance.”

2. Contextual Alignment (The Mission)

A concise, high-level summary of the project’s purpose and business domain. This is often formatted as a blockquote at the top of the file to “set the stage” for the agent’s session.

Example:

> **Project:** “ZenTask” - A minimalist productivity app.
> **Core Philosophy:** Local-first data architecture; offline support is mandatory.

3. Operational Grounding (The Tech Stack)

Explicitly defines the software environment to prevent “Library Hallucination.” This section must be exhaustive regarding key dependencies and restrictive regarding alternatives.

4. Behavioral Boundaries (Context Gates)

Replaces vague “Guardrails” with a “Three-Tiered Boundary” system, forming the Agent Constitution. Because models are probabilistic, absolute prohibitions are unrealistic; instead, this system categorizes rules by severity and required action. These rules aim to reduce the likelihood of critical errors. Note that you should always complement the constitution with explicit, deterministic quality gates enforced by tests, linters, and CI/CD pipelines.

Tier 1 (Constitutive - ALWAYS): Non-negotiable standards.

Example: “Always add JSDoc to exported functions.”

Tier 2 (Procedural - ASK): High-risk operations requiring Human-in-the-Loop.

Example: “Ask before running database migrations or deleting files.”

Tier 3 (Hard Constraints - NEVER): Safety limits.

Example: “Never commit secrets, API keys, or .env files.”

5. Semantic Directory Mapping

When documenting the codebase structure in AGENTS.md, prefer Annotated YAML over ASCII trees.

Example:

directory_map:
  src:
    # Core Application Logic
    main.py: "Application entry point; initializes the Agent Orchestrator"
    
    agents:
      # Individual Agent definitions
      base_agent.py: "Abstract base class defining the 'step()' and 'memory' interfaces"
    
    utils:
      # Shared libraries
      llm_client.py: "Wrapper for OpenAI/Anthropic APIs with retry logic"

6. The Command Registry

A lookup table mapping intent to execution. Agents often default to standard commands (`npm test`) that may fail in custom environments (`make test-unit`). The Registry forces specific tool usage.

| Intent | Command | Notes |
| :--- | :--- | :--- |
| Build | `pnpm build` | Outputs to `dist/` |
| Test | `pnpm test:unit` | Flags: `--watch=false` |
| Lint | `pnpm lint --fix` | Self-correction enabled |

7. Implementation Notes

XML-Tagging for Semantic Parsing

To maximize adherence, use pseudo-XML tags to encapsulate rules. This creates a “schema” that the model can parse more strictly than bullet points.

<coding_standard name="React Hooks">
  <instruction>
    Use functional components and Hooks. Avoid Class components.
    Ensure extensive use of custom hooks for logic reuse.
  </instruction>
  <anti_pattern>
    class MyComponent extends React.Component {... }
  </anti_pattern>
  <preferred_pattern>
    const MyComponent = () => {... }
  </preferred_pattern>
</coding_standard>

REFERENCE TEMPLATE

Filename: AGENTS.md

# AGENTS.md - Context & Rules for AI Agents

> **Project Mission:** High-throughput gRPC service for processing real-time financial transactions.
> **Core Constraints:** Zero-trust security model, ACID compliance required for all writes.

## 1. Identity & Persona
- **Role:** Senior Systems Engineer
- **Specialization:** High-throughput distributed systems in Go.
- **Objective:** Write performant, thread-safe, and maintainable code.

## 2. Tech Stack (Ground Truth)
- **Language:** Go 1.22 (Use `iter` package for loops)
- **Transport:** gRPC (Protobuf v3)
- **Database:** PostgreSQL 15 with `pgx` driver (No ORM allowed)
- **Infra:** Kubernetes, Helm, Docker

## 3. Operational Boundaries (CRITICAL)
- **NEVER** commit secrets, tokens, or `.env` files.
- **NEVER** modify `api/proto` without running `buf generate`.
- **ALWAYS** handle errors; never use `_` to ignore errors.
- **ASK** before adding external dependencies.

## 4. Command Registry
| Action | Command | Note |
| :--- | :--- | :--- |
| **Build** | `make build` | Outputs to `./bin` |
| **Test** | `make test` | Runs with `-race` detector |
| **Lint** | `golangci-lint run` | Must pass before commit |
| **Gen** | `make proto` | Regenerates gRPC stubs |

## 5. Development Map
```yaml
directory_map:
  api:
    proto: "Protocol Buffers definitions (Source of Truth)"
  cmd:
    server: "Main entry point, dependency injection wire-up"
  internal:
    biz: "Business logic and domain entities (Pure Go)"
    data: "Data access layer (Postgres + pgx)"

```

## 6. Coding Standards

<rule_set name="Concurrency">
  <instruction>Use `errgroup` for managing goroutines. Avoid bare `go` routines.</instruction>
  <example>
    <bad>go func() {... }()</bad>
    <good>g.Go(func() error {... })</good>
  </example>
</rule_set>

## 7. Context References

Constitutional Review Implementation

Step-by-step guide for implementing Constitutional Review to validate code against both Spec and Constitution contracts.

Status: Experimental | Last Updated: 2026-01-08

Definition

Constitutional Review Implementation is the operational practice of configuring and executing Constitutional Review to validate code against both functional requirements (the Spec) and architectural values (the Constitution).

This practice extends Adversarial Code Review by adding constitutional constraints to the Critic Agent’s validation criteria.

When to Use

Use this practice when:

Skip this practice when:

Prerequisites

Before implementing Constitutional Review, ensure you have:

  1. Agent Constitution documented (typically AGENTS.md)
  2. The Spec for the feature being reviewed
  3. Critic Agent session separate from the Builder Agent (fresh context)
  4. Architectural constraints clearly defined in the Constitution

If architectural constraints aren’t documented, start with AGENTS.md Specification.

Process

Step 1: Document Architectural Constraints in Constitution

Ensure your Agent Constitution includes non-functional constraints that are:

Example Structure:

## Architectural Constraints

### Data Access
- All filtering operations MUST be pushed to the database layer
- Never use `findAll()` or `LoadAll()` followed by in-memory filtering
- Queries must handle 10k+ records without memory issues

### Performance
- API responses < 200ms at p99
- Database queries must use indexes for common filters
- No N+1 query patterns

### Security
- User IDs never logged (use hashed identifiers)
- All inputs validated against Zod schemas before processing
- Authentication tokens expire within 24 hours
- No hardcoded secrets (use environment variables)

### Error Handling
- Never fail silently (all errors logged with context)
- User-facing errors never expose stack traces
- Database errors map to generic "Service unavailable" messages

Step 2: Configure Critic Agent Prompt

Extend the standard Adversarial Code Review prompt to include constitutional validation.

System Prompt Template:

You are a rigorous Code Reviewer validating implementation against TWO sources of truth:

1. The Spec (/plans/{feature-name}/spec.md)
   - Functional requirements (what should it do?)
   - API contracts (what are the inputs/outputs?)
   - Data schemas (what is the structure?)

2. The Constitution (AGENTS.md)
   - Architectural patterns (e.g., "push filtering to DB")
   - Performance constraints (e.g., "queries handle 10k+ records")
   - Security rules (e.g., "never log user IDs")
   - Error handling policies (e.g., "never fail silently")

YOUR JOB:
Identify where code satisfies the Spec (functional) but violates the Constitution (architectural).

COMMON CONSTITUTIONAL VIOLATIONS TO CHECK:
- LoadAll().Filter() pattern (data access violation)
- Hardcoded secrets (security violation)
- Missing error logging (error handling violation)
- N+1 query patterns (performance violation)
- User IDs in logs (security violation)

OUTPUT FORMAT:
For each violation:
1. Type: Constitutional Violation - [Category]
2. Location: File path and line number
3. Issue: What constitutional principle is violated
4. Impact: Why this matters at scale (performance, security, maintainability)
5. Remediation Path: Ordered steps to fix (prefer standard patterns, escalate if needed)
6. Test Requirements: What tests would prevent regression

If no violations found, output: PASS - Constitutional Review

Step 3: Execute Constitutional Review Workflow

Follow this sequence to ensure proper validation:

┌─────────────┐
│   Builder   │ → Implements Spec
└──────┬──────┘
       ↓
┌─────────────────┐
│  Quality Gates  │ → Tests, types, linting (deterministic)
└──────┬──────────┘
       ↓ (pass)
┌──────────────────┐
│ Spec Compliance  │ → Does it meet functional requirements?
│     Review       │    (Adversarial Code Review)
└──────┬───────────┘
       ↓ (pass)
┌──────────────────┐
│ Constitutional   │ → Does it follow architectural principles?
│     Review       │    (This practice)
└──────┬───────────┘
       ↓ (pass)
┌─────────────────┐
│ Acceptance Gate │ → Human strategic review (is it the right thing?)
└─────────────────┘

Execution Steps:

  1. Builder completes implementation — Code written, tests pass
  2. Quality Gates pass — Compilation, linting, unit tests all green
  3. Spec Compliance Review — Critic validates functional requirements met
  4. ⭐ Constitutional Review — Critic validates architectural principles followed:
    • Open new Critic Agent session (fresh context, no Builder bias)
    • Provide Constitution (AGENTS.md)
    • Provide Spec (feature spec file)
    • Provide Code Diff (changed files only)
    • Use Constitutional Review prompt (from Step 2)
    • Critic outputs violations or PASS
  5. If violations found → Return to Builder with remediation path
  6. If PASS → Proceed to Acceptance Gate (human review)

Step 4: Process Violation Reports

When the Critic identifies constitutional violations, the output will follow this format:

VIOLATION: Constitutional - Data Access Pattern

Location: src/audit/AuditService.cs Line 23

Issue: Loads all records into memory before filtering
Constitution Violation: "All filtering operations MUST be pushed to database layer"

Impact: 
- Works fine with small datasets (< 1k records)
- Breaks at scale (10k+ records cause memory issues)
- Creates N+1 query patterns in related queries
- Violates performance SLA (API responses > 200ms)

Remediation Path:
1. Push filter to database query:
   repository.FindWhere(x => x.Date > startDate)
2. If ORM doesn't support this pattern, use raw SQL:
   SELECT * FROM audit_logs WHERE date > @startDate
3. Add performance test with 10k+ mock records to prevent regression
4. Document the constraint in repository interface comments

Test Requirements:
- Add test: "GetLogs with 10k records completes in < 200ms"
- Add test: "GetLogs does not load entire table into memory"
  (mock repository, verify FindWhere called, not LoadAll)

Processing steps:

  1. Return to Builder Agent with full violation report
  2. Builder implements remediation following the ordered path
  3. Re-run Constitutional Review after fixes
  4. Iterate until PASS (typically 1-2 cycles)

Step 5: Update Constitution Based on Violations

If the Critic struggles to validate or produces unclear violations:

Indicators Constitution needs update:

Action: Refine constitutional constraints:

## Before (vague)
### Performance
- Code should be fast

## After (specific)
### Performance
- API responses < 200ms at p99
- Database queries must use indexes for common filters
- Push filtering to database layer (never LoadAll().Filter())

Examples

Example 1: Data Access Violation

Code Submitted:

async function getActiveUsers() {
  const users = await db.users.findAll();
  return users.filter(u => u.status === 'active');
}

Constitutional Review Output:

VIOLATION: Constitutional - Data Access Pattern

Location: src/users/service.ts Line 12

Issue: Loads all users then filters in-memory
Constitution Violation: "Push filtering to database layer"

Impact: Works with 100 users, breaks at 10k+

Remediation:
1. db.users.findWhere({ status: 'active' })
2. Add test with 10k users to verify performance

Test Requirements:
- Performance test: 10k users returns in < 200ms
- Verify db.users.findWhere called (not findAll)

Example 2: Security Violation

Code Submitted:

logger.info(`User ${userId} logged in from ${ipAddress}`);

Constitutional Review Output:

VIOLATION: Constitutional - Security

Location: src/auth/logger.ts Line 45

Issue: Logs user ID directly
Constitution Violation: "Never log user IDs (use hashed identifiers)"

Impact: GDPR compliance risk, audit log exposure

Remediation:
1. Hash user ID: logger.info(`User ${hashUserId(userId)} logged in...`)
2. Implement hashUserId utility (SHA-256 with salt)
3. Update all logging to use hashed IDs

Test Requirements:
- Verify logs do not contain raw user IDs
- Verify hashed IDs are consistent (same user = same hash)

Implementation Constraints

Requires Clear Constitutional Principles — Vague constraints produce vague critiques. “Be performant” is not actionable. “API responses < 200ms at p99” is.

Not Fully Automated (Yet) — As of January 2026, requires manual orchestration. You must manually:

Model Capability Variance — Not all reasoning models perform equally at constitutional review. Recommended:

False Positives Possible — Architectural rules have exceptions. The Critic may flag valid code that violates general principles for good reasons. Human review in Acceptance Gate remains essential.

Context Window Limits — Large diffs may exceed context windows. Solutions:

Troubleshooting

Issue: Critic approves code that violates Constitution

Cause: Constitutional constraints not specific enough in AGENTS.md

Solution:

  1. Review violation that slipped through
  2. Add specific constraint to Constitution:
    ### Data Access
    - ❌ Before: "Queries should be efficient"
    - ✅ After: "Never use LoadAll().Filter() - push filtering to database"
    
  3. Re-run Constitutional Review with updated Constitution

Issue: Critic flags valid code as violation

Cause: Constitutional rule is too strict or lacks exceptions

Solution:

  1. Document exception in Constitution:
    ### Data Access
    - Push filtering to database layer
    - Exception: In-memory filtering allowed for cached reference data (< 100 records)
    
  2. Update Critic prompt to recognize exceptions
  3. Proceed to Acceptance Gate (human validates exception is legitimate)

Issue: Constitutional Review takes too long

Cause: Large code diffs or complex Constitution

Solution:

  1. Break up PRs — Smaller, focused changes
  2. Parallelize reviews — Review multiple files concurrently
  3. Use Summary Gates — Compress Spec to relevant sections only
  4. Cache Constitution — Reuse constitutional context across reviews

Future Automation Potential

This practice is currently manual but has clear automation paths:

CI/CD Integration — Automated constitutional review on PR creation:

# .github/workflows/constitutional-review.yml
on: pull_request
jobs:
  constitutional-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Constitutional Review
        run: |
          constitutional-review-agent \
            --constitution AGENTS.md \
            --spec plans/${FEATURE}/spec.md \
            --diff ${{ github.event.pull_request.diff_url }}

IDE Integration — Real-time constitutional feedback:

Living Constitution — Automatic updates:

Violation Analytics — Dashboard tracking:

See also:

External Validation

Feature Assembly

The implementation phase where PBIs are executed against Specs, validated through quality gates, and integrated into the codebase.

Status: Draft | Last Updated: 2026-01-09

Definition

Feature Assembly is the implementation phase in the Agentic SDLC where PBIs (Product Backlog Items) are executed by agents or developers using The Spec as the authoritative source of truth. Unlike traditional development where implementation details drift from requirements, Feature Assembly enforces strict contract validation through Context Gates before code enters the codebase.

This is where the “Delta” (PBI) meets the “State” (Spec), and the output is verified code that provably satisfies documented contracts.

The Assembly Pipeline

flowchart LR
  PBI[PBI] --> Spec[Spec]
  Spec --> Code[Implementation]
  Code --> Gates{Gates}
  Gates -->|PASS| Merge[Merge]
  Gates -->|FAIL| Code
  Merge --> Done([Done])

The Problem: Implementation Drift

Traditional development workflows suffer from a disconnect between specification and implementation:

Spec-less Coding — Developers implement features based on vague tickets, Slack discussions, or tribal knowledge, leading to inconsistent interpretations.

Post-Hoc Documentation — Documentation is written after implementation (if at all), capturing what was built rather than what was intended.

Silent Contract Violations — Code that “works” but violates architectural constraints, performance requirements, or edge case handling goes undetected until production.

Agent Hallucination — LLM-generated code drifts toward “average solutions” found in training data, ignoring project-specific constraints.

The Solution: Spec-Driven Assembly

Feature Assembly inverts the traditional workflow:

  1. The Spec is written first — Specs define contracts before any code is written
  2. PBIs reference the Spec — PBIs point to spec sections rather than duplicating requirements
  3. Implementation is validated against contracts — Code must pass quality gates that verify spec compliance
  4. Gates block invalid code — Failed validation prevents merge, forcing correction

This creates a closed loop where the Spec is both the input (what to build) and the acceptance criteria (how to verify).

The Assembly Workflow

Phase 1: Context Loading

The agent or developer begins by loading the necessary context:

Input:

Example:

# PBI-427: Implement notification preferences API
# Context:
#   - Spec: /plans/notifications/spec.md
#   - Scope: src/api/notifications/

Phase 1a: Plan Verification

Before implementation begins, verify the agent’s proposed execution plan.

Modern coding agents generate an internal execution plan before writing code. This plan must be reviewed—blind approval is a common failure mode.

The Vibe Check:

The Logic Check:

The Observability First Rule:

[!IMPORTANT] If the plan is vague (“I will fix the bug”), reject it. Demand a specific file-level plan before execution proceeds.

Phase 2: Implementation

Code is generated or written to satisfy the PBI requirements while adhering to spec contracts.

Key Principles:

Anti-Pattern:

// ❌ Implementing without reading the spec
async function updatePreferences(data: any) {
  await db.save(data); // No validation, ignores spec contracts
}

Correct Pattern:

// ✅ Implementing against spec contracts
// See: /plans/notifications/spec.md#data-schema
import { PreferencesSchema } from './schemas';

async function updatePreferences(data: unknown) {
  // Spec requirement: validate input
  const validated = PreferencesSchema.parse(data);
  
  // Spec requirement: latency < 200ms
  const result = await db.save(validated);
  
  return result;
}

Phase 3: Quality Gates

Before code can be merged, it must pass through a three-tier validation system defined in Context Gates:

Quality Gates (Deterministic - Required)

Automated, binary checks enforced by tooling:

Implementation:

# .github/workflows/quality-gate.yml
name: Quality Gate
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Compile
        run: npm run build
      - name: Lint
        run: npm run lint
      - name: Test
        run: npm run test
      - name: Type Check
        run: npm run type-check

LLM-assisted validation using a Critic Agent (see Adversarial Code Review):

Implementation:

# Run Critic Agent in fresh session
critic-agent review \
  --spec plans/notifications/spec.md \
  --diff src/api/notifications/preferences.ts \
  --output violations.json

Acceptance Gates (Human-in-the-Loop - Required)

Subjective validation requiring human judgment:

Workflow:

Phase 4: Integration

Once all gates pass, code is merged and the Spec is updated if contracts changed.

Merge Checklist:

Relationship to Experience Modeling

Experience Modeling defines the Design System that Feature Assembly consumes.

Key Constraint: During Feature Assembly, the Design System is read-only. Feature agents use design system components but cannot modify them.

Example Gate:

// pre-commit hook
const changedFiles = getChangedFiles();
const designSystemFiles = changedFiles.filter(f => 
  f.startsWith('src/design-system/')
);

if (designSystemFiles.length > 0 && !isExperienceModelingMode()) {
  console.error('❌ Design system is read-only during Feature Assembly');
  console.error('   Use [EM] commit tag to override (requires approval)');
  process.exit(1);
}

This prevents “Design Drift” where features gradually corrupt the design system.

Relationship to The Spec

Feature Assembly is the execution of the Spec’s contracts.

| Spec Section | Assembly Verification |
| :--- | :--- |
| Blueprint → Architecture | Code structure matches defined patterns |
| Blueprint → Anti-Patterns | Linting/review catches violations |
| Contract → Definition of Done | All checklist items satisfied |
| Contract → Regression Guardrails | Tests verify invariants hold |
| Contract → Scenarios | E2E tests implement Gherkin scenarios |

The Spec is the test oracle — if the Spec says latency must be <200ms, the quality gate verifies it.

Relationship to The PBI

PBIs are the transient execution triggers. Feature Assembly is what happens when a PBI is executed.

The Flow:

PBI → Feature Assembly → Quality Gates → Integration → Spec Update (if needed)

Example:

  1. PBI-427 says “Implement preferences API”
  2. Assembly phase builds src/api/preferences.ts
  3. Quality gates verify against /plans/notifications/spec.md
  4. Human accepts strategic fit
  5. Code merges, PBI-427 closes

Implementation Guidelines

Test-First Assembly

Write tests that verify spec contracts before implementing:

// tests/api/preferences.test.ts
// Validates: /plans/notifications/spec.md#contract

describe('Preferences API', () => {
  it('should respond within 200ms', async () => {
    const start = Date.now();
    await updatePreferences(mockData);
    const duration = Date.now() - start;
    expect(duration).toBeLessThan(200);
  });

  it('should reject invalid schema', async () => {
    await expect(
      updatePreferences({ invalid: 'data' })
    ).rejects.toThrow(ValidationError);
  });
});

Spec-Contract Mapping

Link code to spec sections explicitly:

// src/api/notifications/preferences.ts
// Spec: /plans/notifications/spec.md
// Contract: "All updates must validate against PreferencesSchema"
// Contract: "Response time must be <200ms"

export async function updatePreferences(data: unknown) {
  // Implements spec contracts...
}

Continuous Verification

Run quality gates on every commit:

#!/bin/bash
# .git/hooks/pre-commit
npm run lint || exit 1
npm run test || exit 1
npm run type-check || exit 1
echo "✅ Quality gates passed"

Best Practices

1. Never Skip Gates

All code must pass quality gates. No “we’ll fix it later” exceptions.

2. Spec First, Code Second

If the Spec is unclear, update the Spec before implementing. Don’t guess.

3. Atomic Assembly

Complete one PBI fully before starting another. Partial implementations create context pollution.

4. Document Deviations

If implementation requires deviating from the Spec, update the Spec in the same commit with a changelog entry.

5. Automate Gates

Quality gates should run automatically on CI/CD. Manual gates introduce inconsistency.

Anti-Patterns

The “Works on My Machine” Merge

Problem: Code passes local tests but fails in CI/production.

Solution: Require all quality gates to pass in CI before merge is allowed.

The Spec Drift

Problem: Code is implemented without reading the Spec, causing contract violations.

Solution: Code review checklist requires explicit spec section references.

The Post-Hoc Documentation

Problem: Spec is updated after code is written, documenting what was built rather than what was intended.

Solution: Spec reviews happen before PBI creation. No PBI without a Spec.

The Eternal WIP

Problem: PBIs remain “in progress” for weeks, accumulating scope creep.

Solution: Time-box PBIs. If not done in 1-2 days, break into smaller PBIs.

Metrics and Observability

Track assembly health with these metrics:

Gate Pass Rate:

Cycle Time:

Spec Coverage:

Rework Rate:

Future Enhancements

This practice is currently manual orchestration. Automation opportunities:

Auto-Gate Runners — CI/CD automatically runs critic agents and posts violations as PR comments

Spec-to-Test Generation — LLMs generate test cases from Spec’s Contract section

Real-Time Compliance — IDE plugin shows spec violations as code is written

Assembly Metrics Dashboard — Real-time tracking of gate pass rates and cycle time

See also:

Living Specs

Practical guide to creating and maintaining specs that evolve alongside your codebase.

Status: Experimental | Last Updated: 2025-12-22

Overview

This guide provides practical instructions for implementing the Specs pattern. While the pattern describes what specs are and why they matter, this guide focuses on how to create and maintain them.

When to Create a Spec

Create a spec when starting work that involves:

Feature Domains — New functionality that introduces architectural patterns, API contracts, or data models that other parts of the system depend on.

User-Facing Workflows — Features with defined user journeys and acceptance criteria that need preservation for future reference.

Cross-Team Dependencies — Any feature that other teams will integrate with, requiring clear contract definitions.

Don’t create specs for: Simple bug fixes, trivial UI changes, configuration updates, or dependency bumps.

Spec Granularity

A spec should be detailed enough to serve as a contract for the feature, but not so detailed that it becomes a maintenance burden.

Some spec features, like Gherkin scenarios, are not always necessary if the feature is simple or well understood.

When to Update a Spec

Update an existing spec when:

Golden Rule: If code behavior changes, the spec MUST be updated in the same commit.

File Structure

Organize specs by feature domain, not by sprint or ticket number.

/project-root
├── ARCHITECTURE.md           # Global system rules
├── plans/                    # Feature-level specs
│   ├── user-authentication/
│   │   └── spec.md
│   ├── payment-processing/
│   │   └── spec.md
│   └── notifications/
│       └── spec.md
└── src/                      # Implementation code

Conventions:

Maintenance Protocol

Same-Commit Rule

If code changes behavior, update the spec in the same commit. Add “Spec updated” to your PR checklist.

git commit -m "feat(notifications): add SMS fallback

- Implements SMS delivery when WebSocket fails
- Updates /plans/notifications/spec.md with new transport layer"

Deprecation Over Deletion

Mark outdated sections as deprecated rather than removing them. This preserves historical context.

### Architecture

**[DEPRECATED 2024-12-01]**
~~WebSocket transport via Socket.io library~~
Replaced by native WebSocket API to reduce bundle size.

**Current:**
Native WebSocket connection via `/api/ws/notifications`

Bidirectional Linking

Link code to specs and specs to code:

In code, pointing to the spec:

// Notification delivery must meet 100ms latency requirement
// See: /plans/notifications/spec.md#contract

In the spec, pointing to the code:

### Data Schema
Implemented in `src/types/Notification.ts` using Zod validation.

Template

# Feature: [Feature Name]

## Blueprint

### Context
[Why does this feature exist? What business problem does it solve?]

### Architecture
- **API Contracts:** `[METHOD] /api/v1/[endpoint]` - [Description]
- **Data Models:** See `[file path]`
- **Dependencies:** [What this depends on / what depends on this]

### Anti-Patterns
- [What agents must avoid, with rationale]

## Contract

### Definition of Done
- [ ] [Observable success criterion]

### Regression Guardrails
- [Critical invariant that must never break]

### Scenarios
**Scenario: [Name]**
- Given: [Precondition]
- When: [Action]
- Then: [Expected outcome]

Anti-Patterns

The Stale Spec

Problem: Spec created during planning, never updated as the feature evolves.

Solution: Make spec updates mandatory in Definition of Done. Add PR checklist item.

The Spec in Slack

Problem: Design decisions discussed in chat but never committed to the repo.

Solution: After consensus, immediately update spec.md with a commit linking to the discussion.

The Monolithic Spec

Problem: A single 5000-line spec tries to document the entire application.

Solution: Split into feature-domain specs. Use ARCHITECTURE.md only for global cross-cutting concerns.

The Spec-as-Tutorial

Problem: Spec reads like a beginner’s guide, full of basic programming concepts.

Solution: Assume engineering competence. Document constraints and decisions, not general knowledge.

The Copy-Paste Code

Problem: Spec duplicates large chunks of implementation code.

Solution: Reference canonical sources with file paths. Only include minimal examples to illustrate patterns.

See also:

Micro-Commits

Ultra-granular commit practice for agentic workflows, treating version control as reversible save points.

Status: Live | Last Updated: 2026-01-13

Definition

Micro-Commits is the practice of committing code changes at much higher frequency than traditional development workflows. Each discrete task—often a single function, test, or file—receives its own commit.

When working with LLM-generated code, commits become “save points in a game”: Checkpoints that enable instant rollback when probabilistic outputs introduce bugs or architectural drift.

When to Use

Use this practice when:

Skip this practice when:

The Problem: Coarse-Grained Commits in Agentic Workflows

Traditional commit practices optimize for human readability and PR review: “logical units of work” that span multiple files and implement complete features.

This fails in agentic workflows because:

LLM outputs are probabilistic — A model might generate correct code for 3 files and introduce subtle bugs in the 4th. Bundling all 4 files into one commit makes rollback destructive.

Regression to mediocrity — Without checkpoints, it’s difficult to identify where LLM output drifted from the Spec contracts.

Context loss — Large commits obscure the sequence of decisions. When debugging, you need to know “what changed, when, and why.”

No emergency exit — If an LLM generates a tangled mess across 10 files, your only option is manual surgery or discarding hours of work.

The Solution: Commit After Every Task

Make a commit immediately after:

This creates a breadcrumb trail of working states.

The Practice

4.1. Atomic Tasks → Atomic Commits

Break work into small, testable chunks. Each chunk maps to one commit.

Example PBI: “Add OAuth login flow”

Commit sequence:

1. feat: add OAuth config schema
2. feat: implement token exchange endpoint
3. feat: add session storage for OAuth tokens
4. test: add OAuth flow integration test
5. refactor: extract OAuth error handling

This aligns with atomic PBIs: small, bounded execution units.

4.2. Commit Messages as Execution Log

Commit messages document the sequence of LLM-assisted changes. They serve as:

Format:

type(scope): brief description

- Detail 1
- Detail 2

Example:

feat(auth): implement OAuth token validation

- Add JWT verification middleware
- Extract claims from token payload
- Return 401 on expired tokens

4.3. Branches and Worktrees for Isolation

Use branches or git worktrees to isolate LLM experiments:

Branches — Separate experimental work from stable code. Merge only after validation.

Worktrees — Run parallel LLM sessions on the same repository without context conflicts. Each worktree is an independent working directory.

Example workflow:

# Create worktree for LLM experiment
git worktree add ../project-experiment experiment-oauth

# Work in worktree, commit frequently
cd ../project-experiment
# ... LLM generates code ...
git commit -m "feat: add OAuth callback handler"

# If successful, return to the main checkout and merge
cd -
git checkout main
git merge experiment-oauth

# If failed, discard the worktree instead (from the main checkout)
git worktree remove ../project-experiment

This prevents contaminating the main branch with failed LLM output.

4.4. Rollback as First-Class Operation

When LLM output introduces bugs:

Identify the bad commit — Review recent history to find where issues appeared.

Rollback to last known good state:

# Soft reset (keeps changes as uncommitted)
git reset --soft HEAD~1

# Hard reset (discards changes entirely)
git reset --hard HEAD~1

Selective revert:

# Revert specific commit without losing subsequent work
git revert <commit-hash>

This is only safe because micro-commits isolate changes.

5. Tidy History for Comprehension

Granular commits create noisy history. Before merging to main, optionally squash related commits into logical units:

# Interactive rebase to squash last 5 commits
git rebase -i HEAD~5

This preserves detailed history during development while creating clean history for long-term maintenance.

Trade-off: Squashing removes granular rollback points. Only squash after validation passes Quality Gates.

Relationship to The PBI

PBIs define what to build. Micro-Commits define how to track progress.

Atomic PBIs (small, bounded tasks) naturally produce micro-commits. Each PBI generates 1-5 commits depending on complexity.

Example mapping:

This makes PBI progress traceable and reversible.

See also:

PBI Authoring

How to write Product Backlog Items that agents can read, execute, and verify—with templates and lifecycle guidance.

Status: Live | Last Updated: 2026-01-13

Definition

PBI Authoring is the practice of writing Product Backlog Items optimized for agent execution. This includes structuring the four-part anatomy, ensuring machine accessibility, and managing the PBI lifecycle from planning through closure.

Following this practice produces PBIs that agents can programmatically access, unambiguously interpret, and verifiably complete.

When to Use

Use this practice when:

Skip this practice when:

Process

Step 1: Ensure Accessibility

Invisibility is a bug. If an agent cannot read the PBI, the workflow is broken.

A PBI locked inside a web UI without API or MCP integration is useless to an AI developer. The agent must programmatically access the work item without requiring human copy-paste.

Valid access methods:

| Method | Description |
| :--- | :--- |
| MCP Integration | Agent connected to Issue Tracker (Linear, Jira, GitHub) via MCP |
| Repo-Based | PBI exists as a markdown file (e.g., `tasks/PBI-123.md`) |
| API Access | Tracker exposes REST/GraphQL API the agent can query |

If your tracker has no API access: Mirror PBIs as markdown files during sprint planning, or implement MCP integration.

Step 2: Write the Directive

State what to do with explicit scope boundaries. Be imperative, not conversational.

Good:

Implement the API Layer for user notification preferences.
Scope: Only touch the `src/api/notifications` folder.

Bad:

As a user, I want to manage my notification preferences so that I can control what emails I receive.

The second example requires interpretation. The first is executable.

[!TIP] Prompt for the Plan. Even if your tool handles planning automatically, explicitly instruct the agent to output its plan for review. This forces the Specify → Plan → Execute loop.

Example Directive: “Analyze the Spec, propose a step-by-step plan including which files you will touch, and wait for my approval before editing files.”

Step 3: Add Context Pointers

Reference the permanent spec—don’t copy design decisions into the PBI.

Reference: `plans/notifications/spec.md` Part A for the schema.
See the "Architecture" section for endpoint contracts.

Why pointers, not copies: Specs evolve. A copied schema in a PBI becomes stale the moment the spec updates. Pointers ensure the agent always reads the authoritative source.

Step 4: Define Verification Criteria

Link to success criteria in the spec, or define inline checkboxes.

Must pass "Scenario 3: Preference Update" defined in 
`plans/notifications/spec.md` Part B (Contract).

Or inline:

- [ ] POST /preferences returns 201 on valid input
- [ ] Invalid payload returns 400 with error schema
- [ ] Unit test coverage > 80%

Step 5: Declare Dependencies

Explicitly state what blocks this PBI and what it blocks.

## Dependencies
- Blocked by: PBI-101 (creates the base schema)
- Must merge before: PBI-103 (extends this endpoint)

Anti-Pattern: Implicit dependencies discovered at merge time. Identify during planning; either sequence the work or refactor into independent units.

Step 6: Set the Refinement Rule

Define what happens when reality diverges from the spec.

If implementation requires changing the Architecture, you MUST 
update `spec.md` in the same PR with a changelog entry.

Options to specify:

Template

# PBI-XXX: [Brief Imperative Title]

## Directive
[What to build/change in 1-2 sentences]

**Scope:**
- [Explicit file/folder boundaries]
- [What NOT to touch]

## Dependencies
- Blocked by: [PBI-YYY if any, or "None"]
- Must merge before: [PBI-ZZZ if any, or "None"]

## Context
Read: `[path/to/spec.md]`
- [Specific section to reference]

## Verification
- [ ] [Criterion 1: Functional requirement]
- [ ] [Criterion 2: Performance/quality requirement]
- [ ] [Criterion 3: Test coverage requirement]

## Refinement Protocol
[What to do if the spec needs updating during implementation]

PBI Lifecycle

| Phase | Actor | Action |
| :--- | :--- | :--- |
| Planning | Human | Creates PBI with 4-part structure |
| Assignment | Human/System | PBI assigned to Agent or Developer |
| Execution | Agent | Reads Spec, implements Delta |
| Review | Human | Verifies against Spec’s Contract section |
| Merge | Human/System | Code merged, Spec updated if needed |
| Closure | System | PBI archived, linked to commit/PR |

Common Mistakes

The User Story Hangover

Problem: PBI written as “As a user, I want…” with no explicit scope or verification.

Solution: Rewrite in imperative form with scope boundaries and checkable criteria.

The Spec Copy

Problem: PBI contains copied design decisions that drift from the spec.

Solution: Use pointers to spec sections, never copy content that lives elsewhere.

The Hidden Dependency

Problem: Two PBIs touch the same files; discovered at merge time.

Solution: During planning, map file ownership. If overlap exists, sequence the PBIs or refactor scope.

The Untestable Increment

Problem: PBI verification requires another PBI to complete first.

Solution: Ensure each PBI is self-testable. If not possible, merge into a single PBI or create test fixtures.

This practice implements:

See also:

Product Vision Authoring

How to create and maintain a Product Vision document that transmits taste to agents—inline in AGENTS.md or as a separate file.

Status: Draft | Last Updated: 2025-01-05

Overview

This practice guides you through creating a Product Vision that prevents vibe convergence—the tendency of agents to produce generic, forgettable outputs. The goal is a document that transmits product taste effectively, whether inline in AGENTS.md or as a separate VISION.md.

Prerequisites

Before authoring a Product Vision, you should have:

Inline vs Separate File

The first decision: does your vision belong in AGENTS.md or a separate file?

When to Inline in AGENTS.md

Choose inline when:

Inline format:

# AGENTS.md

## Product Vision

We're building a fast, keyboard-first task manager for developers 
who hate project management software. Think Linear meets Raycast.

**We value:** Speed over features. Opinions over options. 
Power users over onboarding wizards.

**We sound like:** Confident, terse, technical. No "Oops!" or "We're excited..."

## Tech Stack
...

This approach keeps vision in the same context load as behavioral rules, ensuring agents always see it.

When to Extract to VISION.md

Extract to a separate file when:

Reference format in AGENTS.md:

# AGENTS.md

## Product Vision
See [VISION.md](./VISION.md) for full product identity, voice, and taste references.

**TL;DR:** Fast, opinionated task manager for developers. Linear meets Raycast.

## Tech Stack
...

The TL;DR ensures agents get core identity even when VISION.md isn’t in context.

Writing Each Component

A complete Product Vision has five components. Not all are required for inline versions—scale to your needs.

1. The Actual Humans

Describe real people, not abstract personas.

Bad:

## Target Users
- Power users
- Enterprise customers
- Developer teams

Good:

## Who We're Building For

Overworked creative directors at 15-person agencies who juggle 
12 clients simultaneously. They've used every tool. They're 
impatient with onboarding because they're not beginners. They 
work late, prefer dark interfaces, and will mass-adopt anything 
that saves them 20 minutes a day.

They hate: Enterprise software that treats them like idiots.
They love: Tools that feel like they were built by people like them.

The difference: agents can use the second version to make judgment calls. “Would this person want a wizard?” has a clear answer.

2. Point of View

State opinions that reasonable people might disagree with.

Bad (generic values):

## Values
- User-centric design
- Quality and reliability
- Innovation

Good (actual opinions):

## Our Point of View

- Dense information over progressive disclosure (our users aren't beginners)
- Keyboard-first, mouse-optional
- Dark mode is the default, not a toggle
- We'd rather be slightly weird than completely forgettable
- Features ship incomplete but useful, not complete but late
- Settings are failure; good defaults are success

Each bullet represents a tradeoff. Agents can use these to resolve ambiguity.

3. Taste References

Name specific products and what to take from them.

## Taste References

**Study these:**
- Linear (density, keyboard navigation, visual restraint)
- Raycast (speed as personality, power-user focus)
- Things 3 (calm, opinionated defaults)
- Stripe's API docs (clarity, developer respect)

**Avoid these patterns:**
- Salesforce (cluttered, corporate, permission-drunk)
- Jira (complexity as feature)
- Any product with a "getting started" carousel
- Dashboards with 15 metrics and no hierarchy

Agents can literally reference these: “Make this feel more like Linear, less like Jira.”

4. Voice and Language

Provide actual examples, not just descriptions.

## Voice

Confident but not arrogant. Clear but not sterile.

**We say:**
- "Nope" (not "Unfortunately, that's not possible at this time")
- "This will delete everything. Sure?" (not "Are you sure you want to proceed?")
- "Saved" (not "Your changes have been successfully saved!")

**Error messages are human:**
- "Can't reach the server. Retrying..." (not "Error code 503")
- "That file's too big. Try under 10MB." (not "Upload failed: maximum file size exceeded")

**We don't say:**
- "We're excited to..." (we're software, we don't have feelings)
- "On your journey" (this is a tool, not a spiritual experience)
- "Oops!" (we're adults)

5. Decision Heuristics

Provide tie-breakers for ambiguous situations.

## When In Doubt

1. Fewer features, better defaults
2. If it needs explanation, redesign it
3. Respect power users; don't punish them with beginner safety rails
4. Fast and slightly wrong beats slow and perfect
5. When torn between "conventional" and "opinionated," choose opinionated

Diagnosing Vision Problems

Signs your vision isn’t working:

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Copy “could belong to any product” | Missing or weak Voice section | Add specific examples of tone |
| UI suggestions feel generic | Missing Taste References | Add “study these / avoid these” products |
| Agents make wrong tradeoffs | Missing Point of View | Add explicit opinion stances |
| New team members produce inconsistent work | Vision not in context | Check that AGENTS.md references VISION.md |
| You keep correcting “tone” in reviews | Voice section too abstract | Replace descriptions with examples |

Maintenance

Update Triggers

Review and update the vision when:

Review Cadence

Ownership

Product Vision should have a single owner (product lead, founder, or design lead). Committee-authored visions lose voice consistency.

Integration with Specs

When writing specs, reference the vision for design rationale:

# Feature: Notification Preferences

## Blueprint

### Context
Users need control over notification frequency without 
feeling like they're configuring a mail server.

### Vision Alignment
- Per VISION.md: "Settings are failure; good defaults are success"
- Ship with smart defaults, surface preferences only when users seek them
- No notification preferences wizard on first launch

This creates traceability: when someone asks “why don’t we have granular notification controls?” the answer is documented.

Template

Inline Template (for AGENTS.md)

## Product Vision

[One paragraph: what we're building and for whom]

**We value:** [3-5 tradeoff stances]

**We sound like:** [Tone description with 2-3 examples]

**Reference products:** [2-3 products that "feel right"]

Full Template (for VISION.md)

# Product Vision: [Product Name]

## Who We're Building For
[Describe actual humans, not personas. Context, constraints, 
what they hate, what they wish existed.]

## Our Point of View
- [Opinion about tradeoff]
- [Opinion about tradeoff]
- [What we value over what]

## Taste References

**These feel right:**
- [Product] — [what specifically]

**These feel wrong:**
- [Pattern] — [why]

## Voice

**We sound like:** [Description]

**We say:** [Examples]

**We don't say:** [Anti-examples]

## When In Doubt
1. [Heuristic]
2. [Heuristic]

See also:

Workflow as Code

Define agentic workflows in deterministic code rather than prompts to ensure reliability, type safety, and testable orchestration.

Status: Experimental | Last Updated: 2026-01-18

Definition

Workflow as Code is the practice of defining agentic workflows using deterministic programming languages (like TypeScript or Python) rather than natural language prompts.

It treats the Agent as a function call within a larger, strongly-typed system, rather than treating the System as a tool available to a chatty agent.

When to Use

Use this practice when:

Skip this practice when:

Why It Matters

When complex workflows are driven entirely by an LLM loop (“Here is a goal, figure it out”), the system suffers from Context Pollution. As the agent accumulates history—observations, tool outputs, internal monologue—its attention degrades.

Nick Tune describes this as the agent becoming “tipsy wobbling from side-to-side”: it loses focus on strict process adherence because its context window is overflowing with implementation details.

Process

Step 1: Identify Deterministic vs Probabilistic Tasks

Audit your workflow. Separate mechanical tasks (running builds, conditional logic, file operations) from intelligence tasks (code review, summarization, decision-making under ambiguity).

Deterministic (Code):

- Running builds
- Conditional logic
- File operations

Probabilistic (Agent):

- Code review
- Summarization
- Decision-making under ambiguity

Step 2: Define Typed Step Abstraction

Create a common interface for workflow steps:

export type WorkflowContext = {
  workDir: string;
  spec: string;
  history: StepResult[];
};

export type StepResult =
  | { type: 'success'; data: unknown }
  | { type: 'failure'; reason: string; recoverable: boolean };

export type Step = (ctx: WorkflowContext) => Promise<StepResult>;

This enables:

- Type-checked composition of steps
- Testing each step in isolation
- Uniform handling of failure and recovery
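
For concreteness, a single deterministic step implementing this interface might look like the following. This is a minimal sketch, assuming a Node project where `npm run build` is the build command and the types above live in `../types`:

// steps/run-build.ts
import { exec } from 'node:child_process';
import { promisify } from 'node:util';
import type { Step } from '../types';

const execAsync = promisify(exec);

export const runBuild: Step = async (ctx) => {
  try {
    await execAsync('npm run build', { cwd: ctx.workDir });
    return { type: 'success', data: 'build passed' };
  } catch (err) {
    // A failed build is recoverable: a later step (or the agent) can attempt a fix.
    return { type: 'failure', reason: String(err), recoverable: true };
  }
};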

Step 3: Implement the Orchestration Shell

Write the control flow in code. The LLM only appears where intelligence is required:

async function runDevWorkflow(ctx: WorkflowContext) {
  // Deterministic: Run build
  const buildResult = await runBuild(ctx);
  if (buildResult.type === 'failure') {
    return handleBuildError(buildResult);
  }

  // Probabilistic: Agent reviews the diff
  const reviewResult = await runAgentReview({
    diff: await git.getDiff(),
    spec: ctx.spec
  });

  // Deterministic: Act on structured result
  if (reviewResult.verdict === 'PASS') {
    await git.commit();
    await github.createPR();
  }
}
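
The agent call itself stays thin. Here is a hedged sketch of `runAgentReview`, with `callModel` standing in for whichever LLM client you use; the helper and the JSON verdict contract are assumptions, not a specific vendor API:

type ReviewResult = { verdict: 'PASS' | 'FAIL'; notes: string };

// Placeholder for your LLM client of choice.
declare function callModel(prompt: string): Promise<string>;

async function runAgentReview(input: { diff: string; spec: string }): Promise<ReviewResult> {
  const prompt = [
    'Review the diff against the spec.',
    'Respond with JSON only: {"verdict":"PASS"|"FAIL","notes":"..."}',
    `SPEC:\n${input.spec}`,
    `DIFF:\n${input.diff}`,
  ].join('\n\n');

  const raw = await callModel(prompt);
  try {
    return JSON.parse(raw) as ReviewResult;
  } catch {
    // Treat unparseable output as a failed review rather than guessing intent.
    return { verdict: 'FAIL', notes: 'Model returned non-JSON output.' };
  }
}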

Step 4: Implement Opaque Commands

From the agent’s perspective, workflow steps should be “Black Boxes.” The agent invokes a high-level command and acts on the structured result—it doesn’t need to know implementation details.

Define the interface:

type VerifyWorkResult = {
  status: 'passed' | 'failed';
  errors?: { file: string; line: number; message: string }[];
};

async function verifyWork(ctx: WorkflowContext): Promise<VerifyWorkResult> {
  // Implementation hidden from agent
  const lint = await runLint(ctx.workDir);
  const types = await runTypeCheck(ctx.workDir);
  const tests = await runTests(ctx.workDir);
  
  return aggregateResults([lint, types, tests]);
}

This reduces token usage and prevents the agent from hallucinating incorrect shell commands.

Step 5: Add Enforcement Hooks

Agents will sometimes try to bypass the workflow. Implement hard boundaries using client-side hooks:

# .claude/hooks/pre-tool-use.sh
if [[ "$TOOL" == "Bash" && "$COMMAND" =~ "git push" ]]; then
  echo "Blocked: Use 'submit-pr' tool which runs verification first."
  exit 1
fi

This shifts enforcement from “Instructions in the System Prompt” (which can be ignored) to “Code in the Environment” (which cannot).

Template

Minimal workflow orchestrator structure:

// workflows/dev-workflow.ts
import type { Step, WorkflowContext, StepResult } from './types';

const steps: Step[] = [
  runBuild,
  runLint,
  runAgentReview,  // Only probabilistic step
  commitChanges,
  createPR,
];

export async function execute(ctx: WorkflowContext): Promise<StepResult> {
  for (const step of steps) {
    const result = await step(ctx);
    if (result.type === 'failure' && !result.recoverable) {
      return result;
    }
    ctx.history.push(result);
  }
  return { type: 'success', data: ctx.history };
}

Common Mistakes

The God Prompt

Problem: Entire workflow described in a single system prompt, expecting the agent to “figure it out.”

Solution: Extract deterministic logic into code. The agent should only handle tasks requiring intelligence.

Leaky Abstractions

Problem: Agent sees raw command output (500 lines of test failures) instead of structured results.

Solution: Parse outputs into typed results before passing to the agent. Summarize, don’t dump.
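
For example, raw test-runner output can be condensed into a typed summary before the agent ever sees it. A minimal sketch; the `FAIL` line format and `TestSummary` shape are illustrative:

type TestSummary = {
  failed: number;
  failures: { name: string; message: string }[];
};

function summarizeTestOutput(raw: string): TestSummary {
  const failLines = raw.split('\n').filter((line) => line.startsWith('FAIL '));
  // Cap what the agent sees; never dump hundreds of lines into context.
  const failures = failLines.slice(0, 10).map((line) => {
    const [, name = 'unknown', ...rest] = line.split(' ');
    return { name, message: rest.join(' ') };
  });
  return { failed: failLines.length, failures };
}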

Missing Enforcement

Problem: Workflow relies on the agent “following instructions” without hard boundaries.

Solution: Implement hooks that block unauthorized actions. Trust code, not compliance.

Over-Agentification

Problem: Using an LLM to run npm install or parse JSON—tasks with zero ambiguity.

Solution: Reserve agent calls for genuinely probabilistic tasks. Everything else is code.