ASDLC.io
Field Manual

Version: 0.20.0

Generated: 2026-03-09

Table of Contents

Part I: Patterns

Part II: Practices

Appendix: Concepts

Part I: Patterns

Adversarial Code Review

Consensus verification pattern using a secondary Critic Agent to review Builder Agent output against the Spec.

Status: Live | Last Updated: 2026-01-31

Definition

Adversarial Code Review is a verification pattern where a distinct AI session—the Critic Agent—reviews code produced by the Builder Agent against the Spec before human review.

This extends the Critic (Hostile Agent) pattern from the design phase into the implementation phase, creating a verification checkpoint that breaks the “echo chamber” where a model validates its own output.

The Builder Agent (optimized for speed and syntax) generates code. The Critic Agent (optimized for reasoning and logic) attempts to reject it based on spec violations.

The Problem: Self-Validation Ineffectiveness

LLMs are probabilistic text generators trained to be helpful. When asked “Check your work,” a model that just generated code will often:

Hallucinate correctness — Confidently affirm that buggy logic is correct because it matches the plausible pattern in training data.

Double down on errors — Explain why the bug is actually a feature, reinforcing the original mistake.

Share context blindness — Miss gaps because it operates within the same context window and reasoning path that produced the original output.

If the same computational session writes and reviews code, the “review” provides minimal independent validation.

The Solution: Separated Roles (and Parallel Critique)

To create effective verification, separate the generation and critique roles. Advanced implementations also utilize parallel multi-model critique to find overlapping issues before synthesizing the results.

The Builder — Optimizes for implementation throughput (e.g., Gemini 3 Flash, Claude Haiku 4.5). Generates code from the PBI and Spec.

The Critic Lanes — A set of independent models (e.g., an illustrative “Tri-Model Lane” approach with independent Architect, SecOps, and QA personas) optimized for specific validation dimensions. Models must have strict Provenance identity separation so their actions are audited independently.

The Critics do not generate alternative implementations. They act as gatekeepers, producing either PASS or a list of spec violations that must be addressed.

The Workflow

1. Build Phase

The Builder Agent implements the PBI according to the Spec.

Output: Code changes, implementation notes.

Example: “Updated auth.ts to support OAuth login flow.”

2. Context Swap (Fresh Eyes)

Critical: Start a new AI session or chat thread for critique. This clears conversation drift and forces the Critic to evaluate only the artifacts (Spec + Diff), not the Builder’s reasoning process.

If using the same model, close the current chat and open a fresh session. If using Model Routing, switch to High Reasoning models for parallel critique.

3. Critique Phase

Feed the Spec and the code diff to the Critic Agents with adversarial framing. Advanced factories run these in parallel lanes using specialized prompts (for example, the Architect persona below):

System Prompt (Architect Critic Example):

You are a rigorous Code Reviewer validating implementation against contracts.

Input:
- Spec: specs/auth-system.md
- Code Changes: src/auth.ts (diff)

Task:
Compare the code strictly against the Spec's Blueprint (constraints) and Contract (quality criteria).

Identify:
1. Spec violations (missing requirements, violated constraints)
2. Security issues (injection vulnerabilities, auth bypasses)
3. Edge cases not handled (error paths, race conditions)
4. Anti-patterns explicitly forbidden in the Spec

Output Format:
- PASS (if no violations)
- For each violation, provide:
  1. Violation Description (what contract was broken)
  2. Impact Analysis (why this matters: performance, security, maintainability)
  3. Remediation Path (ordered list of fixes, prefer standard patterns, escalate if needed)
  4. Test Requirements (what tests would prevent regression)

This transforms critique from "reject" to "here's how to fix it."

3b. Moderator Synthesis (For Parallel Critique)

When a pattern incorporates multiple parallel Critics, a Moderator role becomes an architectural requirement to prevent alert fatigue and conflicting directives.

The essential shape of this architecture structurally separates the read-only analysis (performed by the parallel Critics) from the synthesis and write actions (performed exclusively by the Moderator). The Moderator acts as a deduplication and prioritization layer, ensuring the Builder agent receives a single, unified checklist of violations rather than a barrage of uncoordinated feedback.
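As a sketch, the Moderator's deduplication and prioritization layer can be expressed as a pure function over the Critics' reports (the `Violation` shape here is illustrative, not a prescribed schema):

```typescript
// Hypothetical violation shape; real reports would carry file/line references.
interface Violation {
  id: string;          // stable key, e.g. "perf/in-memory-filter"
  severity: number;    // higher = more urgent
  description: string;
}

// Merge parallel critic reports into one deduplicated, prioritized checklist.
function synthesize(reports: Violation[][]): Violation[] {
  const byId = new Map<string, Violation>();
  for (const report of reports) {
    for (const v of report) {
      const existing = byId.get(v.id);
      // Keep the most severe duplicate so nothing is downgraded
      if (!existing || v.severity > existing.severity) byId.set(v.id, v);
    }
  }
  // Highest severity first: the Builder fixes critical issues before nits
  return [...byId.values()].sort((a, b) => b.severity - a.severity);
}
```

The sort order is the point: the Builder receives one ordered checklist rather than three overlapping reports.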

4. Verdict

If PASS (or resolved by Synthesis): Code moves to human Acceptance Gate (L3 review for strategic fit).

If FAIL: Violations are fed back to Builder as a new task: “Address these spec violations before proceeding.”

This creates a Context Gate between code generation and human review.

Relationship to Context Gates

Adversarial Code Review implements a Review Gate as defined in Context Gates:

Quality Gates (deterministic) — Verify syntax, compilation, linting, test passage.

Review Gates (probabilistic, adversarial) — Verify semantic correctness, spec compliance, architectural consistency. This is where Adversarial Code Review operates.

Acceptance Gates (subjective, HITL) — Verify strategic fit and product vision alignment.

The Critic sits between automated tooling and human review, catching issues that compilers miss but that don’t require human strategic judgment.

Integration with Model Routing

Use Model Routing to assign models by capability profile:

| Role | Model Profile | Rationale |
| --- | --- | --- |
| Builder | High Throughput | Fast code generation with strong syntax knowledge |
| Critic(s) | High Reasoning | Deep logic evaluation, constraint satisfaction, edge case discovery. Can be broken into specialized lanes (Arch, QA, SecOps). |

This leverages the strengths of each model class: speed for generation, reasoning depth for validation.

Strategic Value

Reduces L3 Cognitive Load — Human reviewers focus on “Is this the right product?” rather than catching spec deviations or missing error handling.

Catches Regression to Mediocrity — Coding models gravitate toward average solutions. The Critic enforces novelty and architectural intent from the Spec.

Enforces Spec Quality — If the Critic can’t determine whether code is correct, the Spec is ambiguous. This surfaces specification gaps.

Prevents Silent Failures — The Critic catches implementation shortcuts (skipped validation, missing edge cases) that pass tests but violate contracts.

Validated in Practice

Case Study: Claudio Lassala (January 2026)

A production implementation validated this pattern’s effectiveness:

Context: A user story required filtering audit logs by date range. The Builder Agent implemented the requirement, tests passed, and the code compiled without errors.

Issue Detected: The Critic Agent identified a silent performance violation:

// Implementation passed all Quality Gates but violated architectural constraint
var logs = await repository.LoadAll(); // Loads entire table into memory
return logs.Where(log => log.Date > startDate); // Filters in-memory

Critic Output:

VIOLATION: Performance - Data Access Pattern

Spec requires database-level filtering for datasets exceeding 1k records.
Implementation loads full table then filters in-memory.

Impact: Works with small data; fails at scale (10k+ records cause memory issues)

Remediation Path:
1. Push filtering to database: repository.FindWhere(x => x.Date > startDate)
2. If ORM doesn't support this filter, use raw SQL
3. Add performance test with 10k+ mock records to prevent regression
4. Document the constraint in the repository interface
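The remediated access pattern can be sketched in TypeScript; `findWhere` here is a hypothetical repository method mirroring step 1 of the remediation path, not a real ORM API:

```typescript
// Hypothetical repository interface: the date filter is part of the query,
// so it executes in the database, never in application memory.
interface AuditLog {
  date: Date;
  message: string;
}

interface AuditLogRepository {
  // Implementations translate the filter to SQL (e.g. WHERE date > $1);
  // they must never load the full table and filter afterwards.
  findWhere(filter: { dateAfter: Date }): Promise<AuditLog[]>;
}

async function getAuditLogs(
  repo: AuditLogRepository,
  startDate: Date,
): Promise<AuditLog[]> {
  return repo.findWhere({ dateAfter: startDate }); // filtering happens in the DB
}
```

Documenting the constraint on the interface itself (step 4) is what lets the next Builder session, or the next Critic, see it.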

Key Learnings:

  1. Silent Performance Risks — Code that passes all tests can still violate architectural constraints. The Critic caught the LoadAll().Filter() anti-pattern before production.

  2. Iterative Refinement — The Critic initially flagged “missing E2E tests,” which were actually present but structured differently. The team updated the Critic’s instructions to recognize the project’s test architecture, demonstrating the pattern’s adaptability.

  3. Tone Calibration — Using “Approve with suggestions” framing prevented blocking valid code while surfacing genuine risks. The Critic didn’t reject the PR—it flagged optimization opportunities with clear remediation paths.

This validates the pattern’s core thesis: adversarial review catches architectural violations that pass deterministic checks but violate semantic contracts.

Example: The Silent Performance Bug

Spec Contract: “All database retries must use exponential backoff to prevent thundering herd during outages.”

Builder Output: Clean code with a simple retry loop using fixed 1-second delays. Tests pass.

// src/db.ts
async function queryWithRetry(sql: string) {
  for (let i = 0; i < 5; i++) {
    try {
      return await db.query(sql);
    } catch (err) {
      await sleep(1000); // Fixed delay
    }
  }
}

Critic Response:

VIOLATION: src/db.ts Line 45

Spec requires exponential backoff. Implementation uses constant sleep(1000).

Impact: During database outages, this will cause thundering herd problems
as all clients retry simultaneously.

Required: Implement delay = baseDelay * (2 ** attemptNumber)

Without the Critic, a human skimming the PR might miss the constant delay. The automated tests wouldn’t catch it (the code works). The Critic, reading against the contract, identifies the violation.
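For contrast, here is a sketch of what the Critic is asking for: the same retry loop with the delay growing exponentially per attempt (`run` stands in for the real `db.query` call):

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Per-attempt delay under the Spec contract: delay = baseDelay * 2 ** attempt.
function backoffDelayMs(baseDelayMs: number, attempt: number): number {
  return baseDelayMs * 2 ** attempt;
}

// Sketch of a compliant retry wrapper; `run` is a stand-in for db.query(sql).
async function queryWithRetry<T>(
  run: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await run();
    } catch (err) {
      lastError = err;
      // 1s, 2s, 4s, 8s... clients desynchronize instead of herding
      await sleep(backoffDelayMs(baseDelayMs, attempt));
    }
  }
  throw lastError;
}
```

A production version would also add jitter, but the exponential schedule alone satisfies the contract as written.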

Implementation Constraints

Not Automated (Yet) — As of December 2025, this requires manual orchestration. Engineers must manually switch sessions/models and feed context to the Critic.

Context Window Limits — Large diffs may exceed even Massive Context models. Use Context Gates filtering to provide only changed files + relevant Spec sections.

Critic Needs Clear Contracts — The Critic can only enforce what’s documented in the Spec. Vague specs produce vague critiques.

Model Capability Variance — Not all “reasoning” models perform equally at code review. Validate your model’s performance on representative examples.

Relationship to Agent Constitution

The Agent Constitution defines behavioral directives for agents. For Adversarial Code Review:

Builder Constitution: “Implement the Spec’s contracts. Prioritize clarity and correctness over cleverness.”

Critic Constitution: “You are skeptical. Your job is to reject code that violates the Spec, even if it ‘works.’ Favor false positives over false negatives.”

This frames the Critic’s role as adversarial by design—it’s explicitly told to be rigorous and skeptical, counterbalancing the Builder’s helpfulness bias.

Future Automation Potential

This pattern is currently manual but has clear automation paths:

CI/CD Integration — Run Critic automatically on PR creation, posting violations as review comments.

IDE Integration — Real-time critique as code is written, similar to linting but spec-aware.

Multi-Agent Orchestration — Automated handoff between Builder and Critic until PASS is achieved.

Programmatic Orchestration (Workflow as Code)

To scale this pattern, move from manual prompt-pasting to code-based orchestration (e.g., using the Claude Code SDK).

Convention-Based Loading: Store reviewer agent prompts in a standard directory (e.g., .claude/agents/) and load them dynamically:

import { promises as fs } from "node:fs";

// Load the specific reviewer agent prompt from the convention-based path
const reviewerPrompt = await fs.readFile(`.claude/agents/${agentName}.md`, "utf8");

// Spawn subagent via SDK
const reviewResult = await claude.query({
  prompt: reviewerPrompt,
  context: { spec, diff },
  outputFormat: { type: 'json_schema', schema: ReviewSchema }
});

This allows you to treat Critic Agents as standardized, version-controlled functions in your build pipeline.

As agent orchestration tooling matures, this pattern may move from Experimental to Standard.

See also:

Agent Constitution

Persistent, high-level directives that shape agent behavior and decision-making before action.

Status: Live | Last Updated: 2026-02-18

Definition

An Agent Constitution is a set of high-level principles or “Prime Directives” injected into an agent’s system prompt to align its intent and behavior with system goals.

The concept originates from Anthropic’s Constitutional AI research, which proposed training models to be “Helpful, Honest, and Harmless” (HHH) using a written constitution rather than human labels alone. In the ASDLC, we adapt this alignment technique to System Prompt Engineering—using the Constitution to define the “Superego” of our coding agents.

The Problem: Infinite Flexibility

Without a Constitution, an Agent is purely probabilistic. It will optimize for being “helpful” to the immediate prompt user, often sacrificing long-term system integrity.

If a prompt says “Implement this fast,” a helpful agent might skip tests. A Constitutional Agent would refuse: “I cannot skip tests because Principle #3 forbids merging unverified code.”

The Solution: Proactive Behavioral Alignment

The Constitution shapes agent behavior before action occurs—unlike reactive mechanisms (tests, gates) that catch problems after the fact.

The Driver Training Analogy

To understand the difference between a Constitution and other control mechanisms, consider the analogy of driving a car:

The “Orient” Phase

In the OODA Loop (Observe-Orient-Decide-Act), the Constitution lives squarely in the Orient phase.

When an agent Observes the world (reads code, sees a user request), the Constitution acts as a filter for how it interprets those observations.

Taxonomy: Steering vs. Deterministic Constraints

It is critical to distinguish what the Constitution can enforce (Steering) from what external systems enforce deterministically (Hard). Hard constraints split into two distinct categories:

1. Steering Constraints (Probabilistic)

Live in the system prompt / agents.md. Influence the model’s reasoning, tone, and risk preference. The agent self-polices these — they are probabilistic, not guaranteed.

2. Toolchain Constraints (Deterministic — Repo)

Live in tool configuration files (biome.json, tsconfig, .golangci.yml, ESLint, etc.). Enforced by the toolchain on every run, regardless of agent behavior. The tool is the enforcement mechanism — not the agent.

Restating Toolchain Constraints in agents.md is an antipattern. It implies the agent is the enforcement mechanism when it is not, and research shows agents will follow these instructions faithfully — adding reasoning cost and broader exploration without improving outcomes (Gloaguen et al., 2026).

3. Orchestration Constraints (Deterministic — Runtime)

Live in the runtime environment (hooks, CI pipelines, Docker containers, API limits). Physically prevent the agent from taking restricted actions.

The Decision Rule

Before adding any rule to agents.md, ask: can a tool or runtime already enforce this?

Can a linter/formatter enforce it?  → put it in tool config, not agents.md
Can a CI gate enforce it?           → put it in the pipeline, not agents.md
Can a hook enforce it?              → put it in the hook, not agents.md
None of the above?                  → agents.md is the right home

The Constitution is for the judgment layer — the things that require reasoning to uphold. Everything else has a more reliable home.

Anatomy of a Constitution

Research into effective system prompts suggests a constitution should have four distinct components:

1. Identity (The Persona)

Who is the agent? This prunes the search space of the model (e.g., “You are a Senior Rust Engineer” vs “You are a poetic assistant”).

2. The Mission (Objectives)

What is the agent trying to achieve?

3. The Boundaries (Negative Constraints)

What must the agent never do? These are “Soft Gates”—instructions to avoid bad paths before hitting the hard Context Gates.

4. The Process (Step-by-Step)

How should the agent think? This enforces Chain-of-Thought reasoning.

Constitution vs. Spec

A common failure mode is mixing functional requirements with behavioral guidelines. Separation is critical:

| Feature | Agent Constitution | The Spec |
| --- | --- | --- |
| Scope | Global / Persona-wide | Local / Task-specific |
| Lifespan | Persistent (Project Lifecycle) | Ephemeral (Feature Lifecycle) |
| Content | Values, Style, Ethics, Safety | Logic, Data Structures, Routes |
| Example | “Prioritize Type Safety over Brevity.” | “User id must be a UUID.” |

Self-Correction Loop

One of the most powerful applications of a Constitution is the Critique-and-Refine loop (derived from Anthropic’s Supervised Learning phase):

  1. Draft: Agent generates a response to the user’s task.
  2. Critique: Agent (or a separate Critic agent) compares the draft against the Constitution.
  3. Refine: Agent rewrites the draft to address the critique.

This allows the agent to fix violations (e.g., “I used any type, but the Constitution forbids it”) before the user ever sees the code.
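A minimal sketch of the loop's control flow, with the model calls abstracted as plain callbacks (the shapes are hypothetical, not a real SDK):

```typescript
// Critique verdict produced by comparing a draft against the Constitution.
interface Critique {
  pass: boolean;
  violations: string[];
}

// Draft -> Critique -> Refine, bounded by maxRounds to avoid infinite loops.
function selfCorrect(
  task: string,
  generate: (task: string) => string,
  critique: (draft: string) => Critique,
  refine: (draft: string, violations: string[]) => string,
  maxRounds = 3,
): string {
  let draft = generate(task);
  for (let round = 0; round < maxRounds; round++) {
    const verdict = critique(draft); // compare draft against the Constitution
    if (verdict.pass) break;
    draft = refine(draft, verdict.violations); // rewrite to address the critique
  }
  return draft;
}
```

In practice `generate`, `critique`, and `refine` are separate model calls (often separate sessions, per Adversarial Code Review); the bounded loop is the part that matters.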

Periodic Auditing

As the toolchain evolves (dependency upgrades, new linter rules, stricter tsconfig), previously necessary Constitution rules may become redundant. Auditing agents.md for toolchain-redundant rules should be part of dependency upgrade reviews.

Relationship to Other Patterns

Constitutional Review — The pattern for using a Critic agent to review code specifically against the Agent Constitution.

Context Gates — The deterministic checks that back up the probabilistic Constitution. Hard Constraints implemented via orchestration.

Adversarial Code Review — Uses persona-specific Constitutions (Builder vs Critic) to create dialectic review processes.

The Spec — Defines task-specific requirements, while the Constitution defines global behavioral guidelines.

AGENTS.md Specification — The practice for documenting and maintaining your Agent Constitution.

Workflow as Code — Implements Hard Constraints programmatically, complementing the Constitution’s Steering Constraints.

Constitutional Review

Verification pattern that validates implementation against both functional requirements (Spec) and architectural values (Constitution).

Status: Live | Last Updated: 2026-01-31

Definition

Constitutional Review is a verification pattern that validates code against two distinct contracts:

  1. The Spec (functional requirements) — Does it do what was asked?
  2. The Constitution (architectural values) — Does it do it the right way?

This pattern extends Adversarial Code Review by adding a second validation layer. Code can pass all tests and satisfy the Spec’s functional requirements while still violating the project’s architectural principles documented in the Agent Constitution.

The Problem: Technically Correct But Architecturally Wrong

Standard verification (compilation, linting, passing tests) catches functional bugs.

But code can pass all these checks and still violate architectural constraints:

Example: The Performance Violation

// Spec requirement: "Filter audit logs by date range"
async function getAuditLogs(startDate: Date) {
  const logs = await db.auditLogs.findAll(); // ❌ Loads entire table
  return logs.filter(log => log.date > startDate); // ❌ Filters in memory
}

Quality Gates: ✅ Tests pass (small dataset)
Spec Compliance: ✅ Returns filtered logs
Constitutional Review: ❌ Violates “push filtering to database layer”

The code is functionally correct but architecturally unsound. It works fine with 100 records but fails catastrophically at 10,000+.

The Solution: Dual-Contract Validation

Constitutional Review solves this by validating against two sources of truth:

Traditional Review (Functional)

Constitutional Review (Architectural)

The Critic Agent validates against BOTH contracts:

  1. Functional correctness (from the Spec)
  2. Architectural consistency (from the Constitution)

Anatomy

Constitutional Review consists of three key components:

The Dual-Contract Input

Spec Contract — Defines functional requirements, API contracts, and data schemas. Answers “what should it do?”

Constitution Contract — Defines architectural patterns, performance constraints, and security rules. Answers “how should it work?”

Both contracts are fed to the Critic Agent for validation.

The Critic Agent

A secondary AI session (ideally using a reasoning-optimized model) that validates the implementation against both the Spec and the Constitution and produces a structured violation report when principles are broken.

This extends the Adversarial Code Review Critic with constitutional awareness.

The Violation Report

When constitutional violations are detected, the Critic produces:

  1. Violation Description — What constitutional principle was violated
  2. Impact Analysis — Why this matters at scale (performance, security, maintainability)
  3. Remediation Path — Ordered steps to fix (prefer standard patterns, escalate if needed)
  4. Test Requirements — What tests would prevent regression

This transforms review from rejection to guidance.

Relationship to Other Patterns

Adversarial Code Review — The base pattern that Constitutional Review extends. Adds the Constitution as a second validation contract.

Agent Constitution — The source of architectural truth. Defines the “driver training” that shapes initial behavior; Constitutional Review verifies the training was followed.

The Spec — The source of functional truth. Constitutional Review validates against both Spec and Constitution.

Context Gates — Constitutional Review implements a specialized Review Gate that validates architectural consistency.

Feedback Loop: Constitution shapes behavior → Constitutional Review catches violations → Violations inform Constitution updates (if principles aren’t clear enough).

Integration with Context Gates

Constitutional Review implements a specialized Review Gate that sits between Quality Gates and Acceptance Gates:

| Gate Type | Question | Validated By |
| --- | --- | --- |
| Quality Gates | Does it compile and pass tests? | Toolchain (deterministic) |
| Spec Review Gate | Does it implement requirements? | Critic Agent (probabilistic) |
| Constitutional Review Gate | Does it follow principles? | Critic Agent (probabilistic) |
| Acceptance Gate | Is it the right solution? | Human (subjective) |

The Constitutional Review Gate catches architectural violations that pass functional verification.

Strategic Value

Catches “Regression to Mediocrity” — LLMs are trained on average code from the internet. Without constitutional constraints, they gravitate toward common but suboptimal patterns.

Enforces Institutional Knowledge — Architectural decisions (performance patterns, security rules, error handling strategies) are documented once in the Constitution and verified on every implementation.

Surfaces Specification Gaps — If the Critic can’t determine whether code violates constitutional principles, the Constitution needs clarification. This improves the entire system.

Reduces L3 Review Burden — Human reviewers focus on strategic fit (“Is this the right feature?”) rather than catching architectural violations (“Why are you loading the entire table?”).

Prevents Silent Failures — Code that “works” but violates architectural principles (like the LoadAll().Filter() anti-pattern) is caught before production.

Validated in Practice

Case Study: Claudio Lassala (January 2026)

A production implementation caught a constitutional violation that passed all other gates:

Context: User story required filtering audit logs by date range. Builder Agent implemented the requirement, tests passed, code compiled without errors.

Code Behavior: The implementation loaded the entire audit log table into memory and filtered by date in application code.

Gate Results: Quality Gates passed (compilation, tests), Spec compliance passed (correct filtered output), Constitutional Review failed (violated the “push filtering to the database layer” principle).

Critic Output: Provided specific remediation path:

  1. Push filter to database query layer
  2. If ORM doesn’t support pattern, use raw SQL
  3. Add performance test with 10k+ records
  4. Document constraint in repository interface

Impact: Silent performance bug caught before production. The code worked perfectly in development (small dataset) but would have failed catastrophically at scale.

See full case study in Adversarial Code Review.

Implementing Practice

For step-by-step implementation guidance, see:

See also:

Experience Modeling

The practice of treating the Design System as a formal schema that agents must strictly follow, preventing UI hallucinations.

Status: Live | Last Updated: 2026-01-25

Definition

Experience Modeling is the creation of a queryable Experience Schema—a rigid Design System that serves as the source of truth for all frontend generation.

Just as we model data schemas (SQL/Prisma) to constrain backend agents, Experience Modeling restricts frontend agents to a validated set of UI components, tokens, and layouts. It treats the Design System not as a library of suggestions, but as a strict contract.

The Problem: Design Drift

Without a formal Experience Model, agents suffer from Design Drift—the gradual divergence of a product’s UI from its intended design specifications.

This occurs because LLMs are probabilistic “vibe engines.” When asked to “make a blue button,” an agent might invent a new hex shade, hard-code one-off spacing values, or write a fresh component rather than reuse an existing one.

Over hundreds of commits, these micro-inconsistencies accumulate into a codebase that is technically functional but visually chaotic and impossible to maintain.

The Solution: The Experience Schema

The solution is to formalize the UI as an Experience Schema—a strict, machine-readable definition of valid UI states.

Instead of asking the agent to “design a page,” we force it to “assemble a page using only these approved blocks.” This shifts the agent’s role from Artist (creating new styles) to Builder (assembling pre-built parts).
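To make "assemble a page using only approved blocks" concrete, here is a sketch of schema validation against a component and token catalog (all names are illustrative):

```typescript
// Illustrative slice of an Experience Schema: the catalogs are the agent's
// entire vocabulary; anything outside them is rejected at the gate.
const componentCatalog = new Set(["Button", "Input", "Card"]);
const tokenCatalog = new Set(["color.primary", "color.danger", "space.sm"]);

interface PageBlock {
  component: string;
  tokens: string[]; // style references must resolve to approved tokens
}

function validateBlock(block: PageBlock): string[] {
  const errors: string[] = [];
  if (!componentCatalog.has(block.component)) {
    errors.push(`unknown component: ${block.component}`);
  }
  for (const t of block.tokens) {
    if (!tokenCatalog.has(t)) errors.push(`unknown token: ${t}`);
  }
  return errors; // empty array = the block passes the gate
}
```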

Anatomy

1. The Component Catalog (The Vocabulary)

The “words” the agent is allowed to use. This is a set of dumb, stateless UI components (Buttons, Inputs, Cards) that strictly enforce brand styles. These components must be:

2. The Context Gate (The Enforcer)

A mechanical barrier between Experience Modeling and Feature Assembly.

%% caption: Context Gating for Design System Integrity
flowchart LR
  A[[...]] --> |CONTEXT| C
  C[EXPERIENCE MODELING] --> D
  D{GATE} --> E
  E[FEATURE ASSEMBLY]
  E --> |DEFECT/REQUIREMENT SIGNAL| C
  E --> |RELEASE| G
  G[[...]]
Context Gating for Design System Integrity

The gate verifies:

  1. Token Strictness: No raw CSS values (hex codes, magic numbers).
  2. Schema Parity: Documentation matches code.
  3. Build Success: The Design System builds in isolation.
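As an illustration of check 1 (Token Strictness), a minimal scanner that flags raw hex colors; a production gate would rely on the toolchain (a lint rule) rather than a hand-rolled script:

```typescript
// Flag raw hex colors that bypass design tokens. Matches #abc and #aabbcc.
function findRawHexValues(css: string): string[] {
  const hex = /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g;
  return css.match(hex) ?? [];
}
```

Any non-empty result fails the gate; styles must resolve through tokens such as `var(--color-bg)` instead.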

3. Read-Only Enforcement (The Governance)

During Feature Assembly, the Experience Model must be Read-Only. Agents cannot modify the definition of a “Button” to make a feature work; they must use the Button as it exists or request a change to the model.

Pattern A: Hard Isolation (Enterprise) The Design System is a separate package (NPM/NuGet) installed as a dependency. The agent literally cannot modify source files because they are in node_modules.

Pattern B: Toolchain Enforcement (Startups) The Design System lives in the same repo, but pre-commit hooks or CODEOWNERS files prevent the agent from modifying src/design-system/** without explicit human override.

Relationship to Other Patterns

Context Gates — Experience Modeling implements a specific type of Context Gate: the “Design Integrity Gate.”

Feature Assembly — The phase that consumes the Experience Model. Feature Agents assume the Experience Model is immutable context.

Agent Personas — We often use a specific “Systems Architect” or “Designer” persona for the Experience Modeling phase, distinct from the “Feature Developer” persona.

Model Routing

Strategic assignment of LLM models to SDLC phases based on reasoning capability versus execution speed.

Status: Live | Last Updated: 2026-01-31

Definition

Model Routing is the strategic assignment of different Large Language Models (LLMs) to different phases or tasks based on their capability profile.

In a monolithic architecture, a user asking for a simple boolean definition incurs the same high cost and latency as a user requesting a complex strategic analysis. Model routing rationalizes this by shifting model selection from a design-time decision to a runtime optimization problem.

The Iron Triangle

Effective routing systems operate by manipulating the trade-offs between three competing constraints:

  1. Quality: Semantic accuracy, reasoning depth, instruction following.
  2. Cost: Operational expenditure (OpEx) per token.
  3. Latency: Time-To-First-Token (TTFT) and total generation time.

By dynamically swapping models, routers decouple these variables. A system can achieve “frontier-class” average quality at “efficient-class” average cost by routing only the most difficult 10-20% of queries to the expensive model.
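A back-of-the-envelope sketch of that claim, with placeholder prices:

```typescript
// Blended cost when a share of traffic is escalated to a frontier model.
// Prices are placeholders, not real quotes: $0.30/MTok efficient,
// $5.00/MTok frontier, 15% escalation rate.
function blendedCostPerMTok(
  cheap: number,
  frontier: number,
  frontierShare: number,
): number {
  return cheap * (1 - frontierShare) + frontier * frontierShare;
}

// 0.30 * 0.85 + 5.00 * 0.15 = 1.005, roughly 5x cheaper than all-frontier
```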

Taxonomy of Routing Architectures

We identify five primary patterns for implementing model routing:

1. Semantic Routing (Embedding-Based)

Uses vector similarity to map broad intents to specific routes.
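A sketch of the mechanism: pick the route whose centroid embedding is most similar to the query embedding. The vectors here are toy 2-D examples; a real system would call an embedding model:

```typescript
// Cosine similarity between two dense vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Route to whichever named centroid the query embedding sits closest to.
function route(query: number[], centroids: Record<string, number[]>): string {
  let best = "";
  let bestScore = -Infinity;
  for (const [name, centroid] of Object.entries(centroids)) {
    const score = cosineSimilarity(query, centroid);
    if (score > bestScore) {
      best = name;
      bestScore = score;
    }
  }
  return best;
}
```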

2. Predictive Routing (Classifier-Based)

Uses a trained classifier (BERT, XGBoost, or matrix factorization as in RouteLLM) to predict the probability that a weak model can successfully answer the query.

3. Cascading Routing (Waterfall)

A “fail-up” pattern that prioritizes cost.
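The fail-up control flow is small enough to sketch directly; the model callers here are stand-ins for real API calls:

```typescript
// An answer plus the model's self-reported (or judged) confidence in it.
interface Answer {
  text: string;
  confidence: number;
}

// Answer with the cheap model first; escalate only below the threshold.
function cascade(
  query: string,
  weakModel: (q: string) => Answer,
  strongModel: (q: string) => Answer,
  threshold = 0.8,
): Answer {
  const first = weakModel(query);
  // Most queries stop here; only hard ones pay the frontier-model price
  return first.confidence >= threshold ? first : strongModel(query);
}
```

The trade-off is latency: escalated queries pay for both calls, which is why cascading prioritizes cost over speed.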

4. Probabilistic Routing (Contextual Bandits)

Uses Reinforcement Learning to adapt routing weights based on user feedback or judge evaluation.

5. Agentic Routing (Tool Use)

Structural routing where a dispatcher agent utilizes tools to delegate work.

Anatomy

A complete routing system consists of three components:

1. The Model Registry

A configuration defining the available models and their capabilities.
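A registry might look like the following sketch; model names, prices, and limits are placeholders:

```typescript
// Illustrative model registry entry; fields are assumptions, not a standard.
interface ModelEntry {
  id: string;
  profile: "high-throughput" | "high-reasoning";
  costPerMTokUsd: number;
  maxContextTokens: number;
}

const registry: ModelEntry[] = [
  { id: "builder-fast", profile: "high-throughput", costPerMTokUsd: 0.3, maxContextTokens: 200_000 },
  { id: "critic-deep", profile: "high-reasoning", costPerMTokUsd: 5.0, maxContextTokens: 200_000 },
];

// The router selects by capability profile, never by hard-coded model name,
// so models can be swapped in the registry without touching routing logic.
function pickModel(profile: ModelEntry["profile"]): ModelEntry {
  const entry = registry.find((m) => m.profile === profile);
  if (!entry) throw new Error(`no model registered for profile: ${profile}`);
  return entry;
}
```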

2. The Router (Gateway vs. Application)

3. The Calibration

The specific thresholds or weights used to make decisions. These must be tuned against a “Preference Dataset” (pairs of queries and optimal model choices).

Operational Economics

The Sweet Spot

LLMs excel at:

Use deterministic code for:

Anti-Patterns

The Monolith

Description: Reliance on a single “Frontier” model for all tasks. Consequence: Excessive cost and latency for simple tasks; inability to scale.

Silent Drift

Description: Hard-coded routing rules (e.g., “if length > 50”) that degrade as user behavior changes. Consequence: Routing becomes incorrectly optimized, sending hard queries to weak models. Fix: Use probabilistic routing or periodic recalibration.

Context Stuffing

Description: Overloading a single prompt with instructions instead of routing to specialized tools/agents. Consequence: “Lost in the Middle” phenomenon; higher hallucination rates.

Trade-offs

| Dimension | Implications |
| --- | --- |
| Latency Overhead | The router itself adds latency (20-50ms for embeddings, 200ms+ for LLM routers). If the weak model saves 300ms but the router takes 400ms, you have negative ROI. |
| Complexity | Maintaining a router adds a control plane that can fail. It requires monitoring and dataset maintenance. |
| Consistency | Using multiple models can lead to inconsistent “tone” or formatting across a user session. |

Relationship to Levels of Autonomy

Levels of Autonomy define human oversight requirements. Model Routing matches computational capability to task characteristics:

Applied in:

Product Vision

A structured vision document that transmits product taste and point-of-view to agents, preventing convergence toward generic outputs.

Status: Live | Last Updated: 2026-01-13

Definition

A Product Vision is a structured artifact that captures the taste, personality, and point-of-view that makes a product this product rather than generic software. It transmits product intuition to agents who otherwise default to bland, safe, interchangeable outputs.

Traditional vision documents are written for humans—investors, executives, new hires. In ASDLC, the Product Vision is structured for agent consumption, providing the context needed to make opinionated decisions aligned with product identity.

The Problem: Vibe Convergence

Agents trained on the entire internet converge toward the mean. Ask for a landing page, you get the same hero section everyone else gets. Ask for onboarding, you get the same three-step wizard. Ask for error copy, you get “Oops! Something went wrong.”

This isn’t a bug in the model. It’s the model doing exactly what it’s trained to do: produce the statistically average response. The average is safe. The average is forgettable.

The symptoms:

The Agent Constitution tells agents how to behave. The Spec tells agents what to build. Neither tells agents who we are.

The Solution: Structured Taste Transmission

The Product Vision bridges this gap by making product identity explicit and agent-consumable. Rather than hoping agents infer taste from scattered references, the vision provides a structured context packet that shapes output quality.

The key insight: agents don’t need complete documentation—they need curated opinions. A Product Vision isn’t comprehensive; it’s opinionated. It tells agents which tradeoffs to make when specs are ambiguous.

Anatomy

A Product Vision consists of five components, each serving a distinct purpose in shaping agent output.

1. The Actual Humans

Not “users” or “customers”—real people with context, constraints, and taste of their own. This gives agents a person to design for, not an abstraction.

When choosing between “simple onboarding wizard” and “power-user defaults with optional setup,” agents need a basis for judgment. Abstract personas don’t provide this; descriptions of actual humans do.

2. Point of View

Opinions. Actual stances on tradeoffs that reasonable people might disagree with.

These aren’t requirements—they’re taste. They tell agents which direction to lean when specs are ambiguous:

3. Taste References

Concrete examples of products that feel right, and products that don’t. Agents can reference these patterns directly: “Make this feel more like Linear’s approach to lists, less like Jira’s.”

References provide calibration. Instead of describing “clean” in abstract terms, point to products that embody it—and products that don’t.

4. Voice and Language

How the product speaks. Not brand guidelines—actual examples of tone.

This includes:

5. Decision Heuristics

When agents face ambiguous choices, what should they optimize for? These are tie-breakers—the rules that resolve conflicts between equally valid approaches.

Placement in Context Hierarchy

Product Vision sits between the Constitution and the Specs:

| Tier | Artifact | Purpose |
| --- | --- | --- |
| Constitution | AGENTS.md | How agents behave (rules, constraints) |
| Vision | VISION.md or inline | Who the product is (taste, voice, POV) |
| Specs | /plans/*.md | What to build (contracts, criteria) |
| Reference | /docs/ | Full documentation, API specs, guides |

The Constitution shapes behavior. The Vision shapes judgment. The Specs shape output.

Not every project needs a separate VISION.md. For smaller products or early-stage teams, the vision can live as a preamble in AGENTS.md. For complex products with detailed voice guidelines and taste references, a separate file prevents crowding out operational context.

See Product Vision Authoring for guidance on the inline vs. separate decision, templates, and maintenance practices.

Validated in Practice

Industry Validation

Marty Cagan (Silicon Valley Product Group) In the AI era, Cagan argues that product vision is more critical than ever. As AI lowers the cost of building features, differentiation shifts from “ability to ship” to “ability to solve value risks.” Without a strong vision, AI teams build “features that work” rather than “products that matter.”

“It will be easier to build features, but harder to build the right features.” — Marty Cagan

Lenny Rachitsky (Product Sense) Rachitsky defines “product sense” as the ability to consistently craft products with intended impact. VISION.md is essentially codified product sense—explicitly documenting the intuition that senior PMs use to steer teams, so that agents (who lack intuition) can simulate it.

The Scientific Basis: Countering Regression to the Mean

LLMs are probabilistic engines trained to predict the most likely next token. By definition, “most likely” means “most average.”

Without external constraint, an agent will always drift toward the mean (see Regression to the Mean). A Product Vision acts as a forcing function, artificially skewing the probability distribution toward specific, non-average choices (e.g., “playful” over “professional,” “dense” over “simple”).

Anti-Patterns

The Generic Vision

“User-centric design. Quality and reliability. Innovation and creativity.”

This says nothing. Every company claims these values. A Product Vision without opinions is just corporate filler that agents will (correctly) ignore.

The Aspirational Vision

Describing the product you wish you had, not the product you’re building. If your vision says “minimal and focused” but your product has 47 settings screens, agents will be confused by the contradiction.

The Ignored Vision

Creating the document once and never referencing it in specs or prompts. The artifact exists but agents never see it in context.

The Aesthetic-Only Vision

All visual preferences, no product opinion. “We like blue and sans-serif fonts” isn’t vision—it’s a style guide. Vision captures judgment, not just appearance.

Relationship to Other Patterns

Agent Constitution — The Constitution defines behavioral rules (what agents must/must not do). The Vision defines taste (what agents should prefer when rules don’t dictate). Constitution is constraints; Vision is guidance.

The Spec — Specs define feature contracts. The Vision influences how those contracts are fulfilled. Specs reference Vision for design rationale: “Per VISION.md: ‘Settings are failure; good defaults are success.’”

Context Engineering — The Vision is a structured context asset. It follows Context Engineering principles: curated, opinionated, agent-optimized.

Product Vision Authoring — Step-by-step guide for creating and maintaining a Product Vision, including templates, inline vs. separate file decisions, and diagnostic guidance.

AGENTS.md Specification — Defines the file format for agent constitutions, including how to incorporate vision as a preamble or reference.

Living Specs — Specs can reference vision for design rationale. The “same-commit rule” applies: if vision changes, affected specs should acknowledge the shift.

Agent Personas — Different personas may need different vision depth. A copywriting agent needs full voice guidance; a database migration agent needs minimal product context.

See also:

Ralph Loop

Persistence pattern enabling autonomous agent iteration until external verification passes, treating failure as feedback rather than termination.

Status: Live | Last Updated: 2026-02-21

Definition

The Ralph Loop—named by Geoffrey Huntley after the persistently confused but undeterred Simpsons character Ralph Wiggum—is a persistence pattern that turns AI coding agents into autonomous, self-correcting workers.

The pattern operationalizes the OODA Loop for terminal-based agents and automates the Learning Loop with machine-verifiable completion criteria. It enables sustained L3-L4 autonomy—“AFK coding” where the developer initiates and returns to find committed changes.

flowchart LR
    subgraph Input
        PBI["PBI / Spec"]
    end
    
    subgraph "Human-in-the-Loop (L1-L2)"
        DEV["Dev + Copilot"]
        E2E["E2E Tests"]
        DEV --> E2E
    end
    
    subgraph "Ralph Loop (L3-L4)"
        AGENT["Agent Iteration"]
        VERIFY["External Verification"]
        AGENT --> VERIFY
        VERIFY -->|"Fail"| AGENT
    end
    
    subgraph Output
        REVIEW["Adversarial Review"]
        MERGE["Merge"]
        REVIEW --> MERGE
    end
    
    PBI --> DEV
    PBI --> AGENT
    E2E --> REVIEW
    VERIFY -->|"Pass"| REVIEW

Both lanes start from the same well-structured PBI/Spec and converge at Adversarial Review. The Ralph Loop lane operates autonomously, with human oversight at review boundaries rather than every iteration.

[!WARNING] The “100 Million Lines” Anti-Pattern

Ralph Loop enables persistence, not quality. Using Ralph Loop for unbounded code generation without specs produces what Dan Cripe calls “100 million lines of crappy code”—technically functional but architecturally incoherent and unmaintainable.

Ralph Loop is a persistence mechanism, not a development methodology. It must be bounded by:

  • Exit criteria defined in The Spec
  • Verification gates that check architectural coherence, not just compilation
  • Scope limits that prevent unbounded iteration

The Problem: Human-in-the-Loop Bottleneck

Traditional AI-assisted development creates a productivity ceiling: the human reviews every output before proceeding. This makes the human the slow component in an otherwise high-speed system.

The naive solution—trusting the agent’s self-assessment—fails because LLMs confidently approve their own broken code. Research demonstrates that self-correction is only reliable with objective external feedback. Without it, the agent becomes a “mimicry engine” that hallucinates success.

| Aspect | Traditional AI Interaction | Failure Mode |
| --- | --- | --- |
| Execution Model | Single-pass (one-shot) | Limited by human availability |
| Failure Response | Process termination or manual re-prompt | Blocks on human attention |
| Verification | Human review of every output | Human becomes bottleneck |

The Solution: External Verification Loop

The Ralph Loop inverts the quality control model: instead of treating LLM failures as terminal states requiring human intervention, it engineers failure as diagnostic data. The agent iterates until external verification (not self-assessment) confirms success.

Core insight: Define the “finish line” through machine-verifiable tests, then let the agent iterate toward that finish line autonomously. Iteration beats perfection.

| Aspect | Traditional AI | Ralph Loop |
| --- | --- | --- |
| Execution Model | Single-pass | Continuous multi-cycle |
| Failure Response | Manual re-prompt | Automatic feedback injection |
| Persistence Layer | Context window | File system + Git history |
| Verification | Human review | External tooling (Docker, Jest, tsc) |
| Objective | Immediate correctness | Eventual convergence |

Anatomy

1. Stop Hooks and Exit Interception

The agent attempts to exit when it believes it’s done. A Stop hook intercepts the exit and evaluates current state against success criteria. If the agent hasn’t produced a specific “completion promise” (e.g., <promise>DONE</promise>), the hook blocks exit and re-injects the original prompt.

This creates a self-referential loop: the agent confronts its previous work, analyzes why the task remains incomplete, and attempts a new approach.
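A minimal sketch of the Stop hook mechanics, assuming a hypothetical harness that passes the agent's final message to the hook and re-injects whatever prompt the hook returns:

```python
from typing import Optional

COMPLETION_PROMISE = "<promise>DONE</promise>"

def stop_hook(final_message: str, original_prompt: str) -> Optional[str]:
    """Intercept the agent's exit attempt.

    Returns None to allow exit, or a prompt to re-inject,
    forcing another iteration.
    """
    if COMPLETION_PROMISE in final_message:
        return None  # Completion promise present: allow exit.
    # Block exit and confront the agent with its unfinished state.
    return (
        "You attempted to stop without emitting the completion promise.\n"
        f"Original task:\n{original_prompt}\n"
        f"Emit {COMPLETION_PROMISE} only when every success criterion is met."
    )

assert stop_hook("All tests pass. <promise>DONE</promise>", "Fix the build") is None
assert stop_hook("I think this is done.", "Fix the build") is not None
```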

2. External Verification (Generator/Judge Separation)

The agent is not considered finished when it believes it’s done—only when external verification confirms success:

| Evaluation Type | Agent Logic | External Tooling |
| --- | --- | --- |
| Self-Assessment | “I believe this is correct” | None (Subjective) |
| External Verification | “I will run docker build” | Docker Engine (Objective) |
| Exit Decision | LLM decides to stop | System stops because tests pass |

This is the architectural enforcement of Generator/Judge separation from Adversarial Code Review, but mechanized.
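The separation can be sketched as a judge that consults only the exit code of external tooling, never the agent's self-report. The Python commands below are stand-ins for real verifiers such as `docker build` or a test runner:

```python
import subprocess
import sys

def external_verdict(cmd: list[str]) -> bool:
    """The Judge: only the exit code of external tooling counts.

    The agent's own claim of success is never consulted.
    """
    result = subprocess.run(cmd, capture_output=True)
    return result.returncode == 0

# Stand-ins for `docker build` / `npm test`: anything with an exit code works.
assert external_verdict([sys.executable, "-c", "raise SystemExit(0)"]) is True
assert external_verdict([sys.executable, "-c", "raise SystemExit(1)"]) is False
```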

3. Git as Persistent Memory

Context windows rot, but Git history persists. Each iteration commits changes, so subsequent iterations “see” modifications from previous attempts. The codebase becomes the source of truth, not the conversation.

Git also enables easy rollback if an iteration degrades quality.

4. Context Rotation and Progress Files

Context rot: Accumulation of error logs and irrelevant history degrades LLM reasoning.

Solution: At 60-80% context capacity, trigger forced rotation to fresh context. Essential state carries over via structured progress files:

This is the functional equivalent of free() for LLM memory—applied Context Engineering.
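A sketch of the rotation trigger and a hypothetical progress-file schema (the threshold value and field names are illustrative, not a prescribed format):

```python
import json

ROTATION_THRESHOLD = 0.7  # trigger inside the 60-80% capacity band

def should_rotate(tokens_used: int, context_limit: int) -> bool:
    """Force rotation to a fresh context before rot degrades reasoning."""
    return tokens_used / context_limit >= ROTATION_THRESHOLD

def progress_snapshot(completed: list[str], next_steps: list[str]) -> str:
    """Serialize the essential state that must survive the rotation."""
    return json.dumps({"completed": completed, "next_steps": next_steps}, indent=2)

assert not should_rotate(50_000, 200_000)   # 25% used: keep going
assert should_rotate(150_000, 200_000)      # 75% used: rotate now

snapshot = progress_snapshot(["schema migration"], ["wire up API route"])
assert "schema migration" in snapshot
```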

5. Convergence Through Iteration

The probability of successful completion P(C) is a function of iterations n:

P(C) = 1 - (1 - p_success)^n

As n increases (often up to 50 iterations), the probability of resolving complex bugs approaches 1.
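The convergence formula can be checked directly:

```python
def p_completion(p_success: float, n: int) -> float:
    """P(C) = 1 - (1 - p_success)^n"""
    return 1 - (1 - p_success) ** n

# Even a modest 15% per-iteration success rate converges quickly:
assert abs(p_completion(0.15, 1) - 0.15) < 1e-12
assert p_completion(0.15, 20) > 0.95   # ~0.961
assert p_completion(0.15, 50) > 0.999  # ~0.9997
```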

6. Map-Reduce (Initializer + Sub-Agents)

For inherently parallel tasks or massive operations, a single Ralph Loop iterating sequentially becomes a bottleneck.

The Solution: The Initializer + Sub-Agents pattern.

This pattern limits context bloat by isolating the action space. The fast sub-agents execute tightly scoped tasks, while the Initializer maintains the strategic overview.

OODA Loop Mapping

The Ralph Loop is OODA mechanized:

| OODA Phase | Ralph Loop Implementation |
| --- | --- |
| Observe | Read codebase state, error logs, failed builds |
| Orient | Marshal context, interpret errors, read progress file |
| Decide | Formulate specific plan for next iteration |
| Act | Modify files, run tests, commit changes |

The cycle repeats until external verification passes.

Relationship to Other Patterns

Context Gates — Context rotation + progress files = state filtering between iterations. Ralph Loops are Context Gates applied to the iteration boundary.

Adversarial Code Review — Ralph architecturally enforces Generator/Judge separation. External tooling is the “Judge” that prevents self-assessment failure.

The Spec — Completion promises require machine-verifiable success criteria. Well-structured Specs with Gherkin scenarios are ideal Ralph inputs.

Workflow as Code — The practice for implementing Ralph Loops using typed step abstractions rather than prompt-based orchestration. Provides deterministic control flow with the agent invoked only for probabilistic tasks.

Anti-Patterns

| Anti-Pattern | Description | Failure Mode |
| --- | --- | --- |
| Vague Prompts | “Improve this codebase” without specific criteria | Divergence; endless superficial changes |
| No External Verification | Relying on agent self-assessment | Self-Assessment Trap; hallucinates success |
| No Iteration Caps | Running without max iterations limit | Infinite loops; runaway API costs |
| No Sandbox Isolation | Agent has access to sensitive host files | Security breach; SSH keys, cookies exposed |
| No Context Rotation | Letting context window fill without rotation | Context rot; degraded reasoning |
| No Progress Files | Fresh iterations re-discover completed work | Wasted tokens; repeated mistakes |

Unbounded Generation

Running Ralph Loop without scope constraints produces volume without value. Each iteration may “fix” the immediate error while introducing architectural drift. Over time, the codebase becomes:

Missing Architectural Verification

Ralph Loop’s default exit criteria (tests pass, compilation succeeds) don’t verify architectural coherence. A loop that only checks “does it work?” will happily generate code that violates design patterns, duplicates logic, or introduces subtle inconsistencies.

Mitigation: Combine Ralph Loop with Constitutional Review to verify outputs against architectural principles, not just functional requirements.

Guardrails

| Risk | Mitigation |
| --- | --- |
| Infinite Looping | Hard iteration caps (20-50 iterations) |
| Context Rot | Periodic rotation at 60-80% capacity |
| Security Breach | Sandbox isolation (Docker, WSL) |
| Token Waste | Exact completion promise requirements |
| Logic Drift | Frequent Git commits each iteration |
| Cost Overrun | API cost tracking per session |
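These guardrails can be combined into a bounded loop skeleton. This is an illustrative sketch, not a production harness; the `iterate` and `verify` callables are stand-ins for the agent step and external verification:

```python
from typing import Callable

def ralph_loop(iterate: Callable[[], None],
               verify: Callable[[], bool],
               max_iterations: int = 30) -> int:
    """Bounded Ralph Loop: the external `verify` decides success,
    and the hard iteration cap prevents infinite looping."""
    for i in range(1, max_iterations + 1):
        iterate()          # agent modifies files, commits changes
        if verify():       # external tooling, never self-assessment
            return i       # converged: hand off to adversarial review
    raise RuntimeError(f"No convergence after {max_iterations} iterations")

# Toy agent that "fixes" the bug on the third attempt.
state = {"attempts": 0}
assert ralph_loop(lambda: state.update(attempts=state["attempts"] + 1),
                  lambda: state["attempts"] >= 3) == 3
```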

Specs

Living documents that serve as the permanent source of truth for features, solving the context amnesia problem in agentic development.

Status: Live | Last Updated: 2026-01-13

Definition

A Spec is the permanent source of truth for a feature. It defines how the system works (Design) and how we know it works (Quality).

Unlike traditional tech specs or PRDs that are “fire and forget,” specs are living documents. They reside in the repository alongside the code and evolve with every change to the feature.

Crucially, The Spec pattern adheres to a spec-anchored philosophy. The spec defines the architectural intent and boundaries, but deterministic code remains the ultimate source of truth for runtime logic. Attempting to use the spec as the sole source artifact (spec-as-source) to generate 100% of a codebase is an anti-pattern that sacrifices the agent control loop and regresses to the failures of Model-Driven Development.

The Economy of Code

“Talk is cheap. Show me the code.” — Linus Torvalds, 2000

In the AI era, this economic reality has flipped. Code is cheap. Show me the talk.

Generating 10,000 lines of code is now effectively free. The high-value activity is no longer typing semantics, but articulating intent. The Spec is that articulation—the “Expensive Talk” that directs the cheap labor of code generation. Without a Spec, you have infinite Provenance-free code (“slop”).

The Problem: Context Amnesia

Agents do not have long-term memory. They cannot recall Jira tickets from six months ago or Slack conversations about architectural decisions. When an agent is tasked with modifying a feature, it needs immediate access to:

Without specs, agents reverse-engineer intent from code comments and commit messages—a process prone to hallucination and architectural drift.

Traditional documentation fails because:

Specs solve this by making documentation a first-class citizen in the codebase, subject to the same version control and review processes as the code itself.

State vs Delta

This is the core distinction that makes agentic development work at scale.

| Dimension | The Spec | The PBI |
| --- | --- | --- |
| Purpose | Define the State (how it works) | Define the Delta (what changes) |
| Lifespan | Permanent (lives with the code) | Transient (closed after merge) |
| Scope | Feature-level rules | Task-level instructions |
| Audience | Architects, Agents (Reference) | Agents, Developers (Execution) |

The Spec defines the current state of the system:

The PBI defines the change:

The PBI references the Spec for context and updates the Spec when it changes contracts.

Why Separation Matters

Sprint 1: PBI-101 "Build notification system"
  → Creates /plans/notifications/spec.md
  → Spec defines: "Deliver within 100ms via WebSocket"

Sprint 3: PBI-203 "Add SMS fallback"
  → Updates spec.md with new transport rules
  → PBI-203 is closed, but the spec persists

Sprint 8: PBI-420 "Refactor notification queue"
  → Agent reads spec.md, sees all rules still apply
  → Refactoring preserves all documented contracts

Without this separation, the agent in Sprint 8 has no visibility into decisions made in Sprint 1.

The Assembly Model

Specs serve as the context source for Feature Assembly. Multiple PBIs reference the same spec, and the spec’s contracts are verified at quality gates.

flowchart LR
  A[/spec.md/]

  B[\pbi-101.md\]
  C[\pbi-203.md\]
  D[\pbi-420.md\]

  B1[[FEATURE ASSEMBLY]]
  C1[[FEATURE ASSEMBLY]]
  D1[[FEATURE ASSEMBLY]]

  E{GATE}

  F[[MIGRATION]]

  A --> B
  A --> C
  A --> D

  B --> B1
  C --> C1
  D --> D1

  B1 --> E
  C1 --> E
  D1 --> E

  A --> |Context|E

  E --> F

Anatomy

Every spec consists of two parts:

Blueprint (Design)

Defines implementation constraints that prevent agents from hallucinating invalid architectures.

Contract (Quality)

Defines verification rules that exist independently of any specific task.

The Contract section implements Behavior-Driven Development principles: scenarios define what behavior is expected without dictating how to implement it. This allows agents to interpret intent dynamically while providing clear verification criteria.

For detailed structure, examples, and templates, see the Living Specs Practice Guide.

Relationship to Other Patterns

The PBI — PBIs are the transient execution units (Delta) that reference specs for context. When a PBI changes contracts, it updates the spec in the same commit.

Feature Assembly — Specs define the acceptance criteria verified during assembly. The diagram above shows this flow.

Experience Modeling — Experience models capture user journeys; specs capture the technical contracts that implement those journeys.

Context Engineering — Specs are structured context assets optimized for agent consumption, with predictable sections (Blueprint, Contract) for efficient extraction.

Behavior-Driven Development — BDD provides the methodology for the Contract section. Gherkin scenarios serve as “specifications of behavior” that guide agent reasoning and define acceptance criteria.

Iterative Spec Refinement

Kent Beck critiques spec-driven approaches that assume “you aren’t going to learn anything during implementation.” This is valid—specs are not waterfall artifacts.

The refinement cycle:

  1. Initial Spec — Capture known constraints (API contracts, quality targets, anti-patterns)
  2. Implementation Discovery — Agent or human encounters edge cases, performance issues, or missing requirements
  3. Spec Update — New constraints committed alongside the code that revealed them
  4. Verification — Gate validates implementation against updated spec
  5. Repeat

This is the Learning Loop applied to specs: the spec doesn’t prevent learning—it captures learnings so agents can act on them in future sessions.

“Large Language Models give us great leverage—but they only work if we focus on learning and understanding.” — Unmesh Joshi, via Martin Fowler

Industry Validation

The Spec pattern has emerged independently across the industry under different names. Notably, Rasmus Widing’s Product Requirement Prompt (PRP) methodology defines the same structure: Goal + Why + Success Criteria + Context + Implementation Blueprint + Validation Loop.

His core principles—“Plan before you prompt,” “Context is everything,” “Scope to what the model can reliably do”—mirror ASDLC’s Spec-Driven Development philosophy.

See Product Requirement Prompts for the full mapping and Further Reading for convergent frameworks.

See also:

The ADR

A structural pattern for capturing architectural decisions with context, rationale, and consequences in an immutable record.

Status: Live | Last Updated: 2026-01-28

Definition

The ADR (Architecture Decision Record) is a lightweight document pattern for capturing significant architectural decisions. Each ADR records exactly one decision: what was decided, why it was decided, and what consequences follow.

Unlike The Spec which defines the current state of a feature and evolves with the code, an ADR is immutable—it captures a snapshot of thinking at a specific moment. When circumstances change, a new ADR supersedes the old one, preserving the decision history.

The Problem: Decision Amnesia

Architectural knowledge decays rapidly. Six months after a technology choice, teams ask:

Without explicit decision records, this context lives only in:

For agentic development, this creates a severe problem. An agent refactoring authentication code has no visibility into why Supabase Auth was chosen over Firebase Auth—it may inadvertently violate the constraints that drove the original decision.

The Solution: Immutable Decision Records

ADRs solve decision amnesia by making architectural decisions first-class artifacts in the codebase. Each decision is documented at the moment it’s made, with full context preserved.

docs/adrs/
├── ADR-001-use-postgresql.md
├── ADR-002-supabase-auth.md
├── ADR-003-event-driven-messaging.md
└── ADR-004-svelte-over-react.md        # Supersedes ADR-001 (hypothetical)

The key insight: decisions are immutable, but their status changes. ADR-001 might be “Accepted” for two years, then become “Superseded by ADR-010” when the team migrates databases.
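The “immutable body, mutable status” rule can be sketched in code. The ADR fields and helper below are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)  # frozen: the decision body is immutable
class ADR:
    number: int
    title: str
    status: str = "Accepted"

def supersede(old: ADR, new: ADR) -> tuple[ADR, ADR]:
    """Retire an ADR by pointing its status at the successor.

    Only the status changes; the original decision text is never edited.
    """
    retired = replace(old, status=f"Superseded by ADR-{new.number:03d}")
    return retired, new

adr1 = ADR(1, "Use PostgreSQL")
adr10 = ADR(10, "Migrate to CockroachDB")
retired, _ = supersede(adr1, adr10)
assert retired.status == "Superseded by ADR-010"
assert retired.title == adr1.title  # decision content untouched
```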

Anatomy

An ADR consists of six sections, each serving a distinct purpose:

1. Title

A short, descriptive name with a unique identifier.

Format: ADR-NNN: Decision Summary

Examples:

2. Status

The lifecycle state of the decision:

| Status | Meaning |
| --- | --- |
| Proposed | Under discussion, not yet decided |
| Accepted | Decision made and in effect |
| Deprecated | No longer recommended but not replaced |
| Superseded | Replaced by a newer ADR (link to successor) |

Example: Status: Superseded by ADR-015

3. Context

The forces and constraints that shaped the decision. This is the why—without it, the decision appears arbitrary.

Include:

Example:

We need real-time collaboration features. The existing polling-based approach creates unacceptable latency (>2s) and server load. The team has experience with PostgreSQL but not MongoDB. We have 3 weeks before the feature deadline.

4. Decision

What was decided. State it clearly and unambiguously.

Format: “We will [do X]” or “We decided to [do X]”

Example:

We will use Supabase Realtime (built on PostgreSQL logical replication) for real-time collaboration features.

5. Consequences

The outcomes of this decision—positive, negative, and neutral. Honesty here is critical. A decision that hides its downsides will be revisited with confusion.

Structure:

Example:

Positive: Leverages existing PostgreSQL expertise. Real-time updates with <100ms latency. No new database to manage.

Negative: Tied to Supabase SaaS (vendor lock-in). Less flexible query patterns than dedicated real-time databases. Learning curve for PostgreSQL triggers.

Neutral: Requires migration of subscription logic from polling to channels.

6. Alternatives Considered

What other options were evaluated and why they were rejected. This prevents future teams from re-evaluating the same options without understanding the original analysis.

Format: List each alternative with rejection rationale.

Example:

  • Firebase Realtime Database: Rejected—would require a second database system and doesn’t integrate with existing PostgreSQL data.
  • Custom WebSocket implementation: Rejected—significant development effort and maintenance burden for real-time infrastructure.
  • Pusher: Rejected—adds external dependency and per-message costs at scale.

State vs The Spec

The ADR complements The Spec but serves a different purpose:

| Dimension | The Spec | The ADR |
| --- | --- | --- |
| Purpose | Define how it works now | Record why we decided |
| Mutability | Living (updated with code) | Immutable (superseded, not edited) |
| Scope | Feature-level behavior | Architectural choice |
| Audience | Implementers | Archaeologists, reviewers |

A feature Spec might say “Authentication uses Supabase Auth with Magic Link.” The ADR explains why Supabase Auth was chosen over Firebase Auth.

Adversarial Decision Review

The Adversarial Code Review pattern validates code against specs. ADRs need a different review approach—Adversarial Decision Review—that evaluates the decision quality itself.

Critic Agent Prompt

You are reviewing an Architecture Decision Record.

Evaluate:
1. **Context Completeness** — Are the forces and constraints clearly articulated? 
   Could someone unfamiliar with the project understand WHY this decision was needed?

2. **Alternatives Rigor** — Were reasonable alternatives considered? 
   Is each rejection rationale specific (not "too complex" without explanation)?

3. **Consequence Honesty** — Are negative outcomes acknowledged?
   Beware ADRs with only positive consequences—every decision has trade-offs.

4. **Reversibility Clarity** — Is it clear how to undo this decision if needed?
   What would trigger reconsideration?

5. **Scope Discipline** — Does this ADR decide exactly one thing?
   Multiple decisions should be separate ADRs.

Output: ACCEPT or list of concerns with suggested improvements.

This pattern ensures ADRs maintain quality as high-value context for future decisions.

Relationship to Other Patterns

The Spec — Specs define current feature state; ADRs explain the architectural choices that constrain specs. An ADR might mandate “all API routes use REST,” and feature specs implement within that constraint.

Agent Constitution — ADRs can become constitutional rules. “ADR-003: All database migrations must be backward-compatible” may be promoted to an agent constitution constraint that the agent must not violate.

Context Engineering — ADRs are high-value context for agents. Including relevant ADRs in agent context helps prevent accidental violations of past architectural decisions.

Request for Comments — RFCs are proposals that spawn ADRs. An RFC gathers feedback; acceptance creates one or more ADRs.

ADR Authoring — The practice that implements this pattern with templates, lifecycle guidance, and file organization.

The PBI

A transient execution unit that defines the delta (change) while pointing to permanent context (The Spec), optimized for agent consumption.

Status: Live | Last Updated: 2026-01-13

Definition

The Product Backlog Item (PBI) is the unit of execution in the ASDLC. While The Spec defines the State (how the system works), the PBI defines the Delta (the specific change to be made).

In an AI-native workflow, the PBI transforms from a “User Story” (negotiable conversation) into a Prompt (strict directive). The AI has flexibility in how code is written, but the PBI enforces strict boundaries on what is delivered.

The Problem: Ambiguous Work Items

Traditional user stories (“As a user, I want…”) are designed for human negotiation. They assume ongoing dialogue, implicit context, and shared understanding built over time.

Agents don’t negotiate. They execute. A vague story becomes a hallucinated implementation.

What fails without structured PBIs:

The Solution: Pointer, Not Container

The PBI acts as a pointer to permanent context, not a container for the full design. It defines the delta while referencing The Spec for the state.

| Dimension | The Spec | The PBI |
| --- | --- | --- |
| Purpose | Define the State (how it works) | Define the Delta (what changes) |
| Lifespan | Permanent (lives with the code) | Transient (closed after merge) |
| Scope | Feature-level rules | Task-level instructions |
| Audience | Architects, Agents (Reference) | Agents, Developers (Execution) |

Anatomy

An effective PBI consists of four parts:

1. The Directive

What to do, with explicit scope boundaries. Not a request—a constrained instruction.

2. The Context Pointer

Reference to the permanent spec. Prevents the PBI from becoming a stale copy of design decisions that live elsewhere.

3. The Verification Pointer

Link to success criteria defined in the spec’s Contract section. The agent knows exactly what “done” looks like.

4. The Refinement Rule

Protocol for when reality diverges from the spec. Does the agent stop? Update the spec? Flag for human review?

Bounded Agency

Because AI is probabilistic, it requires freedom to explore the “How” (implementation details, syntax choices). However, to prevent hallucination, we bound this freedom with non-negotiable constraints.

Negotiable (The Path): Code structure, variable naming, internal logic flow, refactoring approaches.

Non-Negotiable (The Guardrails): Steps defined in the PBI, outcome metrics in the Spec, documented anti-patterns, architectural boundaries.

The PBI is not a request for conversation—it’s a constrained optimization problem.

Atomicity & Concurrency

---
title: Spec-flow
---
flowchart LR
  T1 -->|/ralph| FA1
  T2 -->|/dev| FA2
  T3 --> FA3
  subgraph Spec
    SA([Intent]) -->|/spec| SB[spec.md]
    SB -->|/review| SA
    SB -->|'plan.1'| T1[PBI.1]
    SB -->|'plan.2'| T2[PBI.2]
    SB -->|'plan.3'| T3[PBI.3]
  end
  subgraph Feature Assembly 1
    FA1[[Ralph loop]] -->
    FA1.1{gates} -->|FAIL|FA1
  end
  subgraph Feature Assembly 2
    FA2[/develop/] -->|'/review'| FA2.1
    FA2.1[/Adversarial Review/] -->|PASS| FA2.2
    FA2.1 -->|FAIL| FA2
    FA2.2{gates} -->|FAIL| FA2
  end
  subgraph Feature Assembly 3
    FA3([Craftsmanship]) -->
    FA3.1{gates} -->|FAIL|FA3
  end
  FA1.1 -->|PASS|E(((DONE)))
  FA2.2 -->|PASS|E
  FA3.1 -->|PASS|E

In swarm execution (multiple agents working in parallel), each PBI must be:

Atomic: The PBI delivers a complete, working increment. No partial states. If the agent stops mid-task, either the full change lands or nothing does.

Self-Testable: Verification criteria must be executable without other pending PBIs completing first. If PBI-102 requires PBI-101’s code to test, PBI-102 is not self-testable.

Isolated: Changes target distinct files/modules. Two concurrent PBIs modifying the same file create merge conflicts and non-deterministic outcomes.

Dependency Declaration

When a PBI requires another to complete first, the dependency is declared explicitly in the PBI structure—not discovered at merge time.
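A sketch of explicit dependency declaration, assuming a hypothetical PBI structure with a `depends_on` field (the spec path is illustrative). A scheduler dispatches a PBI only once every declared dependency has merged:

```python
from dataclasses import dataclass, field

@dataclass
class PBI:
    id: str
    spec: str  # context pointer, e.g. a path like "/plans/notifications/spec.md"
    depends_on: list[str] = field(default_factory=list)

def ready(pbi: PBI, merged: set[str]) -> bool:
    """A PBI is dispatchable only when all declared dependencies have merged."""
    return all(dep in merged for dep in pbi.depends_on)

pbi101 = PBI("PBI-101", "/plans/notifications/spec.md")
pbi102 = PBI("PBI-102", "/plans/notifications/spec.md", depends_on=["PBI-101"])

assert ready(pbi101, merged=set())            # no dependencies: dispatch now
assert not ready(pbi102, merged=set())        # blocked until PBI-101 lands
assert ready(pbi102, merged={"PBI-101"})      # unblocked at merge time
```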

Relationship to Other Patterns

The Spec — The permanent source of truth that PBIs reference. The Spec defines state; the PBI defines delta.

PBI Authoring — The practice for writing effective PBIs, including templates and lifecycle.

See also:

Agent Optimization Loop

The recursive process of using feedback from scenarios to continuously tune agent prompts, context, and tools.

Status: Experimental | Last Updated: 2026-02-21

Definition

The Agent Optimization Loop is the distinct lifecycle for building the agents themselves, separate from the lifecycle of the software they build. It replaces static “Evals” with dynamic Scenarios—realistic, localized integration tests that verify agent behavior in context.

While the Ralph Loop optimizes the product (Code) through iteration, the Agent Optimization Loop optimizes the producer (Agent) through meta-feedback.

The Problem: Static Evals

Standard “Leaderboard” evaluations (GSM8K, HumanEval) measure raw intelligence, not job performance. Optimizing for them leads to overfitting on generic tasks while failing on domain-specific constraints.

In a Software Factory, we need agents that perform well specifically on our codebase, our patterns, and our constraints. A generic coding agent that knows Python well but ignores our project’s Result type pattern is functionally broken.

The Solution: The Factory Loop

The Agent Optimization Loop treats the agent’s configuration (System Prompt, Context, Tools) as the source code, and “Scenarios” as the unit tests.

Anatomy

The loop consists of three phases: Seed, Validate, and Loop.

1. Seed (Context Engineering)

The initial configuration of the agent. This includes:

2. Validate (Scenarios)

Instead of running the agent on a generic problem, we run it against a Scenario—a specific, representative task from our actual backlog.
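As a sketch, a Scenario might be captured as a small fixture file. The format, paths, and check phrasing here are assumptions, not a prescribed schema:

```yaml
scenario: parser-result-type
task: >
  Refactor src/parser/parse.ts to return our Result type
  instead of throwing on malformed input.
checks:
  - "type check and lint pass"
  - "no `throw` statements remain in src/parser/parse.ts"
  - "malformed input yields an Err value, not an exception"
```

The checks verify agent behavior in context: not just that code compiles, but that the agent honored the project's conventions.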

3. Loop (Meta-Optimization)

When the agent fails a scenario, we do not just fix the code (that’s the Ralph Loop). We fix the Agent.

This creates a compounding asset: an agent that gets smarter about this specific codebase over time.

Probabilistic Satisfaction & Holdouts

In mature setups (such as an AI Software Factory), evaluation shifts from boolean definitions of success (“the test suite is green”) to empirical Probabilistic Satisfaction. Agents are evaluated against thousands of Holdout Scenarios—simulated user stories explicitly hidden from the agent during implementation. This prevents the agent from overfitting or “cheating” the tests, ensuring generalized competence.
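A minimal sketch of holdout evaluation, assuming each scenario has been reduced to a pass/fail callable (all names here are illustrative):

```python
import random

def evaluate(agent, scenarios, holdout_frac=0.2, seed=0):
    """Score `agent` only on scenarios it never saw during tuning."""
    rng = random.Random(seed)
    pool = list(scenarios)
    rng.shuffle(pool)
    cut = int(len(pool) * (1 - holdout_frac))
    holdout = pool[cut:]                      # hidden from the agent
    passed = sum(1 for s in holdout if agent(s))
    return passed / len(holdout)              # probabilistic satisfaction, not a boolean
```

The returned rate is the metric tracked over time; a regression on the holdout set signals overfitting to the visible scenarios.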

Offline vs Online Evolution

The Agent Optimization Loop manifests in two distinct modes:

Offline Factory Optimization (Current Focus)

Optimization occurs asynchronously through explicit integration testing (Scenarios). Humans or meta-evaluators analyze failures and update the version-controlled context (Specs, AGENTS.md) and rerun. This guarantees determinism and peer review but has higher latency.

Online Context Evolution (Experimental)

Often called “Continual Learning in Token Space,” where an agent natively reflects over its past trajectories (e.g., distilling sessions into an AGENTS.md update or generating a new skill file automatically). While this enables rapid adaptation, it risks uncontrolled drift if the agent infers the wrong lesson from a failure.

In ASDLC, we treat Online Evolution as an input to Offline Optimization: agents can suggest updates to the context, but these updates must pass deterministic Architectural Review before becoming canonical.

Relationship to Other Patterns

Ralph Loop — The Execution Loop. The Agent Optimization Loop runs offline to improve the agent so that the Ralph Loop runs more efficiently online.

Context Engineering — The discipline that informs the “Seed” and “Loop” phases. The Optimization Loop is the process of verifying that our Context Engineering is effective.

Agentic SDLC — The overarching framework. The Agent Optimization Loop is the engine of the “Agent Factory” component of the ASDLC.

Agentic Double Diamond

A computational framework transforming the classic design thinking model into an executable pipeline of context verification and assembly.

Status: Experimental | Last Updated: 2026-02-25

Definition

The Agentic Double Diamond is a computational framework that transforms the traditional design thinking model (Discover, Define, Develop, Deliver) into an executable pipeline where every phase produces machine-readable context rather than static artifacts.

Agentic Double Diamond Diagram

In this model, the Spec becomes the primary source code, and “Coding” becomes an automated assembly step. The human role shifts from Implementation to Context Engineering and Verification.

The Problem: Lossy Handoffs

Traditional software development suffers from signal degradation at every handoff:

  1. The “Gap of Silence”: Insights from the Discover phase are summarized into PowerPoints or tickets, stripping away the raw evidence needed for edge-case validation.
  2. Static Deliverables: The Define phase produces Figma files or flat requirements. To an AI, these are unstructured blobs. Use of “Vibe Coding” creates functionality that feels right but fails under rigorous scrutiny.
  3. Verification Lag: We typically only verify if we built the thing right (Testing) after weeks of coding. We rarely verify if we are building the right thing (Strategy) until it’s too late.

The result is a “Build Trap” where we efficiently ship features that solve the wrong problems.

The Solution: A Computational Pipeline

The Agentic Double Diamond reimagines the two diamonds not as workshop phases, but as Context Furnaces. Each furnace ingests raw, unstructured input and refines it into a stricter, more deterministic state.

Crucially, we introduce Adversarial Gates at the convergence points of each diamond to stop “Solution Pollution”—the tendency to rush into building without a valid problem definition.

Anatomy

The pattern consists of four computational phases and one operational phase (Run).

Phase 1: DISCOVER (The Sensor Network)

From Chaos to Signal.

Instead of manual research sprints, we use agents to ingest broad signals (user feedback, logs, market data) and cluster them into patterns.

Context Output: Problem Graph (A structured map of user needs and pain points).

Phase 2: DEFINE (The Strategy Engine)

From Signal to Insight.

We crystallize the signals into a coherent strategy. This is where Product Thinking applies constraint satisfaction to select the right problem to solve.

Human Role: Thought Leader (Deciding what matters). Agent Role: Thought Partner (Challenging assumptions).

Context Output: Strategy Document & Validated Problem Statement.

Phase 3: SPEC (The New Coding)

From Insight to Blueprint.

This is the most significant shift. In the Agentic SDLC, Spec Writing IS Coding. The Spec is the permanent, living source of truth. It defines the “What” (Behavior) and the “How” (Architecture) in a format rigorous enough for agents to execute.

Context Output: The Spec (Context, Blueprint, Contract).

Phase 4: ASSEMBLE (The Agentic Manufactory)

From Blueprint to Assembly.

Agents ingest the Spec and “assemble” the implementation. This phase is highly automated. The agents generate code, tests, and documentation that adhere strictly to the Spec.

Human Role: Verifier (Reviewing the assembly against the Spec). Agent Role: Builder (Implementation).

Context Output: Source Code, Tests, Micro-Commits.

Phase 5: RUN (The Feedback Loop)

From Assembly to Signal.

The software operates in production, generating new signals (usage data, errors, feedback) that feed back into Phase 1, closing the loop.

Relationship to Other Patterns

Anti-Patterns

The Vibe Coding Shortcut

Problem: Skipping the Define and Spec phases to jump straight to Assemble (Vibe Coding). Consequence: Fast “sugar-high” shipping of features that crumble under production complexity because they lack structural integrity.

The Static Spec

Problem: Treating Phase 3 as a “PDF generation” step. Consequence: The Spec drifts from reality immediately. In this pattern, the Spec must be a Living Spec in the repo, or the automated assembly fails.

Industry Implementations

The Agentic Double Diamond is a theoretical model that maps closely to emerging industry practices. A concrete example is the Effective Delivery AI-driven framework, which observed a 30-40% development speed increase using a 4-phase “copilot-collections” workflow that tightly aligns with our phases:

  1. Research → Discover: Agents build context around a task and source related information.
  2. Plan → Define / Spec: Agents create a structured implementation plan with clear acceptance criteria.
  3. Implement → Assemble: Specialized engineers (Frontend/Backend) execute against the agreed plan.
  4. Review → Assemble (Verification Gate): Critic agents perform structured code reviews, verifying against Figma designs or the implementation plan.

This workflow demonstrates that creating specialized context furnaces (their Research/Plan phases) before implementation leads to measurable, significant gains over standard “vibe coding” with a single LLM prompt.

Context Gates

Architectural checkpoints that filter input context and validate output artifacts between phases of work to prevent cognitive overload and ensure system integrity.

Status: Experimental | Last Updated: 2026-01-18

Definition

Context Gates are architectural checkpoints that sit between phases of agentic work. They serve a dual mandate: filtering the input context to prevent cognitive overload, and validating the output artifacts to ensure system integrity.

Unlike “Guardrails,” which conflate prompt engineering with hard constraints, Context Gates are distinct, structural barriers that enforce contracts between agent sessions and phases.

The Problem: Context Pollution and Unvalidated Outputs

Without architectural checkpoints, agentic systems suffer from two critical failures:

Context Pollution — Agents accumulate massive conversation histories (observations, tool outputs, internal monologues, errors). When transitioning between sessions or tasks, feeding the entire context creates cognitive overload. Signal-to-noise ratio drops, and agents lose focus on the current objective—Nick Tune describes this as the agent becoming “tipsy wobbling from side-to-side.”

Unvalidated Outputs — Code that passes automated tests can still violate semantic contracts (spec requirements, architectural constraints, security policies). Without probabilistic validation layers, implementation shortcuts and silent failures slip through to production.

Why Existing Approaches Fail:

The Solution: Dual-Mandate Checkpoint Architecture

Context Gates solve this by creating two distinct checkpoint types:

Input Gates — Filter and compress context entering an agent session, ensuring only relevant information is presented. This prevents cognitive overload and maintains task focus.

Output Gates — Validate artifacts leaving an agent session through three tiers of verification: deterministic checks, probabilistic review, and human acceptance.

The key insight: Context must be controlled at the boundaries, not throughout execution. Agents work freely within their session, but transitions enforce strict contracts.
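The boundary contract can be sketched in a few lines. All function names below are assumptions; the LLM critique and human sign-off are stubbed out:

```python
def deterministic_checks(artifact: str) -> bool:
    # Quality Gate stand-in: compile, lint, run tests.
    return "FIXME" not in artifact

def critic_review(artifact: str) -> bool:
    # Review Gate stand-in: an LLM critique call would go here.
    return True

def human_acceptance(artifact: str) -> bool:
    # Acceptance Gate stand-in: human strategic sign-off.
    return True

def input_gate(history: list[str], task: str) -> list[str]:
    """Filter context entering a session: keep only task-relevant lines."""
    return [line for line in history if task.lower() in line.lower()]

def output_gate(artifact: str) -> tuple[bool, str]:
    """Validate an artifact leaving a session, cheapest tier first."""
    if not deterministic_checks(artifact):
        return False, "quality"
    if not critic_review(artifact):
        return False, "review"
    return human_acceptance(artifact), "acceptance"
```

Note that nothing constrains the agent between the gates; the contract is enforced only at the session boundary.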

Anatomy

Context Gates consist of two primary structures, each with distinct sub-components:

Input Gates

Input Gates control what context enters an agent session.

Summary Gates (Cross-Session Transfer)

When transitioning work between agent sessions, Summary Gates compress conversation history into essential state.

Examples:

Context Filtering (Within-Session)

During multi-step tasks within a single session, Context Filtering determines what historical information is relevant to the current sub-task.

Output Gates

Output Gates validate artifacts before they progress to the next phase. Three tiers enforce different types of correctness:

Quality Gates (Deterministic)

Binary, automated checks enforced by the toolchain.

Examples:

Review Gates (Probabilistic, Adversarial)

LLM-assisted validation of semantic correctness and contract compliance.

Examples:

Output Format: When violations are detected, Review Gates provide actionable feedback:

  1. Violation Description — What contract was broken
  2. Impact Analysis — Why this matters (performance, security, maintainability)
  3. Remediation Path — Ordered list of fixes (prefer standard patterns, escalate if needed)
  4. Test Requirements — What tests would prevent regression

This transforms Review Gates from “reject” mechanisms into “guide to resolution” checkpoints.
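The report structure can also be made machine-readable so downstream tooling can route findings back to the Builder. A sketch (field names mirror the list above but are not a required schema):

```python
from dataclasses import dataclass, field

@dataclass
class Violation:
    description: str                    # what contract was broken
    impact: str                         # why it matters
    remediation: list[str] = field(default_factory=list)        # ordered fixes
    test_requirements: list[str] = field(default_factory=list)  # regression tests

def verdict(violations: list[Violation]) -> str:
    # Mirrors the pass/fail wording used by Review Gates in this manual.
    return "PASS" if not violations else "NOT READY TO MERGE"
```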

Acceptance Gates (Human-in-the-Loop)

Subjective checks requiring human strategic judgment.

Examples:

Workflow Enforcement (Denial Gates)

Mechanisms that actively block agents from bypassing the defined process.

Examples:

Gate Taxonomy

| Feature | Summary Gates (Input) | Context Filtering (Input) | Quality Gates (Output) | Review Gates (Output) | Acceptance Gates (Output) |
|---|---|---|---|---|---|
| Function | Session handoff | Within-session filtering | Code validity | Spec compliance | Strategic fit |
| Goal | Clean session transfer | Maintain focus | Prevent broken code | Enforce contracts | Prevent bad product |
| Mechanism | LLM Summarization | Semantic Search | Compilers / Tests | LLM Critique | Human Review |
| Nature | Compression | Filtering | Deterministic | Probabilistic | Subjective |
| Outcome | Condensed context | Clean context window | Valid compilation | Spec compliance | Approved release |

Relationship to Other Patterns

Adversarial Code Review — Implements the Review Gate tier of Output Gates. Uses a Critic Agent to validate code against the Spec’s contracts.

Constitutional Review — Extends Review Gates by validating against both the Spec (functional) and the Agent Constitution (architectural values).

Model Routing — Works with Context Gates to assign appropriate model capabilities to different gate types (throughput models for generation, reasoning models for Review Gates).

The Spec — Provides the contract that Review Gates validate against.

Agent Constitution — Provides architectural constraints that Constitutional Review validates against.

Ralph Loop — Applies Context Gates at iteration boundaries, using context rotation and progress files to prevent cognitive overload across autonomous loops.

Feature Assembly — The practice that uses all three Output Gates (Quality, Review, Acceptance) in the verification pipeline.

Workflow as Code — The practice for implementing gate enforcement programmatically rather than via prompt instructions.

Strategic Value

Prevents Context Overload — Agents receive only relevant information, maintaining task focus and reducing token usage.

Catches Semantic Violations — Review Gates detect contract violations that pass deterministic checks (performance anti-patterns, security gaps, missing edge cases).

Reduces Human Review Burden — Quality and Review Gates filter out obvious errors, letting humans focus on strategic fit rather than technical correctness.

Enforces Architectural Consistency — Constitutional Review (via Review Gates) ensures code follows project principles, not just internet-average patterns.

Creates Clear Contracts — Each gate type has explicit pass/fail criteria, making verification deterministic where possible and explicit where probabilistic.

See also:

Context Map

A high-density navigational index that enables agents to locate knowledge without managing massive context windows.

Status: Experimental | Last Updated: 2026-02-16

Definition

A Context Map is a curated, high-density index of a larger knowledge base (the “Territory”) provided to an agent upfront. It acts as a navigational aid, allowing the agent to locate specific information or understand the system’s topology without ingesting the entire corpus or relying on blind search.

Instead of hoping an agent “finds” the right context through tool calls, the Context Map guarantees the agent knows what exists and where it resides.

The Problem: The Haystack Failure

Agents operating on large codebases or documentation sets face two failure modes:

  1. Context Overload: Feeding all documentation into the context window is expensive, slow, and typically exceeds token limits.
  2. Search Blindness: Letting agents search “on demand” (RAG) is unreliable. Vercel’s research shows agents often fail to invoke search tools, or craft poor queries when they do, yielding a 79% success rate compared to 100% with a map.

The result is Hallucination by Omission: The agent invents a believable but incorrect solution because it failed to retrieve the authoritative documentation.

The Solution: The Map is Not the Territory

The Context Map pattern separates Navigation from Ingestion.

We provide the agent with a Map—a highly compressed, structural representation of the available knowledge. The Map contains:

Critically, The Map is small enough to fit permanently in the context window, while The Territory is loaded only on demand.

Anatomy

A Context Map consists of three layers of increasing density:

1. The Index (Topology)

A structural overview of the domain.

2. The Glossary (Signposts)

A definition of domain-specific terms to prevent vocabulary mismatch.

3. The Routing Table (Pointers)

Explicit links between problems and their authoritative sources.

The Ecosystem: Where does it fit?

| Scope | Component | Role | Example |
|---|---|---|---|
| Discipline | Context Engineering | The “Physics”. The study of how to structure info. | The Concept |
| Pattern | Context Map | The Strategy. “Use a Map to find the Territory.” | This Pattern |
| Practice | Context Mapping | The Tactics. “How to write the Map in YAML.” | The How-To Guide |
| Container | AGENTS.md | The Implementation. The file where the Map lives. | The Spec |

Context Map is the Spatial pillar of Context Engineering.

Implementation Strategies

See the Context Mapping practice for detailed implementation guides.

1. The Pragmatic Map (YAML)

Using Annotated YAML to describe project structure and internal documentation. This is preferred for its readability and for the standard structure that LLMs parse naturally.
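A minimal annotated map might look like the following (paths, comments, and section names are illustrative, not a prescribed schema):

```yaml
docs/:
  architecture.md: "System topology; read before cross-module changes"
  adrs/: "Architecture decisions; search here before proposing alternatives"
src/:
  billing/: "Payment integration; see docs/adrs/ for webhook constraints"
glossary:
  PBI: "Product Backlog Item, the atomic unit of agent work"
routing:
  "payment failures": docs/runbooks/payments.md
```

The map tells the agent what exists and where; the agent loads the target file only when the task requires it.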

2. The Compressed Map (Vercel Style)

Using a high-density Pipe-Delimited format (|path/to/file:{doc1,doc2}) to map massive external documentation sets (e.g., framework docs) where token efficiency is paramount.
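An illustrative entry in this format (the paths and document names are hypothetical):

```
|docs/routing:{defining-routes,layouts,middleware}|docs/data:{fetching,caching,revalidating}
```

Each segment packs a directory and its documents into a few tokens, trading readability for density.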

Spec Reversing

Using frontier models to derive specifications from existing code to bootstrap the Agentic SDLC in brownfield projects.

Status: Experimental | Last Updated: 2026-02-04

The Void: Missing Truth

The Agentic SDLC relies on the Spec as the source of truth. However, most real-world projects are “brownfield”—they have code but no up-to-date documentation. This creates a “Void” where agents have no context to ground their work, leading to regression loops and hallucinated requirements.

Spec Reversing bridges this gap by treating the current codebase as the de facto truth—but only temporarily.

The Pattern

Spec Reversing is a bootstrapping workflow. Instead of writing a spec from scratch, we use a frontier model (like Claude 3.5 Sonnet or GPT-4o) to “read” the code and “write” the missing spec.

The workflow follows this loop:

  1. Select Scope: Identify the specific file or component you are about to modify.
  2. Reverse: Feed the code to a frontier model in Architect or Planning mode.
    • Prompt: “Reverse engineer a functional specification from this code. Capture the intent, logic, and edge cases.”
  3. Review: A human (you) reviews the generated spec.
    • Critique: “Is this actually what we want? Or just what the code currently does?”
    • Correct: Fix any bugs in the logic (in the spec) before touching the code.
  4. Commit: Save this as a new Spec file (e.g., specs/feature-name.md).
  5. Execute: Now create your PBI based on this new Spec.
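The committed artifact from step 4 may be skeletal at first. The structure below is illustrative, not a prescribed template:

```
# Spec: {feature-name} (reversed from code)

## Intent
What the code is trying to achieve, as validated by human review.

## Behavior
- Observed input/output contracts
- Edge cases the implementation handles

## Known Deviations
Bugs found during review, corrected here in the spec before the code is touched.
```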

When to Use

Directives

Benefits

Part II: Practices

ADR Authoring

Step-by-step guide for creating, organizing, and maintaining Architecture Decision Records in your codebase.

Status: Live | Last Updated: 2026-01-28

Definition

ADR Authoring is the practice of writing, organizing, and maintaining Architecture Decision Records throughout a project’s lifecycle. This practice implements The ADR pattern with concrete templates, file organization conventions, and lifecycle management.

Following this practice produces a searchable, version-controlled archive of architectural decisions that serves both humans and agents.

When to Use

Write an ADR when:

Skip an ADR when:

Process

Step 1: Check for Existing ADRs

Before writing a new ADR, search for existing decisions in the same domain.

# Search for related ADRs
grep -r "database" docs/adrs/

If a relevant ADR exists, you may need to supersede it rather than create a fresh decision.

Step 2: Choose an ID and Title

Assign the next sequential ID and write a clear, descriptive title.

Format: ADR-NNN-short-descriptive-title.md

Filename conventions:

Step 3: Document the Context

This is the most important section. Capture the forces that make this decision necessary:

[!TIP] Write context as if explaining to someone joining the team next month. They should understand why this decision was needed, not just what was chosen.

Step 4: State the Decision

Write a clear, unambiguous statement of what was decided.

Good: “We will use PostgreSQL as the primary database for all transactional data.”

Bad: “We decided to maybe consider PostgreSQL or something similar.”

Step 5: Document Consequences

List outcomes honestly—positive, negative, and neutral.

Positive: What capabilities or benefits does this enable?

Negative: What trade-offs are we accepting? What doors does this close?

Neutral: What changes but isn’t inherently good or bad?

[!WARNING] ADRs with no negative consequences are suspicious. Every significant decision has trade-offs. Hiding them leads to confusion when the downsides surface later.

Step 6: Record Alternatives Considered

For each seriously considered alternative, explain why it was rejected. Use specific, concrete reasons—not vague dismissals.

Good: “Firebase Realtime Database rejected—would require a second database system and doesn’t integrate with existing PostgreSQL data models.”

Bad: “Firebase rejected—too complex.”

Step 7: Set Status and Commit

Set status to Proposed for review, or Accepted if the decision is final. Commit the ADR alongside related code changes when possible.

git add docs/adrs/ADR-015-auth-use-supabase.md
git commit -m "docs: add ADR-015 for Supabase Auth decision"

Template

# ADR-NNN: {Title}

**Status:** Proposed | Accepted | Deprecated | Superseded by ADR-XXX

**Date:** YYYY-MM-DD

## Context

{What forces are at play? What problem needs solving? What constraints exist?}

## Decision

{What was decided? State clearly and unambiguously.}

## Consequences

**Positive:**
- {Benefit 1}
- {Benefit 2}

**Negative:**
- {Trade-off 1}
- {Trade-off 2}

**Neutral:**
- {Change 1}

## Alternatives Considered

### {Alternative 1}

{Description and rejection rationale}

### {Alternative 2}

{Description and rejection rationale}

File Organization

Recommended directory structure:

docs/
└── adrs/
    ├── README.md              # Index and search tips
    ├── ADR-001-use-postgres.md
    ├── ADR-002-event-driven.md
    ├── ADR-003-supabase-auth.md
    └── ...

The README.md should include:

Lifecycle Management

Status Transitions

stateDiagram-v2
    [*] --> Proposed
    Proposed --> Accepted : Approved
    Proposed --> Rejected : Rejected
    Accepted --> Deprecated : No longer recommended
    Accepted --> Superseded : Replaced by new ADR
    Deprecated --> [*]
    Superseded --> [*]
    Rejected --> [*]

Superseding an ADR

When a decision is replaced:

  1. Create the new ADR with the updated decision
  2. Update the old ADR’s status: **Status:** Superseded by [ADR-NNN](./ADR-NNN-title.md)
  3. Do not delete or modify the content of the superseded ADR

This preserves the archaeological record of how thinking evolved.

Common Mistakes

The Novel

Problem: ADR is 10+ pages, covering multiple decisions.

Solution: Split into multiple ADRs. Each ADR should decide exactly one thing.

The Hidden Trade-off

Problem: Consequences section lists only positives.

Solution: Force yourself to list at least one negative consequence. If you can’t find any, you haven’t thought hard enough.

The Vague Alternative

Problem: Alternatives listed but rejection rationale is “too complex” or “not a good fit.”

Solution: Be specific. What exactly made it complex? What would adopting it have required?

The Orphaned Decision

Problem: ADR created but never reviewed or status remains “Proposed” indefinitely.

Solution: Include ADR review in your PR/code review process. ADRs should move to “Accepted” or “Rejected” within one sprint.

The Missing Context

Problem: Decision makes no sense without knowing the constraints that existed at the time.

Solution: Write context as if for a new team member. Include timeline, budget, team skills, existing systems.

Agentic Integration

ADRs serve as high-value context for agents:

Include in agent context when:

Agent-friendly practices:

An agent working on authentication should be provided ADR-003-supabase-auth.md as context to avoid accidentally violating the architectural constraints.

This practice implements:

See also:

Adversarial Code Review

Executing automated verification using a Critic Agent to validate implementation artifacts against Spec contracts.

Status: Live | Last Updated: 2026-01-31

Definition

Adversarial Code Review is the practice of automating code validation by employing a specialized Critic Agent to review claimed implementations against established Spec contracts and the Agent Constitution.

By separating the “Builder” role from the “Critic” role, this practice ensures that verification remains objective and rigorous, catching architectural drifts, security vulnerabilities, and logic errors that might pass standard unit tests.

When to Use

Use this practice to implement a high-reasoning verification gate before human review.

Use this practice when:

Skip this practice when:

Process

Step 1: Fetch Issue Context

Retrieve the source of truth for the work being reviewed. This typically involves getting details from a project management tool (like Linear) to understand the title, description, and acceptance criteria.

Step 2: Gather Implementation Artifacts

Identify what has changed. Check the git status for uncommitted changes or review recent commits associated with the issue ID. Prepare the diff or the set of modified files for the Critic Agent.

Step 3: Load Contracts

Identify the “laws” the implementation must follow. This includes:

Step 4: Adversarial Review

Deploy the Critic Agent with an adversarial persona. Instruct the agent to be skeptical by design and to prioritize rejecting violations over being “helpful.” Compare the code strictly against the loaded contracts.

Step 5: Identify Violations & Verdict

Analyze the Critic’s output. If violations are found, categorize them by impact and provide specific remediation paths. If no violations are found against the contracts, issue a PASS verdict.

Templates

Critic Agent Prompt

Use this template to configure a session or subagent for adversarial review.

# Adversarial Code Review

You are a rigorous **Critic Agent** performing adversarial code review per ASDLC.io patterns.

Your role is skeptical by design: reject code that violates the Spec or Constitution, even if it "works." Favor false positives over false negatives.

## Task
Review the implementation claimed for: {issue_id_or_description}

## Workflow
1. **Fetch Context**: Review specs/{spec_name}.md and {constitution_file}.
2. **Review Artifacts**: Analyze the provided code diff/files.
3. **Compare Strictly**: Check against Spec contracts, Security (RLS/Auth), Type safety, and Design system tokens.
4. **Identify Violations**: For each issue, cite the clause violated, the impact, and the remediation path.

## Output Format

### If No Violations Found:
## Verdict: PASS
[Summary of what was reviewed and why it passes]

### If Violations Found:
## Verdict: NOT READY TO MERGE

### Acceptance Criteria Check
| Criterion | Status | Notes |
|-----------|--------|-------|
| {criterion} | {status} | {notes} |

### Violations Found
**1. [Category]: [Brief description]**
- **Violated**: [Spec section or rule]
- **Impact**: [Why this matters]
- **Remediation**: [How to fix]

Common Mistakes

Using the Same Session

Problem: Allowing the Builder Agent to review its own work within the same chat history. Solution: Always start a fresh session or use a distinct subagent with a high-reasoning model for the review.

Vague Violation Reports

Problem: The Critic flags an issue but doesn’t explain why it’s a violation or how to fix it. Solution: Enforce a structured output format that requires citing specific spec clauses and providing remediation steps.

This practice implements:

Agent Personas

A guide on how to add multiple personas to an AGENTS.md file, with examples.

Status: Live | Last Updated: 2026-02-18

Definition

Defining clear personas for your agents scopes their work by defining boundaries and focus — not by role-playing. A persona tells the agent what kind of judgment to apply, what to prioritize, and what to hand off. When combined with Model Routing, personas can also specify which model to use for each type of work.

For the full specification of the AGENTS.md file, see the AGENTS.md Specification.

Where Personas Live

Personas are session-scoped, not project-scoped. A Critic persona is irrelevant during implementation work. A Designer persona is irrelevant during triage. Loading all persona definitions on every session wastes context and can actively burden the agent with instructions it won’t use — and research shows agents follow instructions they receive faithfully, whether relevant or not (Gloaguen et al., 2026).

| Project type | Where personas live |
|---|---|
| Single-persona, simple project | Inline identity statement in agents.md is fine |
| Multi-persona project | Skill/workflow files; agents.md holds registry only |
| Workflow-triggered persona | Defined as part of the workflow, injected at invocation |

The agents.md should contain a persona registry — names and invocation patterns — not full definitions. The full definition (triggers, goals, guidelines) lives in the skill or workflow file that gets injected when that persona is actually needed.

## Personas
Invoke via skill: @Lead, @Dev, @Designer, @Critic
Definitions: `.claude/skills/`

When to Define a Persona

Use a persona when:

Skip explicit personas when:

Anatomy of a Persona Definition

Each persona definition in a skill file should have four elements:

Trigger — When is this persona active? Goal — What is this persona trying to achieve? Guidelines — What judgment rules apply in this context? Boundaries — What does this persona explicitly not do?

Example: Multi-Persona Skill Files

The following are example skill file contents (not agents.md content):

.claude/skills/lead.md

### Lead Developer / Architect (@Lead)
**Trigger:** System design, specs, planning, ADRs.
**Goal:** Specify feature requirements and architecture. Plan next steps. Produce clear specs before handing to implementation.
**Guidelines**
- Schema Design: Define Zod schemas immediately when creating new content types.
- Routing: Use file-based routing. For dynamic docs, use `[...slug].astro` and `getStaticPaths()`.
- Spec-driven: Always produce a clear spec before handoff. Break large tasks into PBIs with acceptance criteria.
- ADR: Record architectural decisions in `docs/adr/` before implementation begins.
**Boundaries**
- Does not write implementation code — hands off to @Dev.
- Does not review finished code — hands off to @Critic.

.claude/skills/dev.md

### Developer / Implementation Agent (@Dev)
**Trigger:** Implementation tasks, bug fixes.
**Goal:** Implement features and fix bugs from a defined PBI. Keep the codebase healthy and maintainable.
**Guidelines**
- Always work from a PBI with clear acceptance criteria.
- Type Safety: TypeScript strictly. No `any` types.
- Document progress: Update the relevant PBI in `docs/backlog/` after completing tasks.
- Testing: Ensure all changes pass `pnpm check` and `pnpm lint`.
**Boundaries**
- Does not redesign architecture — flags issues and escalates to @Lead.
- Does not self-approve — hands off to @Critic for review.

.claude/skills/critic.md

### Critic / Reviewer (@Critic)
**Trigger:** Code review, constitutional review, pre-merge validation.
**Goal:** Be a skeptical gatekeeper. Assume code is broken or insecure until proven otherwise.
**Guidelines**
- Validate against both The Spec and the Agent Constitution.
- If the spec is vague, reject and demand clarification — do not assume.
- Prioritize correctness and edge-case handling over helpfulness.
- Flag security issues, missing error handling, and type violations explicitly.
**Boundaries**
- Does not fix issues — reports them for @Dev to address.
- Does not approve if any Tier 1 boundary from the Constitution is violated.

agents.md: Registry Only

In agents.md, the persona section should be minimal:

## Personas
Invoke via skill: @Lead, @Dev, @Critic
Definitions: `.claude/skills/`

For single-persona projects, an inline identity statement in agents.md is appropriate:

## Identity
Senior Systems Engineer — Go 1.22, gRPC, high-throughput concurrency.
Favor explicit error handling and composition over inheritance.
Prefer asking over guessing when specs are ambiguous.

Model Routing and Personas

Personas define what work to do and how to scope it. Model Routing is a separate practice that defines which model to use.

Keep them separate. Do not add model profiles to persona definitions — it adds noise and the pairing changes as models evolve. When invoking a persona, select the model manually based on task characteristics:

| Persona Type | Typical Work | Recommended Profile |
|---|---|---|
| Lead / Architect | System design, specs, ADRs | High Reasoning |
| Developer / Implementation | Code generation, refactoring | High Throughput |
| Critic / Reviewer | Constitutional review, security | High Reasoning |
| Content / Docs | Documentation, KB entries | Massive Context |

Industry Implementations

The Effective Delivery AI-driven framework provides a concrete example of mapping the core ASDLC personas to specific phases of the Double Diamond using VS Code and GitHub Copilot. Their workflow defines 6 specific agent personas:

This kind of specialization is a practical implementation of the persona registry pattern, allowing different agent definitions to be invoked depending on the current phase of the delivery workflow.

AGENTS.md Specification

The definitive guide to the AGENTS.md file, focusing on minimal, high-signal context for AI agents.

Status: Live | Last Updated: 2026-02-18

Definition

AGENTS.md is an open format for guiding coding agents, acting as a “README for agents.” It provides a dedicated, predictable place for the minimal, human-authored context that agents need to work effectively on a project — things that are not already expressed by the repo itself.

We align with the agents.md specification, treating this file as the authoritative source of truth for agentic behavior within the ASDLC.

When to Use

Use this practice when:

Skip this practice when:

Core Philosophy

1. Minimal by Design

Research by Gloaguen et al. (2026) on 138 real-world repositories found that LLM-generated context files reduce agent task success rates while increasing inference cost by over 20%. Developer-written context files provide only a marginal improvement (+4%) — and only when they are minimal and precise. The conclusion is unambiguous: unnecessary requirements in context files actively harm agent performance, not because agents ignore them, but because agents follow them faithfully, broadening exploration and increasing reasoning cost without improving outcomes.

The default stance for agents.md should be: if a constraint can be expressed elsewhere, it must not live here.

2. Toolchain First

If a constraint can be enforced deterministically by a tool already in the repo — a linter, formatter, type checker, hook, or CI gate — it must not be restated in agents.md. The tool is the constraint. Restating it creates maintenance debt, dilutes signal, and burdens the agent with requirements it cannot actually enforce.

The correct pointer is:

Lint: `pnpm lint` (Biome — see `biome.json`)

Not a list of what Biome enforces.

What belongs where:

| Type | Example | Home |
|---|---|---|
| Toolchain-enforced | no `var`, import order, formatting | `biome.json` / eslint / tsconfig |
| Judgment / architectural | prefer composition, ask before adding deps | agents.md |
| Session-scoped persona | Critic, Builder | skill or workflow file |
| Task-specific style | API naming for this module | The Spec / PBI |

3. Avoiding the Pink Elephant Problem

Agents are highly susceptible to Context Anchoring (the “Pink Elephant Problem”). Telling an LLM what not to do ensures that the concept is front-and-center in its attention mechanism. If your AGENTS.md says “do not use tRPC”, the agent might still reach for it because the token tRPC is highly active in the context window.

For this reason, treat AGENTS.md as a diagnostic tool for codebase friction. Every instruction added to steer the agent away from a mistake is a signal of structural friction. The ideal response is to fix the underlying ambiguity—for example, by actually deleting the legacy utilities or adding a linter rule—and then delete the instruction from the context file.
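For example, a “do not use tRPC” instruction can usually be converted into a deterministic toolchain rule and then deleted from the context file. A minimal sketch using ESLint's standard `no-restricted-imports` rule in flat config (the banned package name and message are illustrative):

```typescript
// eslint.config.js (flat config): ban the unwanted dependency at lint time
// instead of instructing the agent not to use it.
export default [
  {
    rules: {
      "no-restricted-imports": ["error", {
        paths: [
          { name: "@trpc/server", message: "tRPC is not used in this project." },
        ],
      }],
    },
  },
];
```

The linter now carries the constraint deterministically, and the negative instruction (with its anchoring token) can be removed from AGENTS.md entirely.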

4. The Context Anchor (Long-Term Memory)

What agents.md does own is persistent judgment — the things that can’t be expressed by a linter or a type checker. Agents are stateless. Without grounding, each session reverts to generic training weights. agents.md carries the project’s institutional judgment for AI collaboration: how to resolve ambiguity, what to ask before acting, which architectural values to uphold.

This is stable, rarely-changing content. If your agents.md changes often, it is probably carrying content that belongs elsewhere.

5. A README for Agents

Just as README.md is for humans, AGENTS.md is for agents. It complements existing documentation by containing the context agents need that is not already discoverable from the repo structure, toolchain config, or existing docs.

6. Context is Code

In the ASDLC, we treat AGENTS.md with the same rigor as production software:

Tool-Specific Considerations

Different AI coding tools look for different filenames. While AGENTS.md is the emerging standard, some tools require specific naming:

| Tool | Expected Filename | Notes |
|---|---|---|
| Cursor | `.cursorrules` | Also reads AGENTS.md |
| Windsurf | `.windsurfrules` | Also reads AGENTS.md |
| Claude Code | `CLAUDE.md` | Does not read AGENTS.md; case-sensitive |
| Codex | `AGENTS.md` | Native support |
| Zed | `.rules` | Priority-based; reads AGENTS.md at lower priority |
| VS Code / Copilot | `AGENTS.md` | Requires `chat.useAgentsMdFile` setting enabled |

Zed Priority Order

Zed uses the first matching file from this list:

  1. .rules
  2. .cursorrules
  3. .windsurfrules
  4. .clinerules
  5. .github/copilot-instructions.md
  6. AGENT.md
  7. AGENTS.md
  8. CLAUDE.md
  9. GEMINI.md
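The priority order above is a simple first-match resolution, which can be sketched as:

```typescript
// First-match resolution over Zed's priority list (copied from above).
const ZED_PRIORITY = [
  ".rules",
  ".cursorrules",
  ".windsurfrules",
  ".clinerules",
  ".github/copilot-instructions.md",
  "AGENT.md",
  "AGENTS.md",
  "CLAUDE.md",
  "GEMINI.md",
];

// Given the set of rule files present in the repo, return the one Zed uses.
function resolveRulesFile(present: Set<string>): string | undefined {
  return ZED_PRIORITY.find((file) => present.has(file));
}
```

Note the practical consequence: a stray `.cursorrules` left in the repo silently shadows `AGENTS.md` for Zed users.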

VS Code Configuration

VS Code requires explicit opt-in for AGENTS.md support:

Recommendation

Create a symlink to support Claude Code without duplicating content:

ln -s AGENTS.md CLAUDE.md

Note that Claude Code also supports CLAUDE.local.md for personal preferences that shouldn’t be version-controlled.

Ecosystem Tools

Ruler

Ruler synthesizes agent instructions from multiple sources (AGENTS.md, .cursorrules, project conventions) and injects them into coding assistants that may not natively support the AGENTS.md standard. Useful for teams using multiple coding assistants who want to maintain a single source of truth.

Anatomy

The following sections form the minimal, effective structure for agents.md. Each section should only exist if it carries content that genuinely cannot live elsewhere.

1. Mission (The Project Context)

A concise description of the project’s purpose and constraints. This differentiates domain context the agent cannot infer from code — a “User” in a banking app (ACID compliance, zero-trust) behaves very differently from a “User” in a casual game (low friction). Keep this to 2–4 sentences.

> **Project:** ZenTask — a minimalist productivity app.  
> **Core constraint:** Local-first data architecture; offline support is non-negotiable.

2. Toolchain Registry

The minimal reference to what non-standard tools are in play and how to invoke them. Do not describe what the tools enforce — that is already in their config files.

| Intent | Command | Notes |
|---|---|---|
| Build | `pnpm build` | Outputs to `dist/` |
| Test | `pnpm test:unit` | Flags: `--watch=false` |
| Lint | `pnpm lint --fix` | Biome — see `biome.json` |
| Type check | `pnpm typecheck` | `tsconfig.json` is the authority |

3. Judgment Boundaries

The behavioral rules that cannot be expressed by a tool or through a skill — the steering constraints that shape how the agent reasons, not what the linter catches. Use the three-tier system:

NEVER (Hard judgment limits):

ASK (Human-in-the-loop triggers):

ALWAYS (Proactive judgment):

Note: If a rule here overlaps with something your toolchain or harness enforces (e.g., skill, linting rules, type errors), remove it from agents.md. The tool is the enforcement mechanism, not the agent.

4. Available Personas (Registry Only)

If your project uses multiple agent personas, list them by name and invocation. Full persona definitions live in skill/workflow files, not inline here. Loading all persona definitions on every session is wasteful when only one is active at a time.

## Personas
Invoke via skill: @Lead, @Dev, @Designer, @Critic  
Definitions: `.claude/skills/`

For single-persona projects, a brief identity statement is sufficient:

## Identity
Senior Systems Engineer — Go 1.22, gRPC, high-throughput concurrency.
Favor explicit error handling and composition over inheritance.

5. Context Map

Use this section only when the project structure is complex or agents repeatedly stumble while locating files.

A structural index of the codebase for architectural orientation. This is most valuable for onboarding new sessions, spec writing, error triage, and ADR authoring — not primarily as a file-navigation aid for delivery tasks. Keep it high-level; agents can discover file-level details themselves.

Scope of value: Gloaguen et al. (2026) found that directory maps in context files do not meaningfully accelerate file discovery during delivery tasks — agents navigate repositories effectively without them. The Context Map earns its place in the broader SDLC instead: architectural orientation when onboarding sessions, writing specs, triaging errors, or authoring ADRs. It is an orientation tool, not a navigation shortcut for implementation agents, and no substitute for a repo structure that speaks for itself.

See Context Mapping for implementation guidance.

What to Audit Out

Periodically review agents.md for content that has migrated to the toolchain. Common offenders:

On LLM-Generated Context Files

Most agent tools offer a /init or equivalent command that auto-generates an agents.md. Treat the result as an inventory of everything that probably does not need to be in the constitution: it is precisely the context the agent was able to discover on its own.

Gloaguen et al. (2026) found LLM-generated context files consistently reduce agent performance and inflate cost. The mechanism: agents follow the generated instructions faithfully, which broadens exploration and increases reasoning cost without improving task outcomes. The generated file is a useful inventory — use it to identify what might belong in agents.md, then apply the Toolchain First principle to strip everything that belongs elsewhere.

Format Philosophy

The structures in this specification are optimized for larger teams and complex codebases. For smaller projects:

The goal is signal density, not format compliance. Overly rigid specs create adoption friction.

Reference Template

Filename: AGENTS.md

# AGENTS.md

> **Project:** High-throughput gRPC service for real-time financial transactions.  
> **Core constraints:** Zero-trust security model, ACID compliance on all writes.

## Toolchain
| Action | Command | Authority |
|---|---|---|
| Build | `make build` | Outputs to `./bin` |
| Test | `make test` | Runs with `-race` detector |
| Lint | `golangci-lint run` | See `.golangci.yml` |
| Proto | `make proto` | Regenerates gRPC stubs |

## Judgment Boundaries
**NEVER**
- Commit secrets, tokens, or `.env` files
- Add external dependencies without discussion
- Use `_` to ignore errors

**ASK**
- Before adding external dependencies
- Before running database migrations

**ALWAYS**
- Explain your plan before writing code
- Run `buf lint` after modifying any `.proto` file

## Personas
Invoke via skill: @Lead, @Dev, @Critic  
Definitions: `.claude/skills/`

## Context Map

Map out the project structure. Omit platform-, framework-, tooling-, and library-specific defaults the agent can infer from the repository tooling and configuration.

```yaml
monorepo: pnpm workspaces

packages:
  apps/web: Next.js frontend
  apps/api: Express REST API, used by the apps/web and an external mobile app
  packages/ui: shared component library (consumed by web)
  packages/db: Prisma schema, client, migrations — import from here, not direct prisma calls
  packages/types: shared TypeScript types

notable:
  scripts/: repo-wide dev tooling, not shipped
  .env.example: canonical env vars reference, shipped with non-sensitive examples
```

The key discipline: only list dirs/files that would surprise someone who knows the framework. Standard Next.js folders like `src/app` are borderline — include them only if your layout deviates from convention.

Micro-Commits

Ultra-granular commit practice for agentic workflows, treating version control as reversible save points.

Status: Live | Last Updated: 2026-01-13

Definition

Micro-Commits is the practice of committing code changes at much higher frequency than traditional development workflows. Each discrete task—often a single function, test, or file—receives its own commit.

When working with LLM-generated code, commits become “save points in a game”: checkpoints that enable instant rollback when probabilistic outputs introduce bugs or architectural drift.

When to Use

Use this practice when:

Skip this practice when:

The Problem: Coarse-Grained Commits in Agentic Workflows

Traditional commit practices optimize for human readability and PR review: “logical units of work” that span multiple files and implement complete features.

This fails in agentic workflows because:

LLM outputs are probabilistic — A model might generate correct code for 3 files and introduce subtle bugs in the 4th. Bundling all 4 files into one commit makes rollback destructive.

Regression to mediocrity — Without checkpoints, it’s difficult to identify where LLM output drifted from the Spec contracts.

Context loss — Large commits obscure the sequence of decisions. When debugging, you need to know “what changed, when, and why.”

No emergency exit — If an LLM generates a tangled mess across 10 files, your only option is manual surgery or discarding hours of work.

The Solution: Commit After Every Task

Make a commit immediately after:

This creates a breadcrumb trail of working states.

The Practice

4.1. Atomic Tasks → Atomic Commits

Break work into small, testable chunks. Each chunk maps to one commit.

Example PBI: “Add OAuth login flow”

Commit sequence:

1. feat: add OAuth config schema
2. feat: implement token exchange endpoint
3. feat: add session storage for OAuth tokens
4. test: add OAuth flow integration test
5. refactor: extract OAuth error handling

This aligns with atomic PBIs: small, bounded execution units.

4.2. Commit Messages as Execution Log

Commit messages document the sequence of LLM-assisted changes. They serve as:

Format:

type(scope): brief description

- Detail 1
- Detail 2

Example:

feat(auth): implement OAuth token validation

- Add JWT verification middleware
- Extract claims from token payload
- Return 401 on expired tokens

4.3. Branches and Worktrees for Isolation

Use branches or git worktrees to isolate LLM experiments:

Branches — Separate experimental work from stable code. Merge only after validation.

Worktrees — Run parallel LLM sessions on the same repository without context conflicts. Each worktree is an independent working directory.

Example workflow:

# Create worktree for LLM experiment
git worktree add ../project-experiment experiment-oauth

# Work in worktree, commit frequently
cd ../project-experiment
# ... LLM generates code ...
git commit -m "feat: add OAuth callback handler"

# If successful, merge into main
git checkout main
git merge experiment-oauth

# If failed, discard worktree
git worktree remove ../project-experiment

This prevents contaminating the main branch with failed LLM output.

4.4. Rollback as First-Class Operation

When LLM output introduces bugs:

Identify the bad commit — Review recent history to find where issues appeared.

Rollback to last known good state:

# Soft reset (keeps changes staged but uncommitted)
git reset --soft HEAD~1

# Hard reset (discards changes entirely)
git reset --hard HEAD~1

Selective revert:

# Revert specific commit without losing subsequent work
git revert <commit-hash>

This is only safe because micro-commits isolate changes.

Claude Code Checkpoints and /rewind

Claude Code’s built-in checkpoint system complements micro-commits with session-level rollback. Before each file edit, Claude Code automatically snapshots the code state. The /rewind command (or double-tap Esc) opens a menu showing each prompt from the session, with three restore options:

This is especially useful when an LLM takes a wrong architectural turn — you can rewind the conversation context (preventing the model from reinforcing its own mistakes) while selectively keeping or discarding code changes.

[!WARNING] Key limitation: Checkpoints only track edits made through Claude’s file editing tools. Changes made via bash commands (rm, mv, cp) are not tracked. This is why micro-commits to Git remain essential — checkpoints are “local undo,” Git is “permanent history.”

See: Checkpointing — Claude Code Docs

5. Tidy History for Comprehension

Granular commits create noisy history. Before merging to main, optionally squash related commits into logical units:

# Interactive rebase to squash last 5 commits
git rebase -i HEAD~5

This preserves detailed history during development while creating clean history for long-term maintenance.

Trade-off: Squashing removes granular rollback points. Only squash after validation passes Quality Gates.

Relationship to The PBI

PBIs define what to build. Micro-Commits define how to track progress.

Atomic PBIs (small, bounded tasks) naturally produce micro-commits. Each PBI generates 1-5 commits depending on complexity.

Example mapping:

This makes PBI progress traceable and reversible.

See also:

PBI Authoring

How to write Product Backlog Items that agents can read, execute, and verify—with templates and lifecycle guidance.

Status: Live | Last Updated: 2026-01-13

Definition

PBI Authoring is the practice of writing Product Backlog Items optimized for agent execution. This includes structuring the four-part anatomy, ensuring machine accessibility, and managing the PBI lifecycle from planning through closure.

Following this practice produces PBIs that agents can programmatically access, unambiguously interpret, and verifiably complete.

When to Use

Use this practice when:

Skip this practice when:

Process

Step 1: Ensure Accessibility

Invisibility is a bug. If an agent cannot read the PBI, the workflow is broken.

A PBI locked inside a web UI without API or MCP integration is useless to an AI developer. The agent must programmatically access the work item without requiring human copy-paste.

Valid access methods:

| Method | Description |
|---|---|
| MCP Integration | Agent connected to Issue Tracker (Linear, Jira, GitHub) via MCP |
| Repo-Based | PBI exists as a markdown file (e.g., `tasks/PBI-123.md`) |
| API Access | Tracker exposes REST/GraphQL API the agent can query |

If your tracker has no API access: Mirror PBIs as markdown files during sprint planning, or implement MCP integration.

Step 2: Write the Directive

State what to do with explicit scope boundaries. Be imperative, not conversational.

Good:

Implement the API Layer for user notification preferences.
Scope: Only touch the `src/api/notifications` folder.

Bad:

As a user, I want to manage my notification preferences so that I can control what emails I receive.

The second example requires interpretation. The first is executable.

[!TIP] Prompt for the Plan. Even if your tool handles planning automatically, explicitly instruct the agent to output its plan for review. This forces the Specify → Plan → Execute loop.

Example Directive: “Analyze the Spec, propose a step-by-step plan including which files you will touch, and wait for my approval before editing files.”

Step 3: Add Context Pointers

Reference the permanent spec—don’t copy design decisions into the PBI.

Reference: `plans/notifications/spec.md` Part A for the schema.
See the "Architecture" section for endpoint contracts.

Why pointers, not copies: Specs evolve. A copied schema in a PBI becomes stale the moment the spec updates. Pointers ensure the agent always reads the authoritative source.

Step 4: Define Verification Criteria

Link to success criteria in the spec, or define inline checkboxes.

Must pass "Scenario 3: Preference Update" defined in 
`plans/notifications/spec.md` Part B (Contract).

Or inline:

- [ ] POST /preferences returns 201 on valid input
- [ ] Invalid payload returns 400 with error schema
- [ ] Unit test coverage > 80%
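Inline checkboxes like these translate naturally into executable checks. A hedged sketch, where `validatePreferences` is a hypothetical handler standing in for the real endpoint logic:

```typescript
// Hypothetical sketch: the first two checkboxes above as a checkable function.
// `validatePreferences` is illustrative, not a real API.
type Result = { status: number; errors?: string[] };

function validatePreferences(payload: unknown): Result {
  const p = payload as { email?: unknown } | null;
  // Invalid payload -> 400 with an error schema.
  if (typeof p !== "object" || p === null || typeof p.email !== "boolean") {
    return { status: 400, errors: ["email: expected boolean"] };
  }
  // Valid input -> 201.
  return { status: 201 };
}
```

The point is not this particular validator, but that each criterion is phrased so an agent (or CI) can decide pass/fail without human interpretation.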

Step 5: Declare Dependencies

Explicitly state what blocks this PBI and what it blocks.

## Dependencies
- Blocked by: PBI-101 (creates the base schema)
- Must merge before: PBI-103 (extends this endpoint)

Anti-Pattern: Implicit dependencies discovered at merge time. Identify during planning; either sequence the work or refactor into independent units.
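When dependencies are declared up front, sequencing them is mechanical. A sketch of a topological sort over the declared “Blocked by” edges (PBI ids are illustrative):

```typescript
// Order PBIs so every item comes after everything it is blocked by.
// Input maps each PBI id to the list of PBIs it is blocked by.
function sequence(pbis: Record<string, string[]>): string[] {
  const order: string[] = [];
  const pending = new Map(
    Object.entries(pbis).map(
      ([id, deps]) => [id, new Set(deps)] as [string, Set<string>],
    ),
  );
  while (pending.size > 0) {
    // Items with no unresolved blockers are ready to schedule.
    const ready = Array.from(pending.keys()).filter(
      (id) => pending.get(id)!.size === 0,
    );
    if (ready.length === 0) throw new Error("Cycle in PBI dependencies");
    for (const id of ready) {
      order.push(id);
      pending.delete(id);
      for (const deps of pending.values()) deps.delete(id);
    }
  }
  return order;
}
```

A cycle here is the machine-readable version of the anti-pattern: it surfaces at planning time instead of at merge time.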

Step 6: Set the Refinement Rule

Define what happens when reality diverges from the spec.

If implementation requires changing the Architecture, you MUST 
update `spec.md` in the same PR with a changelog entry.

Options to specify:

Template

# PBI-XXX: [Brief Imperative Title]

## Directive
[What to build/change in 1-2 sentences]

**Scope:**
- [Explicit file/folder boundaries]
- [What NOT to touch]

## Dependencies
- Blocked by: [PBI-YYY if any, or "None"]
- Must merge before: [PBI-ZZZ if any, or "None"]

## Context
Read: `[path/to/spec.md]`
- [Specific section to reference]

## Verification
- [ ] [Criterion 1: Functional requirement]
- [ ] [Criterion 2: Performance/quality requirement]
- [ ] [Criterion 3: Test coverage requirement]

## Refinement Protocol
[What to do if the spec needs updating during implementation]
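Because the template is structured, a repo-based PBI can be read programmatically, for example to extract its verification criteria. A minimal sketch (the regexes assume the exact headings above):

```typescript
// Pull the checkbox criteria out of a PBI markdown file's Verification section.
function parseVerification(pbi: string): string[] {
  // Everything after the "## Verification" heading...
  const afterHeading = pbi.split(/^## Verification\s*$/m)[1] ?? "";
  // ...up to the next "## " heading.
  const sectionBody = afterHeading.split(/^## /m)[0];
  return Array.from(
    sectionBody.matchAll(/^- \[[ xX]\] (.+)$/gm),
    (m) => m[1].trim(),
  );
}
```

An agent with repo access can use checks like this to report which criteria remain open without any human copy-paste.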

PBI Lifecycle

| Phase | Actor | Action |
|---|---|---|
| Planning | Human | Creates PBI with 4-part structure |
| Assignment | Human/System | PBI assigned to Agent or Developer |
| Execution | Agent | Reads Spec, implements Delta |
| Review | Human | Verifies against Spec’s Contract section |
| Merge | Human/System | Code merged, Spec updated if needed |
| Closure | System | PBI archived, linked to commit/PR |

Common Mistakes

The User Story Hangover

Problem: PBI written as “As a user, I want…” with no explicit scope or verification.

Solution: Rewrite in imperative form with scope boundaries and checkable criteria.

The Spec Copy

Problem: PBI contains copied design decisions that drift from the spec.

Solution: Use pointers to spec sections, never copy content that lives elsewhere.

The Hidden Dependency

Problem: Two PBIs touch the same files; discovered at merge time.

Solution: During planning, map file ownership. If overlap exists, sequence the PBIs or refactor scope.

The Untestable Increment

Problem: PBI verification requires another PBI to complete first.

Solution: Ensure each PBI is self-testable. If not possible, merge into a single PBI or create test fixtures.

This practice implements:

See also:

Adversarial Requirement Review

A verification practice where a Critic Agent challenges the problem statement and assumptions before any specification or code is written.

Status: Experimental | Last Updated: 2026-02-12

Definition

Adversarial Requirement Review is a verification practice where a Thought Partner agent (acting as an adversarial critic) challenges the problem statement, underlying assumptions, and strategy before any specification is written or implementation begins.

This shifts the “adversarial” concept left—from reviewing code (Adversarial Code Review) to reviewing the intent itself.

When to Use

Use this practice when:

Skip this practice when:

The Problem: The Backwards Approach

In traditional development, and accelerated by AI, we often start with “How do we build X?” rather than “Is X the right problem to solve?”.

The Backwards Problem:

  1. Stakeholder has an idea (“We need a weekly email report”).
  2. Engineer/AI jumps to implementation (“I’ll set up a cron job…”).
  3. Feature ships quickly.
  4. Feature fails because the underlying problem was misunderstood (e.g., users needed real-time data, not weekly snapshots).

AI exacerbates this by making implementation so cheap that we skip validation. We build the wrong thing faster than ever.

The Solution: Thought Partner vs. Leader

To break this cycle, we separate the roles:

The goal is not for the AI to solve the problem, but to sharpen the problem definition.

The Workflow

This pattern consists of three distinct phases of challenge.

1. The Problem Sharpener

Goal: Clarify the problem statement and remove implied solutions.

Prompt Pattern:

“I’m going to describe a problem I’m trying to solve. I want you to act as a Thought Partner - not to solve it, but to help me understand it better.

After I describe the problem, interview me one question at a time to:

  • Clarify who exactly is affected and when
  • Surface barriers I might be glossing over
  • Identify assumptions I’m making without realizing it
  • Challenge whether I’ve framed the problem correctly

Don’t suggest solutions. Help me see the problem more clearly.

Here’s the problem: [describe your problem]”

2. The Assumption Surfacer

Goal: Identify risky beliefs that must be true for the strategy to succeed.

Prompt Pattern:

“I’m considering this product strategy: [describe what you’re building and why].

What assumptions am I making that must be true for this to work?

Focus on:

  • Behavior: Will people actually change their behavior to use it? (Desire != Action)
  • Value: Is it worth building? Does the value justify the cost?
  • Alternatives: What am I deprioritizing, and what is the cost of leaving that unsolved?

List 5-7 assumptions, starting with the ones most likely to be wrong.”

3. The Pre-Build Stress Test

Goal: Final pressure test before committing to a Spec or PBI.

Prompt Pattern:

“Before I commit to building this, I want to pressure-test the idea.

Context: [describe what you’re planning to build and the problem it solves]

Act as a skeptical but constructive advisor. Interview me one question at a time to find weaknesses in my thinking. Push back where my reasoning seems thin. Help me discover what I don’t know before I invest in building.”

Integration with ASDLC

This practice operates at the beginning of the second diamond (The Solution Space), acting as the bridge between “Insight” and “Specification”.

It ensures that we don’t proceed to Spec-Driven Development with a flawed premise.

Output: The output of this review is a validated Problem Statement and Strategy, which then becomes the “Context” section of your Spec.

References

  1. Donbavand, Daniel. “Before I Ask AI to Build, I Ask It to Challenge.” Published: 2026-02-12. URL: https://danieldonbavand.com/2026/02/12/before-i-ask-ai-to-build-i-ask-it-to-challenge/. Source of the “Problem Sharpener,” “Assumption Surfacer,” and “Pre-Build Stress Test” prompts.

Constitutional Review Implementation

Step-by-step guide for implementing Constitutional Review to validate code against both Spec and Constitution contracts.

Status: Experimental | Last Updated: 2026-01-08

Definition

Constitutional Review Implementation is the operational practice of configuring and executing Constitutional Review to validate code against both functional requirements (the Spec) and architectural values (the Constitution).

This practice extends Adversarial Code Review by adding constitutional constraints to the Critic Agent’s validation criteria.

When to Use

Use this practice when:

Skip this practice when:

Prerequisites

Before implementing Constitutional Review, ensure you have:

  1. Agent Constitution documented (typically AGENTS.md)
  2. The Spec for the feature being reviewed
  3. Critic Agent session separate from the Builder Agent (fresh context)
  4. Architectural constraints clearly defined in the Constitution

If architectural constraints aren’t documented, start with AGENTS.md Specification.

Process

Step 1: Document Architectural Constraints in Constitution

Ensure your Agent Constitution includes non-functional constraints that are:

Example Structure:

## Architectural Constraints

### Data Access
- All filtering operations MUST be pushed to the database layer
- Never use `findAll()` or `LoadAll()` followed by in-memory filtering
- Queries must handle 10k+ records without memory issues

### Performance
- API responses < 200ms at p99
- Database queries must use indexes for common filters
- No N+1 query patterns

### Security
- User IDs never logged (use hashed identifiers)
- All inputs validated against Zod schemas before processing
- Authentication tokens expire within 24 hours
- No hardcoded secrets (use environment variables)

### Error Handling
- Never fail silently (all errors logged with context)
- User-facing errors never expose stack traces
- Database errors map to generic "Service unavailable" messages
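The Data Access constraint above can be made concrete with a before/after sketch. The repository interface here is hypothetical; only the shape of the pattern matters:

```typescript
// Illustrative contrast for "push filtering to the database layer".
interface AuditLog { id: number; date: string; }

interface AuditRepo {
  loadAll(): AuditLog[];                 // the forbidden building block
  findWhere(since: string): AuditLog[];  // predicate pushed to the DB
}

// VIOLATION: materializes every row, then filters in memory.
function recentLogsBad(repo: AuditRepo, since: string): AuditLog[] {
  return repo.loadAll().filter((log) => log.date > since);
}

// COMPLIANT: the database layer applies the filter.
function recentLogsGood(repo: AuditRepo, since: string): AuditLog[] {
  return repo.findWhere(since);
}
```

Both functions return the same rows on a small dataset, which is exactly why the violation passes functional review and needs a constitutional check to catch.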

Step 2: Configure Critic Agent Prompt

Extend the standard Adversarial Code Review prompt to include constitutional validation.

System Prompt Template:

You are a rigorous Code Reviewer validating implementation against TWO sources of truth:

1. The Spec (/plans/{feature-name}/spec.md)
   - Functional requirements (what should it do?)
   - API contracts (what are the inputs/outputs?)
   - Data schemas (what is the structure?)

2. The Constitution (AGENTS.md)
   - Architectural patterns (e.g., "push filtering to DB")
   - Performance constraints (e.g., "queries handle 10k+ records")
   - Security rules (e.g., "never log user IDs")
   - Error handling policies (e.g., "never fail silently")

YOUR JOB:
Identify where code satisfies the Spec (functional) but violates the Constitution (architectural).

COMMON CONSTITUTIONAL VIOLATIONS TO CHECK:
- LoadAll().Filter() pattern (data access violation)
- Hardcoded secrets (security violation)
- Missing error logging (error handling violation)
- N+1 query patterns (performance violation)
- User IDs in logs (security violation)

OUTPUT FORMAT:
For each violation:
1. Type: Constitutional Violation - [Category]
2. Location: File path and line number
3. Issue: What constitutional principle is violated
4. Impact: Why this matters at scale (performance, security, maintainability)
5. Remediation Path: Ordered steps to fix (prefer standard patterns, escalate if needed)
6. Test Requirements: What tests would prevent regression

If no violations found, output: PASS - Constitutional Review

Step 3: Execute Constitutional Review Workflow

Follow this sequence to ensure proper validation:

┌─────────────┐
│   Builder   │ → Implements Spec
└──────┬──────┘
       ↓
┌─────────────────┐
│  Quality Gates  │ → Tests, types, linting (deterministic)
└──────┬──────────┘
       ↓ (pass)
┌──────────────────┐
│ Spec Compliance  │ → Does it meet functional requirements?
│     Review       │    (Adversarial Code Review)
└──────┬───────────┘
       ↓ (pass)
┌──────────────────┐
│ Constitutional   │ → Does it follow architectural principles?
│     Review       │    (This practice)
└──────┬───────────┘
       ↓ (pass)
┌─────────────────┐
│ Acceptance Gate │ → Human strategic review (is it the right thing?)
└─────────────────┘

Execution Steps:

  1. Builder completes implementation — Code written, tests pass
  2. Quality Gates pass — Compilation, linting, unit tests all green
  3. Spec Compliance Review — Critic validates functional requirements met
  4. ⭐ Constitutional Review — Critic validates architectural principles followed:
    • Open new Critic Agent session (fresh context, no Builder bias)
    • Provide Constitution (AGENTS.md)
    • Provide Spec (feature spec file)
    • Provide Code Diff (changed files only)
    • Use Constitutional Review prompt (from Step 2)
    • Critic outputs violations or PASS
  5. If violations found → Return to Builder with remediation path
  6. If PASS → Proceed to Acceptance Gate (human review)
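Under manual orchestration, the review loop in steps 4–6 can be sketched in code. This is an illustrative shape, not a prescribed tool: `critic` and `remediate` are hypothetical functions wrapping your LLM sessions (a fresh Critic session per cycle, and the Builder session for fixes).

```typescript
// Sketch of the Constitutional Review loop (steps 4–6).
type ReviewResult =
  | { verdict: 'PASS' }
  | { verdict: 'VIOLATIONS'; report: string };

type Critic = (input: {
  constitution: string; // AGENTS.md contents
  spec: string;         // feature spec contents
  diff: string;         // changed files only
}) => Promise<ReviewResult>;

async function constitutionalReview(
  critic: Critic,
  input: { constitution: string; spec: string; diff: string },
  remediate: (report: string) => Promise<string>, // Builder returns a new diff
  maxCycles = 3,
): Promise<ReviewResult> {
  let diff = input.diff;
  for (let cycle = 0; cycle < maxCycles; cycle++) {
    // Fresh Critic invocation each cycle: no Builder context carried over.
    const result = await critic({ ...input, diff });
    if (result.verdict === 'PASS') return result;
    // Return the full violation report to the Builder for remediation.
    diff = await remediate(result.report);
  }
  return { verdict: 'VIOLATIONS', report: 'Max review cycles exceeded' };
}
```

In practice the loop converges in 1–2 cycles, matching the iteration guidance below.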

Step 4: Process Violation Reports

When the Critic identifies constitutional violations, the output will follow this format:

VIOLATION: Constitutional - Data Access Pattern

Location: src/audit/AuditService.cs Line 23

Issue: Loads all records into memory before filtering
Constitution Violation: "All filtering operations MUST be pushed to database layer"

Impact: 
- Works fine with small datasets (< 1k records)
- Breaks at scale (10k+ records cause memory issues)
- Creates N+1 query patterns in related queries
- Violates performance SLA (API responses > 200ms)

Remediation Path:
1. Push filter to database query:
   repository.FindWhere(x => x.Date > startDate)
2. If ORM doesn't support this pattern, use raw SQL:
   SELECT * FROM audit_logs WHERE date > @startDate
3. Add performance test with 10k+ mock records to prevent regression
4. Document the constraint in repository interface comments

Test Requirements:
- Add test: "GetLogs with 10k records completes in < 200ms"
- Add test: "GetLogs does not load entire table into memory"
  (mock repository, verify FindWhere called, not LoadAll)

Processing steps:

  1. Return to Builder Agent with full violation report
  2. Builder implements remediation following the ordered path
  3. Re-run Constitutional Review after fixes
  4. Iterate until PASS (typically 1-2 cycles)

Step 5: Update Constitution Based on Violations

If the Critic struggles to validate or produces unclear violations:

Indicators that the Constitution needs updating:

Action: Refine constitutional constraints:

## Before (vague)
### Performance
- Code should be fast

## After (specific)
### Performance
- API responses < 200ms at p99
- Database queries must use indexes for common filters
- Push filtering to database layer (never LoadAll().Filter())

Examples

Example 1: Data Access Violation

Code Submitted:

async function getActiveUsers() {
  const users = await db.users.findAll();
  return users.filter(u => u.status === 'active');
}

Constitutional Review Output:

VIOLATION: Constitutional - Data Access Pattern

Location: src/users/service.ts Line 12

Issue: Loads all users then filters in-memory
Constitution Violation: "Push filtering to database layer"

Impact: Works with 100 users, breaks at 10k+

Remediation:
1. db.users.findWhere({ status: 'active' })
2. Add test with 10k users to verify performance

Test Requirements:
- Performance test: 10k users returns in < 200ms
- Verify db.users.findWhere called (not findAll)
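The second test requirement can be sketched with a stubbed `db` that records which query method was called. The `findWhere`/`findAll` interface is assumed from the example above:

```typescript
type User = { id: string; status: string };

// Stub db that records which query method the code under test used.
function makeStubDb(users: User[]) {
  const calls: string[] = [];
  return {
    calls,
    users: {
      async findAll() { calls.push('findAll'); return users; },
      async findWhere(where: Partial<User>) {
        calls.push('findWhere');
        return users.filter(u => u.status === where.status);
      },
    },
  };
}

// Remediated implementation: filtering is pushed to the database layer.
async function getActiveUsers(db: ReturnType<typeof makeStubDb>) {
  return db.users.findWhere({ status: 'active' });
}
```

A test then asserts that `db.calls` contains `findWhere` and never `findAll`, which guards against regression to in-memory filtering.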

Example 2: Security Violation

Code Submitted:

logger.info(`User ${userId} logged in from ${ipAddress}`);

Constitutional Review Output:

VIOLATION: Constitutional - Security

Location: src/auth/logger.ts Line 45

Issue: Logs user ID directly
Constitution Violation: "Never log user IDs (use hashed identifiers)"

Impact: GDPR compliance risk, audit log exposure

Remediation:
1. Hash user ID: logger.info(`User ${hashUserId(userId)} logged in...`)
2. Implement hashUserId utility (SHA-256 with salt)
3. Update all logging to use hashed IDs

Test Requirements:
- Verify logs do not contain raw user IDs
- Verify hashed IDs are consistent (same user = same hash)
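A minimal sketch of the `hashUserId` remediation, assuming the salt comes from the environment (the variable name `LOG_SALT` is illustrative):

```typescript
import { createHash } from 'node:crypto';

// SHA-256 of salt + ID yields a stable pseudonymous identifier:
// the same user always hashes to the same value, but the raw ID
// never appears in logs.
const LOG_SALT = process.env.LOG_SALT ?? 'dev-only-salt';

function hashUserId(userId: string): string {
  return createHash('sha256')
    .update(LOG_SALT)
    .update(userId)
    .digest('hex')
    .slice(0, 16); // shortened for log readability
}
```

Usage: `logger.info(\`User ${hashUserId(userId)} logged in from ${ipAddress}\`)`. Note the salt must be stable across deployments, or the "same user = same hash" guarantee breaks.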

Implementation Constraints

Requires Clear Constitutional Principles — Vague constraints produce vague critiques. “Be performant” is not actionable. “API responses < 200ms at p99” is.

Not Fully Automated (Yet) — As of January 2026, requires manual orchestration. You must manually:

Model Capability Variance — Not all reasoning models perform equally at constitutional review. Recommended:

False Positives Possible — Architectural rules have exceptions. The Critic may flag valid code that violates general principles for good reasons. Human review in Acceptance Gate remains essential.

Context Window Limits — Large diffs may exceed context windows. Solutions:

Troubleshooting

Issue: Critic approves code that violates Constitution

Cause: Constitutional constraints not specific enough in AGENTS.md

Solution:

  1. Review violation that slipped through
  2. Add specific constraint to Constitution:
    ### Data Access
    - ❌ Before: "Queries should be efficient"
    - ✅ After: "Never use LoadAll().Filter() - push filtering to database"
    
  3. Re-run Constitutional Review with updated Constitution

Issue: Critic flags valid code as violation

Cause: Constitutional rule is too strict or lacks exceptions

Solution:

  1. Document exception in Constitution:
    ### Data Access
    - Push filtering to database layer
    - Exception: In-memory filtering allowed for cached reference data (< 100 records)
    
  2. Update Critic prompt to recognize exceptions
  3. Proceed to Acceptance Gate (human validates exception is legitimate)

Issue: Constitutional Review takes too long

Cause: Large code diffs or complex Constitution

Solution:

  1. Break up PRs — Smaller, focused changes
  2. Parallelize reviews — Review multiple files concurrently
  3. Use Summary Gates — Compress Spec to relevant sections only
  4. Cache Constitution — Reuse constitutional context across reviews

Future Automation Potential

This practice is currently manual but has clear automation paths:

CI/CD Integration — Automated constitutional review on PR creation:

# .github/workflows/constitutional-review.yml
on: pull_request
jobs:
  constitutional-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Constitutional Review
        run: |
          constitutional-review-agent \
            --constitution AGENTS.md \
            --spec plans/${FEATURE}/spec.md \
            --diff ${{ github.event.pull_request.diff_url }}

IDE Integration — Real-time constitutional feedback:

Living Constitution — Automatic updates:

Violation Analytics — Dashboard tracking:

See also:

External Validation

Context Mapping

The practice of creating high-density Context Maps to guide agents through codebases and documentation.

Status: Experimental | Last Updated: 2026-02-16

Definition

Context Mapping is the tactical practice of generating and maintaining the Context Map within your project’s AGENTS.md. It involves auditing your knowledge assets and creating a compressed index that allows agents to navigate them autonomously.

When to Use

The Strategy: “Hybrid Mapping”

We use a Hybrid Strategy, selecting the format based on the scale of each knowledge type:

| Knowledge Type | Scale | Preferred Format | Why? |
| --- | --- | --- | --- |
| Project Structure | < 1000 files | Annotated YAML | Readable, structural, native to LLMs. |
| Internal Docs | < 100 files | Annotated YAML | Explains intent alongside path. |
| External Frameworks | > 10MB text | Compressed Pipe | Maximum token density (80% compression). |

Rule of Thumb

“If you can read it, the Agent can read it.” Default to YAML. Only use Compressed Pipe syntax when you hit token limits with massive external indexes.

Implementation: The YAML Standard

For 99% of use cases, use an Annotated YAML map in your AGENTS.md.

1. Mapping Code (Topology)

Do not list every file. List Responsibilities.

project_structure:
  src:
    features:
      checkout: "Payment flow and cart logic. STRICT: No direct DB access."
      inventory: "Stock checking and reservation logic."
    shared:
      components: "Re-usable UI atoms. MUST use Tailwind."
      lib: "Stateless utilities."

2. Mapping Internal Docs (Knowledge)

Map your docs/ folder to explain what questions each document answers.

documentation_index:
  docs:
    arch:
      infra.md: "READ THIS for Terraform state policies and AWS setup."
      decisions.md: "ADR log. Explains why we chose gRPC over REST."
    api:
      contracts.md: "The single source of truth for API schemas."

Implementation: The Compressed Standard (Advanced)

Use this only for massive indices, such as dumping the entire Next.js or Supabase documentation structure into context.

Format: [Name]|root:path|Instruction|path:{file1,file2}

[Next.js Docs]|root: ./.next-docs|IMPORTANT: Prefer retrieval-led reasoning.
|01-app/01-getting-started:{01-installation.mdx,02-project-structure.mdx}
|01-app/02-building-your-application:{01-routing.mdx,02-rendering.mdx}

Process

  1. Audit: Identify the “Hidden Knowledge” (Architecture docs, specific files) agents miss.
  2. Select Format: Default to YAML. Switch to Pipe only if > 2000 tokens.
  3. Embed: Place the map in Section 5 of your AGENTS.md.
  4. Verify: Ask the agent a question that requires the map (e.g., “Where is the Terraform state policy?”). If it reads the map and then finds the file, you succeeded.

Context Offloading

The practice of moving agent trajectories, tool results, and historical state from the context window to the filesystem to prevent cognitive overload and context rot.

Status: Experimental | Last Updated: 2026-02-21

Definition

Context Offloading is the operational process of moving agent trajectories, intermediate tool results, and session history from the active LLM context window to persistent storage (usually the filesystem).

Instead of relying on an ever-growing linear chat history that inevitably degrades model reasoning (Context Rot), this practice treats the active context as a limited working memory, using the filesystem as long-term storage that the agent can retrieve from only when necessary.

When to Use

Use this practice when:

Skip this practice when:

Process

Step 1: Establish the State Directory

Create a dedicated location for offloaded context, isolated from the project source code.

mkdir -p .agents/state/trajectories

Step 2: Implement Truncation Thresholds

Configure the agent harness to monitor token usage. When the context window approaches a safety threshold (e.g., 80% capacity), trigger a truncation event.
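The threshold check itself is simple. A sketch, assuming a rough 4-characters-per-token estimate (good enough for a safety margin; use your model's real tokenizer in production), with illustrative limit values:

```typescript
const CONTEXT_LIMIT = 200_000; // model context window, in tokens (assumed)
const SAFETY_RATIO = 0.8;      // trigger a truncation event at 80% capacity

// Crude token estimate: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function shouldTruncate(contextText: string): boolean {
  return estimateTokens(contextText) >= CONTEXT_LIMIT * SAFETY_RATIO;
}
```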

Step 3: Offload and Summarize

When a truncation event is triggered:

  1. Write the raw intermediate results (e.g., the full compiler stdout) to a file in the state directory.
  2. Replace the massive raw result in the active context window with a pointer and a highly compressed summary.

Example Context Replacement:

[Tool Execution: `pnpm check`]
*Result offloaded to: `.agents/state/logs/typecheck-142.log`*
Summary: 14 type errors found. Primary cluster in `src/components/Form.tsx` related to missing `Zod` inference. 
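The offload-and-replace step can be sketched as a single function. The `summarize` callback is a placeholder: in practice it is a cheap model call or a deterministic parser (e.g., "count the errors"), and the state-directory path follows Step 1.

```typescript
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

// Persist the raw tool output, return the compressed stand-in that
// replaces it in the active context window.
function offloadToolResult(
  stateDir: string,
  toolName: string,
  raw: string,
  summarize: (raw: string) => string,
): string {
  mkdirSync(stateDir, { recursive: true });
  const file = join(stateDir, `${toolName}-${Date.now()}.log`);
  writeFileSync(file, raw); // raw data stays the source of truth
  // Only the pointer + summary re-enter the context window.
  return [
    `[Tool Execution: \`${toolName}\`]`,
    `*Result offloaded to: \`${file}\`*`,
    `Summary: ${summarize(raw)}`,
  ].join('\n');
}
```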

Step 4: Provide Retrieval Tools

The agent must be able to read the offloaded context back into working memory if needed. Provide explicit filesystem access tools (e.g., view_file) or implement progressive disclosure mechanics that allow the agent to fetch the raw logs only when actively debugging them.

Common Mistakes

Over-Summarization (Lossy Compression)

Problem: Summarizing the offloaded context too aggressively, losing critical edge cases that the agent needs later.

Solution: Always preserve the raw data in the file system before summarizing. The summary is an index; the file is the source of truth.

Offloading to Conversation History

Problem: Attempting to offload by passing the context to a “memory API” or relying entirely on an extended context window (e.g., 2M tokens) without structural filtering.

Solution: The filesystem is the only deterministically queryable, grep-able, and auditable storage layer for code agents. Long context windows do not negate the need for structured state; they merely delay the onset of context rot. Build file-centric state management mechanisms instead of assuming the model will remember everything accurately.

This practice implements:

See also:

Living Specs

Practical guide to creating and maintaining specs that evolve alongside your codebase.

Status: Experimental | Last Updated: 2025-12-22

Overview

This guide provides practical instructions for implementing the Specs pattern as a spec-anchored methodology.

In the spectrum of Spec-Driven Development, ASDLC explicitly targets the spec-anchored maturity level—where specs remain the architectural source of truth for a feature’s lifecycle, but determinism is preserved by retaining the code. While the pattern describes what specs are and why they matter, this guide focuses on how to create and maintain them.

When to Create a Spec

Create a spec when starting work that involves:

Feature Domains — New functionality that introduces architectural patterns, API contracts, or data models that other parts of the system depend on.

User-Facing Workflows — Features with defined user journeys and acceptance criteria that need preservation for future reference.

Cross-Team Dependencies — Any feature that other teams will integrate with, requiring clear contract definitions.

Don’t create specs for: Simple bug fixes, trivial UI changes, configuration updates, or dependency bumps.

Spec granularity

A spec should be detailed enough to serve as a contract for the feature, but not so detailed that it becomes a maintenance burden.

Some spec elements, such as Gherkin scenarios, are unnecessary when the feature is simple or well-understood.

When to Update a Spec

Update an existing spec when:

Golden Rule: If code behavior changes, the spec MUST be updated in the same commit.

File Structure

Organize specs by feature domain, not by sprint or ticket number.

/project-root
├── ARCHITECTURE.md           # Global system rules
├── plans/                    # Feature-level specs
│   ├── user-authentication/
│   │   └── spec.md
│   ├── payment-processing/
│   │   └── spec.md
│   └── notifications/
│       └── spec.md
└── src/                      # Implementation code

Conventions:

Context Separation: Maintain a strict separation between global constitutional context and feature-level functional specifications:

Maintenance Protocol

Same-Commit Rule

If code changes behavior, update the spec in the same commit. Add “Spec updated” to your PR checklist.

git commit -m "feat(notifications): add SMS fallback

- Implements SMS delivery when WebSocket fails
- Updates /plans/notifications/spec.md with new transport layer"

Deprecation Over Deletion

Mark outdated sections as deprecated rather than removing them. This preserves historical context.

### Architecture

**[DEPRECATED 2024-12-01]**
~~WebSocket transport via Socket.io library~~
Replaced by native WebSocket API to reduce bundle size.

**Current:**
Native WebSocket connection via `/api/ws/notifications`

Bidirectional Linking

Link code to specs and specs to code:

In the code, comment back to the spec:

// Notification delivery must meet 100ms latency requirement
// See: /plans/notifications/spec.md#contract

In the spec, reference the canonical implementation:

### Data Schema
Implemented in `src/types/Notification.ts` using Zod validation.

Template

# Feature: [Feature Name]

## Blueprint

### Context
[Why does this feature exist? What business problem does it solve?]

### Architecture
- **API Contracts:** `[METHOD] /api/v1/[endpoint]` - [Description]
- **Data Models:** See `[file path]`
- **Dependencies:** [What this depends on / what depends on this]

### Anti-Patterns
- [What agents must avoid, with rationale]

## Contract

### Definition of Done
- [ ] [Observable success criterion]

### Regression Guardrails
- [Critical invariant that must never break]

### Scenarios
**Scenario: [Name]**
- Given: [Precondition]
- When: [Action]
- Then: [Expected outcome]

Anti-Patterns

The Stale Spec

Problem: Spec created during planning, never updated as the feature evolves.

Solution: Make spec updates mandatory in Definition of Done. Add PR checklist item.

The Spec in Slack

Problem: Design decisions discussed in chat but never committed to the repo.

Solution: After consensus, immediately update spec.md with a commit linking to the discussion.

The Monolithic Spec

Problem: A single 5000-line spec tries to document the entire application.

Solution: Split into feature-domain specs. Use ARCHITECTURE.md only for global cross-cutting concerns.

The Spec-as-Tutorial

Problem: Spec reads like a beginner’s guide, full of basic programming concepts.

Solution: Assume engineering competence. Document constraints and decisions, not general knowledge.

The Copy-Paste Code

Problem: Spec duplicates large chunks of implementation code.

Solution: Reference canonical sources with file paths. Only include minimal examples to illustrate patterns.

See also:

Workflow as Code

Define agentic workflows in deterministic code rather than prompts to ensure reliability, type safety, and testable orchestration.

Status: Experimental | Last Updated: 2026-02-18

Definition

Workflow as Code is the practice of defining agentic workflows using deterministic programming languages (like TypeScript or Python) rather than natural language prompts.

It treats the Agent as a function call within a larger, strongly-typed system, rather than treating the System as a tool available to a chatty agent.

When to Use

Use this practice when:

Skip this practice when:

Why It Matters

When complex workflows are driven entirely by an LLM loop (“Here is a goal, figure it out”), the system suffers from Context Pollution. As the agent accumulates history—observations, tool outputs, internal monologue—its attention degrades.

Nick Tune describes this as the agent becoming “tipsy wobbling from side-to-side”: it loses focus on strict process adherence because its context window is overflowing with implementation details.

Process

Step 1: Identify Deterministic vs Probabilistic Tasks

Audit your workflow. Separate mechanical tasks (running builds, conditional logic, file operations) from intelligence tasks (code review, summarization, decision-making under ambiguity).

Deterministic (Code):

Probabilistic (Agent):

Step 2: Define Typed Step Abstraction

Create a common interface for workflow steps:

export type WorkflowContext = {
  workDir: string;
  spec: string;
  history: StepResult[];
};

export type StepResult =
  | { type: 'success'; data: unknown }
  | { type: 'failure'; reason: string; recoverable: boolean };

export type Step = (ctx: WorkflowContext) => Promise<StepResult>;

This enables:

Step 3: Implement the Orchestration Shell

Write the control flow in code. The LLM only appears where intelligence is required:

async function runDevWorkflow(ctx: WorkflowContext) {
  // Deterministic: Run build
  const buildResult = await runBuild(ctx);
  if (buildResult.type === 'failure') {
    return handleBuildError(buildResult);
  }

  // Probabilistic: Agent reviews the diff
  const reviewResult = await runAgentReview({
    diff: await git.getDiff(),
    spec: ctx.spec
  });

  // Deterministic: Act on structured result
  if (reviewResult.verdict === 'PASS') {
    await git.commit();
    await github.createPR();
  }
}

Step 4: Implement Opaque Commands

From the agent’s perspective, workflow steps should be “Black Boxes.” The agent invokes a high-level command and acts on the structured result—it doesn’t need to know implementation details.

Define the interface:

type VerifyWorkResult = {
  status: 'passed' | 'failed';
  errors?: { file: string; line: number; message: string }[];
};

async function verifyWork(ctx: WorkflowContext): Promise<VerifyWorkResult> {
  // Implementation hidden from agent
  const lint = await runLint(ctx.workDir);
  const types = await runTypeCheck(ctx.workDir);
  const tests = await runTests(ctx.workDir);
  
  return aggregateResults([lint, types, tests]);
}

This reduces token usage and prevents the agent from hallucinating incorrect shell commands.

Step 5: Add Enforcement Hooks

Agents will sometimes try to bypass the workflow. Implement hard boundaries using client-side hooks:

# .claude/hooks/pre-tool-use.sh
if [[ "$TOOL" == "Bash" && "$COMMAND" =~ "git push" ]]; then
  echo "Blocked: Use 'submit-pr' tool which runs verification first."
  exit 1
fi

This shifts enforcement from “Instructions in the System Prompt” (which can be ignored) to “Code in the Environment” (which cannot).

Template

Minimal workflow orchestrator structure:

// workflows/dev-workflow.ts
import type { Step, WorkflowContext, StepResult } from './types';

const steps: Step[] = [
  runBuild,
  runLint,
  runAgentReview,  // Only probabilistic step
  commitChanges,
  createPR,
];

export async function execute(ctx: WorkflowContext): Promise<StepResult> {
  for (const step of steps) {
    const result = await step(ctx);
    if (result.type === 'failure' && !result.recoverable) {
      return result;
    }
    ctx.history.push(result);
  }
  return { type: 'success', data: ctx.history };
}

Workflows as Persona Carriers

Persona Injection via Workflow

Workflows are the natural home for session-scoped persona injection. Rather than loading all persona definitions into AGENTS.md on every session, define the persona as part of the workflow context — it gets injected precisely when needed and is absent when it isn’t.

A code review workflow injects the Critic persona. An implementation workflow injects the Dev persona. A spec workflow injects the Lead persona. This is more precise than always-on loading, and avoids the cost of agents following instructions that are irrelevant to the current task.

Example: Review workflow with Critic persona

# .claude/workflows/review.yaml
name: Constitutional Review
trigger: "@review"
context:
  - .claude/skills/critic.md      # Critic persona — injected here, not in agents.md
  - docs/backlog/current-pbi.md   # The spec being reviewed
  - AGENTS.md                     # Project-level judgment boundaries
steps:
  - validate_against_spec
  - constitutional_review
  - produce_report

Example: Implementation workflow with Dev persona

# .claude/workflows/implement.yaml
name: Implementation
trigger: "@implement"
context:
  - .claude/skills/dev.md         # Dev persona — only loaded for implementation tasks
  - docs/backlog/current-pbi.md   # The PBI being implemented
  - AGENTS.md                     # Project-level judgment boundaries
steps:
  - review_pbi
  - plan
  - implement
  - run_tests
  - update_pbi_status

The key property: AGENTS.md contains only project-level judgment. The persona is carried by the workflow and injected at invocation. This keeps AGENTS.md stable and minimal, while delivering the right behavioral context for each task type.

Common Mistakes

The God Prompt

Problem: Entire workflow described in a single system prompt, expecting the agent to “figure it out.”

Solution: Extract deterministic logic into code. The agent should only handle tasks requiring intelligence.

Leaky Abstractions

Problem: Agent sees raw command output (500 lines of test failures) instead of structured results.

Solution: Parse outputs into typed results before passing to the agent. Summarize, don’t dump.

Missing Enforcement

Problem: Workflow relies on the agent “following instructions” without hard boundaries.

Solution: Implement hooks that block unauthorized actions. Trust code, not compliance.

Over-Agentification

Problem: Using an LLM to run npm install or parse JSON—tasks with zero ambiguity.

Solution: Reserve agent calls for genuinely probabilistic tasks. Everything else is code.

Appendix: Concepts

Agentic SDLC

Framework for industrializing software development where agents serve as the logistic layer while humans design, govern, and optimize the flow.

Status: Live | Last Updated: 2026-01-01

Definition

The Agentic Software Development Life Cycle (ASDLC) is a framework for industrializing software engineering. It represents the shift from craft-based development (individual artisans, manual tooling, implicit knowledge) to industrial-scale production (standardized processes, agent orchestration, deterministic protocols).

“Agentic architecture is the conveyor belt for knowledge work.” — Ville Takanen

ASDLC is not about “AI coding assistants” that make developers 10% faster. It’s about building the software factory—systems where agents serve as the architecture of labor while humans design, govern, and optimize the flow.

The Industrial Thesis

Agents do not replace humans; they industrialize execution.

Just as robotic arms automate welding without replacing manufacturing expertise, agents automate high-friction parts of knowledge work (logistics, syntax, verification) while humans focus on intent, architecture, and governance.

In this model:

The Cybernetic Model

ASDLC operates at L3 Conditional Autonomy—a “Fighter Jet” model where the Agent acts as the Pilot executing maneuvers, and the Human acts as the Instructor-in-the-Cockpit.

Key Insight: Compute is cheap, but novelty and correctness are expensive. Agents naturally drift toward the “average” solution (Regression to the Mean). The Instructor’s role is not to write code, but to define failure boundaries (Determinism) and inject strategic intent (Steering) that guides agents out of mediocrity.

The Cybernetic Loop

The lifecycle replaces the linear CI/CD pipeline with a high-frequency feedback loop:

Mission Definition: The Instructor defines the “Objective Packet” (Intent + Constraints). This is the core of Context Engineering.

Generation (The Maneuver): The Agent autonomously maps context—often using the Model Context Protocol (MCP) to fetch live data—and executes the task.

Verification (The Sim): Automated Gates check for technical correctness (deterministic), while the Agent’s Constitution steers semantic intent (probabilistic).

Course Correction (HITL): The Instructor intervenes on technically correct but “generic” solutions to enforce architectural novelty.

Strategic Pillars

Factory Architecture (Orchestration)

Projects structured with agents as connective tissue, moving from monolithic context windows to discrete, specialized stations (Planning, Spec-Definition, Implementation, Review).

Standardized Parts (Determinism)

Schema-First Development where agents fulfill contracts, not guesses. AGENTS.md, specs/, and strict linting serve as the “jigs” and “molds” that constrain agent output.

Quality Control (Governance)

Automated, rigorous inspection through Probabilistic Unit Tests and Human-in-the-Loop (HITL) gates. Trust the process, not just the output.

The Agent Factory (Meta-Optimization)

The underlying machinery that builds the agents themselves. While the Ralph Loop produces code, the Agent Optimization Loop produces better agents by testing them against Scenarios rather than static benchmarks.

ASDLC Usage

Full project vision: /docs/vision.md

Applied in: Specs, AGENTS.md Specification, Context Gates, Model Routing, Agent Optimization Loop

Architecture Decision Record

A lightweight document that captures a significant architectural decision, its context, and consequences at a specific point in time.

Status: Live | Last Updated: 2026-01-28

Definition

An Architecture Decision Record (ADR) is a document that captures a significant architectural decision along with its context, rationale, and consequences. Unlike living documentation that evolves with the codebase, ADRs are immutable snapshots—they record what was decided and why at a specific moment in time.

The format was introduced by Michael Nygard in 2011 as a lightweight alternative to heavyweight architecture documentation. Each ADR addresses exactly one decision, making the record atomic and traceable.

Key Characteristics

Immutability

ADRs are not updated when circumstances change. Instead, a new ADR is created that supersedes the original. This preserves the archaeological record of how architectural thinking evolved.

Lightweight

A single ADR fits on one or two pages. The format resists the temptation to over-document, focusing only on the decision and its immediate context.

Decision-Focused

ADRs capture decisions, not designs or implementations. The question answered is “What did we decide?” not “How does it work?” (that belongs in specs) or “How do I build it?” (that belongs in implementation guides).

Contextual

Every ADR includes the forces and constraints that shaped the decision. This context is critical—a decision that seems wrong in isolation often makes sense when the original constraints are understood.

Standard Sections

The canonical ADR format includes:

| Section | Purpose |
| --- | --- |
| Title | Short name with ID (e.g., “ADR-001: Use PostgreSQL for Primary Database”) |
| Status | Lifecycle state: Proposed, Accepted, Deprecated, Superseded by ADR-XXX |
| Context | What forces are at play? What problem needs solving? |
| Decision | What was decided? |
| Consequences | Positive, negative, and neutral outcomes of this decision |
| Alternatives Considered | What other options were evaluated and why they were rejected? |
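A minimal ADR following these sections might look like this (content is illustrative, reusing the example title above):

```markdown
# ADR-001: Use PostgreSQL for Primary Database

**Status:** Accepted

## Context
We need a primary datastore supporting relational queries, JSON
documents, and strong transactional guarantees. The team has
operational experience with PostgreSQL but not with document stores.

## Decision
Use PostgreSQL as the primary database, with JSONB columns for
semi-structured data.

## Consequences
- (+) Single datastore to operate; JSONB covers document needs.
- (−) Horizontal scaling will require explicit sharding later.

## Alternatives Considered
- Document store: rejected — weaker transactional guarantees for our workload.
```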

ASDLC Usage

In ASDLC, ADRs serve as high-value context for agents. When an agent works on authentication, knowing “ADR-003: Chose Supabase Auth over Firebase Auth” provides essential architectural constraints.

ADRs may also evolve into Agent Constitution rules—an ADR stating “All database migrations must be backward-compatible” becomes a constitutional constraint that agents must not violate.

Applied in:

See also:

Behavior-Driven Development

A collaborative specification methodology that defines system behavior in natural language scenarios, bridging business intent and machine-verifiable acceptance criteria.

Status: Live | Last Updated: 2026-01-13

Definition

Behavior-Driven Development (BDD) is a collaborative specification methodology that defines system behavior in natural language scenarios. It synthesizes Test-Driven Development (TDD) and Acceptance Test-Driven Development (ATDD), emphasizing the “Five Whys” principle: every user story should trace to a business outcome.

The key evolution from testing to BDD is the shift from “test” to “specification.” Tests verify correctness; specifications define expected behavior. In agentic workflows, this distinction matters because agents need to understand what behavior is expected, not just what code to write.

Key Characteristics

From Tests to Specifications of Behavior

| Aspect | Unit Testing (TDD) | Behavior-Driven Development |
| --- | --- | --- |
| Primary Focus | Correctness of code at unit level | System behavior from user perspective |
| Language | Code-based (Python, Java, etc.) | Natural language (Gherkin) |
| Stakeholders | Developers | Developers, QA, Business Analysts, POs |
| Signal | Pass/Fail on logic | Alignment with business objectives |
| Agent Role | Minimal (code generation) | Central (agent interprets and executes behavior) |

The Three Roles in BDD

BDD emphasizes collaboration between three perspectives:

  1. Business — Defines the “what” and “why” (business value, user outcomes)
  2. Development — Defines the “how” (implementation approach)
  3. Quality — Defines the “proof” (verification criteria)

In agentic development, the AI agent often handles Development while Business and Quality remain human-defined. BDD provides the structured handoff format.
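As an illustrative handoff artifact (a hypothetical checkout feature), a Gherkin scenario encodes all three perspectives in one place:

```gherkin
Feature: Cart checkout
  # Business: the "why" and the user outcome
  In order to reduce abandoned carts
  As a returning customer
  I want to check out with a saved payment method

  # Quality: the proof, which Development (often the agent) implements against
  Scenario: Checkout with saved card
    Given a signed-in customer with a saved payment method
    When they confirm the order
    Then the order is created
    And a confirmation email is queued
```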

BDD in the Probabilistic Era

Traditional BDD was designed for deterministic systems: given specific inputs, expect specific outputs. Agentic systems are probabilistic—LLM outputs vary based on context, temperature, and emergent behavior.

BDD adapts to this by:

ASDLC Usage

BDD’s value in agentic development is semantic anchoring. When an agent is given a Gherkin scenario, it receives a “specification of behavior” that:

This is why BDD scenarios belong in Specs, not just test suites. They’re not just verification artifacts—they’re functional blueprints that guide agent reasoning.

Implementation via the Spec Pattern:

| BDD Component | Spec Implementation |
| --- | --- |
| Feature description | Spec Context section |
| Business rules | Blueprint constraints |
| Acceptance scenarios | Contract section (Gherkin scenarios) |

Applied in:

Context Anchoring

The phenomenon where explicit context biases an LLM toward specific concepts or solutions, even when marked as deprecated or irrelevant to the immediate task.

Status: Live | Last Updated: 2026-02-24

Definition

Context Anchoring is a cognitive bias—well-documented in human psychology and highly prevalent in Large Language Models (LLMs)—where an initial piece of information heavily influences subsequent reasoning and decision-making.

In the domain of AI-assisted software development, Context Anchoring occurs when explicit information provided in a prompt or a context file (like AGENTS.md) biases the model toward specific architectural patterns, libraries, or solutions, often to the detriment of the actual task. This is colloquially referred to as the “Pink Elephant Problem”: telling an LLM not to think about a specific implementation detail ensures that the concept is front-and-center in its attention mechanism.

The Pink Elephant Problem

LLMs are probabilistic next-token prediction engines. They do not possess a human understanding of negation or deprecation; for the model, naming a concept makes it present.

If a project’s AGENTS.md file contains the line: “We use tRPC on the backend (Note: legacy endpoints only, new work uses GraphQL),” the model now has the token tRPC active in its context window for every subsequent prompt.

Because the LLM’s attention mechanism assigns weight to explicitly named entities, the agent is statistically more likely to reach for or reference tRPC, even when instructed to build a new feature. The LLM struggles to distinguish between “this is a historical fact about the codebase” and “this is a relevant instruction for the current task.” You said it, so it is there, competing for attention.

Key Characteristics

The impact of Context Anchoring manifests in three distinct failure modes:

  1. Diluted Attention: Every line of context placed in the prompt competes with the actual objective. Research consistently shows that as context length increases, task performance degrades even when the added information is perfectly relevant.
  2. The “Lost in the Middle” Effect: Crucial instructions can be ignored or hallucinated over if they are surrounded by dense, anchored noise (like a comprehensive explanation of a legacy directory structure).
  3. Hyper-fixation: The agent fixates on a specific tool, file, or pattern mentioned in the context (the “anchor”), attempting to wedge it into solutions where it does not belong.

The Diagnostic Inversion

A common anti-pattern in Agentic SDLC is treating context files as a persistent band-aid for codebase friction. If an agent consistently struggles to import the correct utility function, a developer’s instinct is to add an explicit instruction to the context: “Always import from src/utils/core, not src/utils/legacy.”

While this might solve the immediate problem, it introduces an anchor. The correct mental model is to treat AGENTS.md as a diagnostic tool. Every instruction added to steer the agent away from a mistake is a signal of structural friction in the codebase.

The ideal response is not to expand the context file, but to fix the underlying ambiguity—for example, by actually deleting the legacy utilities, reorganizing the directory structure, or adding a linter rule. Once the structural friction is resolved, the anchoring instruction should be deleted.

ASDLC Usage

In ASDLC, understanding Context Anchoring drives our philosophy of extreme constraint minimalism.

Applied in:

Context Engineering

Context Engineering is the practice of structuring information to optimize LLM comprehension and output quality.

Status: Live | Last Updated: 2026-02-24

Definition

Context Engineering is the systematic approach to designing and structuring the input context provided to Large Language Models (LLMs) to maximize their effectiveness, accuracy, and reliability in generating outputs.

The practice emerged from the recognition that LLMs operate on explicit information only—they cannot intuit missing business logic or infer unstated constraints. Context Engineering addresses this by making tacit knowledge explicit, machine-readable, and version-controlled.

The Requirements Gap

“Prompt Engineering” is often a misnomer. It is simply Requirements Engineering adapted for a probabilistic system. Unlike a human developer who asks clarifying questions when requirements are vague (“What happens if the payment fails?”), an LLM generates the statistically most likely continuation based on its training data. It does not “understand” the business domain; it predicts patterns. When explicit logic is missing, the model defaults to the average case found in its training set, leading to code that is syntactically correct but semantically misaligned with specific project needs.

The Cold Start Problem

Martin Fowler observes: “As I listen to people who are serious with AI-assisted programming, the crucial thing I hear is managing context.”

Anthropic’s research confirms this. Engineers cite the cold start problem as the biggest blocker:

“There is a lot of intrinsic information that I just have about how my team’s code base works that Claude will not have by default… I could spend time trying to iterate on the perfect prompt [but] I’m just going to go and do it myself.”

Context Engineering solves cold start by capturing this intrinsic information in files the agent can read.

Key Characteristics

  1. Version Controlled: Context exists as a software asset that lives in the repo, is diffed in PRs, and is subject to peer review.
  2. Standardized: Formatted to be readable by any agent (Cursor, Windsurf, Devin, GitHub Copilot).
  3. Iterative: Continuously refined based on agent failure modes and tacit information discovered by Human-in-the-loop (HITL) workflows.
  4. Schema-First: Data structures defined before requesting content generation to ensure type safety and validation.
  5. Hierarchical: Information organized by importance—critical instructions first, references second, examples last.
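As a sketch of the Schema-First characteristic, a hypothetical notification schema might be defined in YAML before any content generation is requested. The field names and constraints here are illustrative, not from a real spec:

```yaml
# Defined BEFORE asking an agent to generate notification content,
# so output can be validated deterministically. Fields are illustrative.
notification:
  type: object
  required: [id, channel, body]
  properties:
    id:
      type: string              # assigned by the system, never by the agent
    channel:
      type: string
      enum: [email, sms, push]  # closed set: the agent cannot invent channels
    body:
      type: string
      maxLength: 500
```

Because the structure is fixed up front, agent output can be checked mechanically rather than reviewed by eye.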

Applications

While ASDLC focuses on software development, Context Engineering is domain-agnostic:

Screaming Architecture

Context Engineering extends to the filesystem itself. As Raf Lefever notes, “If your code-base doesn’t scream its domain, AI will whisper nonsense.”

A well-structured filesystem (e.g., src/features/checkout/core-logic) provides implicit context to the LLM about intent and boundaries. A generic filesystem (src/utils, src/managers) forces the LLM to guess. In ASDLC, we optimize directory structures to be “training wheels” for the agent.

Toolchain as Context Reduction

Context Engineering is typically framed as a question of what to put in context. Equally important is what to leave out.

Every constraint enforced deterministically by the toolchain is context that does not need to be in the prompt. A well-configured biome.json silently eliminates an entire class of style instructions. A strict tsconfig.json makes type safety rules unnecessary to state. Treat your linter, formatter, and type checker configurations as upstream context engineering — they narrow the solution space before the agent ever sees the prompt.
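For instance, a strict tsconfig.json fragment (settings chosen for illustration, not as a recommended baseline) deterministically enforces rules that would otherwise occupy prompt space:

```json
{
  "compilerOptions": {
    "strict": true,
    "noImplicitAny": true,
    "noUnusedLocals": true,
    "noUncheckedIndexedAccess": true
  }
}
```

Each flag retires a would-be context instruction: `noImplicitAny` replaces "annotate all parameters," `noUnusedLocals` replaces "remove dead variables," and so on.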

This principle has empirical support. Gloaguen et al. (2026) found that agents follow context file instructions faithfully, which means unnecessary instructions impose a real cost: broader exploration, more reasoning tokens, higher inference cost — without improving task outcomes. The implication is that bloated context files are not neutral; they are actively harmful.

Furthermore, agents are highly susceptible to Context Anchoring. Telling an LLM what not to do ensures that the concept is front-and-center in its attention mechanism. If your AGENTS.md says “do not use tRPC”, the agent might still reach for it because the token tRPC is highly active in the context window.

The decision hierarchy for any constraint:

  1. Can a runtime gate enforce it? → Use the gate
  2. Can a toolchain config enforce it? → Use the config
  3. Neither? → It belongs in context
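The hierarchy above can be sketched as a simple triage function. This is a toy model with hypothetical constraint categories, not a real tool:

```python
from dataclasses import dataclass

@dataclass
class Constraint:
    """A rule the system must uphold (fields are illustrative)."""
    text: str
    gate_enforceable: bool       # can a runtime gate (CI check, test) verify it?
    toolchain_enforceable: bool  # can a linter/formatter/type checker verify it?

def place_constraint(c: Constraint) -> str:
    """Route a constraint to the strongest deterministic enforcement layer."""
    if c.gate_enforceable:
        return "runtime-gate"
    if c.toolchain_enforceable:
        return "toolchain-config"
    return "context-file"  # last resort: it costs attention on every prompt

# Example triage of three constraints:
print(place_constraint(Constraint("API responds < 100ms", True, False)))              # runtime-gate
print(place_constraint(Constraint("no unused imports", False, True)))                 # toolchain-config
print(place_constraint(Constraint("prefer GraphQL for new endpoints", False, False))) # context-file
```

Only the last constraint, which no deterministic layer can check, earns a place in the context file.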

Multi-Layer Action Spaces and Economics

The cost and latency of agent orchestration scale directly with context size. As agents take on larger tasks, explicit definition of massive MCP (Model Context Protocol) toolsets bloats the context window.

The Solution: Push actions from the tool-calling layer to the OS layer. By equipping agents with a basic “Virtual Computer” (shell and filesystem access), they can interact with command-line utilities implicitly rather than parsing dozens of explicit JSON schema definitions. This action space offloading dramatically improves the economics of “Prompt Caching,” making high-capacity agent loops viable.

The “Learned Context Management” Fallacy

Some theories suggest that “The Bitter Lesson” applies to context—that as foundational models scale, they will natively learn to manage their own memory streams, rendering explicit file-centric state and Context Gates obsolete.

In ASDLC, we dispute this. Relying on a probabilistic model’s native “attention mechanism” to remember a critical business constraint from 30 turns ago is a regression to “Vibe Coding.” Explicit, deterministically structured context ensures the system fulfills contracts, rather than drifting on the model’s statistical average.

Distinctions

Context vs Guardrails

A distinction exists between Guardrails (Safety) and Context (Utility). Currently, many AGENTS.md files contain defensive instructions like “Do not delete files outside this directory” or “Do not output raw secrets.” This is likely a transitional state. OpenAI, Anthropic, Google, and platform wrappers are racing to bake these safety constraints directly into the inference layer. Soon, telling an agent “Don’t leak API keys” will be as redundant as telling a compiler “Optimize for speed.”

ASDLC Usage

In ASDLC, context is treated as version-controlled code, not ephemeral prompts.

Applied in:

Related Patterns:

[!NOTE] Research Validation (InfiAgent, 2026): File-centric state management outperforms compressed long-context prompts. Replacing persistent file state with accumulated conversation history dropped task completion from 80/80 to 27.7/80 average, even with Claude 4.5 Sonnet. This validates treating context as a reconstructed view of authoritative file state, not as conversation memory.

Extreme Programming

A software development methodology emphasizing high-frequency feedback, testing, and continuous refactoring, which maps perfectly to the Agentic SDLC.

Status: Live | Last Updated: 2026-02-24

Definition

Extreme Programming (XP) is a software development methodology intended to improve software quality and responsiveness to changing customer requirements. As a type of agile software development, it advocates frequent “releases” in short development cycles, aiming to improve productivity and introduce checkpoints where new customer requirements can be adopted.

Originally formalized by Kent Beck in the late 1990s, XP takes recognized “good” practices—such as testing, review, and integration—and pushes them to “extreme” levels. If testing is good, everyone will test all the time (TDD). If code reviews are good, we will review code all the time (Pair Programming). If design is good, we will make it part of everybody’s daily business (Continuous Refactoring).

Key Characteristics

Traditional Extreme Programming relies on several core engineering disciplines:

  1. Test-Driven Development (TDD): Writing automated tests before writing the implementation code to define exact behavior boundaries.
  2. Pair Programming: Two developers working at a single workstation—one writing code (the Driver), the other reviewing each line as it is typed (the Navigator).
  3. Continuous Refactoring: Relentlessly improving the internal structure of the code without changing its external behavior to manage technical debt.
  4. Continuous Integration (CI): Integrating and testing the system many times a day to prevent “integration hell.”
  5. Small Releases: Deploying minimal viable increments frequently to validate assumptions.

The Agentic Transmutation

Many agile frameworks (like Scrum) emphasize human-centric ceremonies (sprints, stand-ups, planning poker) that are difficult to translate into machine execution. Extreme Programming, by contrast, is fundamentally engineering-driven. Because XP focuses on structural rigor, continuous validation, and high-frequency feedback loops, it maps perfectly to the Agentic Software Development Life Cycle (ASDLC).

In ASDLC, we do not abandon XP; we industrialize it by replacing human labor with agentic execution in the high-friction logistics layers.

1. TDD → Probabilistic Unit Testing & Context Gates

In traditional XP, humans write unit tests to catch human regressions. In ASDLC, humans write tests to constrain agent hallucination. The tests become the strict, deterministic boundaries (Context Gates) that verify whether the probabilistic model (the Agent) successfully adhered to the Spec. TDD is no longer a best practice; it is the mandatory safety harness for autonomous code generation.
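A minimal sketch of this inversion, assuming a hypothetical `apply_discount` function an agent was asked to implement: the human-authored tests are the deterministic boundary, and any agent output that violates them is rejected no matter how plausible the code looks.

```python
def apply_discount(price: float, percent: float) -> float:
    """Stand-in for agent-generated code under test (hypothetical function)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be within [0, 100]")
    return round(price * (1 - percent / 100), 2)

# Human-authored BEFORE generation: these tests are the Context Gate.
def test_discount_bounds():
    # The spec clamps discounts to [0, 100]; the agent may not "helpfully"
    # accept 150% -- the gate makes that non-negotiable.
    try:
        apply_discount(100.0, 150.0)
        assert False, "out-of-range discount must be rejected"
    except ValueError:
        pass

def test_discount_arithmetic():
    assert apply_discount(100.0, 25.0) == 75.0

test_discount_bounds()
test_discount_arithmetic()
```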

2. Pair Programming → The Pilot & Instructor Model

The classic Driver/Navigator dynamic of pair programming is perfectly preserved in the ASDLC, but the roles are specialized. The Agent acts as the Driver (writing the boilerplate, executing refactors, generating the syntax), while the Human acts as the Navigator or Instructor (reviewing structural integrity, managing context, and steering the architecture). The human no longer types; they govern the trajectory.

3. Continuous Integration → The Cybernetic Loop

Agents do not suffer from fatigue or context-switching costs. Therefore, the “Continuous Integration” of XP becomes a literal Cybernetic Loop, wherein agents merge, test, and validate micro-commits continuously.

4. Continuous Refactoring → Agent Optimization Loops

While agents refactor application code to manage technical debt, the overarching system refactors the agents themselves. The Agent Optimization Loop continuously tests agents against Scenarios to refine their underlying prompts and instructions (e.g., distilling their AGENTS.md context files) based on failure rates.

ASDLC Usage

In ASDLC, Extreme Programming is not a historical artifact; it is the philosophical engine driving how we structure agentic behavior.

Applied in:

Gherkin

A structured, domain-specific language using Given-When-Then syntax to define behavioral specifications that are both human-readable and machine-actionable.

Status: Live | Last Updated: 2026-01-13

Definition

Gherkin is a structured, domain-specific language using Given-When-Then syntax to define behavioral specifications in plain text. While Behavior-Driven Development provides the methodology, Gherkin provides the concrete syntax.

Gherkin’s effectiveness for LLM agents stems from its properties: human-readable without technical jargon, machine-parseable with predictable structure, and aligned between technical and non-technical stakeholders. Each keyword defines a phase of reasoning that prevents agents from conflating setup, action, and verification into an undifferentiated blob.

The Given-When-Then Structure

Gherkin scenarios follow a consistent three-part structure:

```gherkin
Feature: User Authentication
  As a registered user
  I want to log into the system
  So that I can access my personalized dashboard

  Scenario: Successful login with valid credentials
    Given a registered user with email "user@example.com"
    And the user has password "SecurePass123"
    When the user submits login credentials
    Then the user should be redirected to the dashboard
    And a session token should be created
```

Keyword Semantics

| Keyword | Traditional BDD | Agentic Translation |
| --- | --- | --- |
| Given | Preconditions or initial state | Context setting, memory retrieval, environment setup |
| When | The trigger event or user action | Task execution, tool invocation, decision step |
| Then | The observable outcome | Verification criteria, alignment check, evidence-of-done |
| And/But | Additional conditions within a step | Logical constraints, secondary validation parameters |
| Feature | High-level description of functionality | Functional blueprint, overall agentic goal |
| Background | Steps common to all scenarios | Pre-test fixtures, global environment variables |

ASDLC Usage

Gherkin isn’t just a testing syntax—it’s a semantic constraint language for agent behavior.

When an agent reads a Gherkin scenario:

This partitioning prevents “context bleed” where agents conflate setup, action, and verification.
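The machine-parseable claim is easy to demonstrate. The following is a simplistic Python sketch, not a substitute for a real Gherkin parser, that partitions a scenario into its three phases, with `And`/`But` inheriting the phase of the preceding keyword:

```python
def partition_scenario(lines):
    """Group Gherkin steps into setup/action/verification phases.

    `And`/`But` attach to whichever primary keyword (Given/When/Then)
    preceded them -- the same rule real Gherkin parsers apply.
    """
    phase_of = {"Given": "setup", "When": "action", "Then": "verification"}
    phases = {"setup": [], "action": [], "verification": []}
    current = None
    for raw in lines:
        keyword, _, rest = raw.strip().partition(" ")
        if keyword in phase_of:
            current = phase_of[keyword]
        elif keyword not in ("And", "But") or current is None:
            continue  # skip Feature/Scenario headers and blank lines
        phases[current].append(rest)
    return phases

scenario = [
    'Given a registered user with email "user@example.com"',
    'And the user has password "SecurePass123"',
    "When the user submits login credentials",
    "Then the user should be redirected to the dashboard",
    "And a session token should be created",
]
print(partition_scenario(scenario))
```

Each step lands in exactly one phase, which is precisely the partitioning that keeps setup, action, and verification from bleeding into one another.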

In Specs: The Spec Contract section uses Gherkin scenarios:

```markdown
## Contract

### Scenarios

#### Happy Path
Given a valid API key
When the user requests /api/notifications
Then the response returns within 100ms
And the payload contains the user's notifications
```

Applied in:

Levels of Autonomy

SAE-inspired taxonomy for AI agent autonomy in software development, from L1 (assistive) to L5 (full), standardized at L3 conditional autonomy.

Status: Live | Last Updated: 2026-01-09

Definition

The Levels of Autonomy scale categorizes AI systems based on their operational independence in software development contexts. Inspired by the SAE J3016 automotive standard, it provides a shared vocabulary for discussing human oversight requirements.

The scale identifies where the Context Gate (the boundary of human oversight) must be placed for each level. Under this taxonomy, autonomy is not a measure of intelligence—it is a measure of operational risk and required human involvement.

The Scale

| Level | Designation | Description | Human Role | Failure Mode |
| --- | --- | --- | --- | --- |
| L1 | Assistive | Autocomplete, Chatbots. Zero state retention. | Driver. Hands on wheel 100% of time. | Distraction / Minor Syntax Errors |
| L2 | Task-Based | “Fix this function.” Single-file context. | Reviewer. Checks output before commit. | Logic bugs within a single file. |
| L3 | Conditional | “Implement this feature.” Multi-file orchestration. | Change Owner. Validates CI/CD, footprint, & intervenes on drift. | Regression to the Mean (Mediocrity). |
| L4 | High | “Manage this backlog.” Self-directed planning. | Auditor. Post-hoc analysis. | Silent Failure. Strategic drift over time. |
| L5 | Full | “Run this company.” | Consumer. Passive beneficiary. | Existential alignment drift. |

Analogy: The Self-Driving Standard (SAE)

The software autonomy scale maps directly to SAE J3016, the automotive standard for autonomous vehicles. This clarifies “Human-in-the-Loop” requirements using familiar terminology.

| ASDLC Level | SAE Equivalent | The “Steering Wheel” Metaphor |
| --- | --- | --- |
| L1 | L1 (Driver Assist) | Hands On, Feet On. AI nudges the wheel (Lane Keep) or gas (Cruise), but Human drives. |
| L2 | L2 (Partial) | Hands On (mostly). AI handles steering and speed in bursts, but Human monitors constantly. |
| L3 | L3 (Conditional) | Hands Off, Eyes On. AI executes the maneuver (The Drive). Human is the Owner ready to intervene if it leaves the paved path. |
| L4 | L4 (High) | Mind Off. Sleeping in the back seat within a geo-fenced area. Dangerous if the “fence” (Context) breaks. |
| L5 | L5 (Full) | No Steering Wheel. The vehicle has no manual controls. |

ASDLC Usage

ASDLC standardizes practices for Level 3 (Conditional Autonomy) in software engineering. While the industry frequently promotes L5 as the ultimate goal, this perspective is often counterproductive given current tooling maturity. L3 is established as the sensible default.

[!WARNING] Level 4 Autonomy Risks

At L4, agents operate for days without human intervention but lack the strategic foresight needed to maintain system integrity. This results in Silent Drift—the codebase continues to function technically but gradually deteriorates into an unmanageable state.

While advanced verification environments like the AI Software Factory offer technical mitigations against drift, eliminating human code review introduces severe, unpriced Governance Threats (including Liability and Disclosure gaps) that make L4 operations high-risk for enterprise compliance.

[!NOTE] Empirical Support for L3

Anthropic’s 2025 internal study of 132 engineers validates L3 as the practical ceiling:

  • Engineers fully delegate only 0-20% of work
  • Average 4.1 human turns per Claude Code session
  • High-level design and “taste” decisions remain exclusively human-owned
  • The “paradox of supervision”—effective oversight requires skills that AI use may atrophy

Applied in:

Mermaid

A text-based diagramming language that renders flowcharts, sequences, and architectures from markdown, enabling version-controlled visual specifications.

Status: Live | Last Updated: 2026-01-13

Definition

Mermaid is a text-based diagramming language that renders flowcharts, sequence diagrams, and architecture visualizations from markdown-style code blocks. In agentic development, Mermaid serves as the specification language for processes, workflows, and system relationships.

Where Gherkin specifies behavior and YAML specifies structure, Mermaid specifies process—how components interact, how data flows, and how state transitions occur.

Key Characteristics

Text-Based Diagrams

Mermaid diagrams are defined in plain text, making them:

```mermaid
flowchart LR
    A[Input] --> B[Process]
    B --> C[Output]
```

Diagram Types

| Type | Use Case | ASDLC Application |
| --- | --- | --- |
| Flowchart | Process flows, decision trees | Feature Assembly, Context Gates |
| Sequence | API interactions, message flows | Service contracts, Integration specs |
| State | State machines, lifecycle | Component state, Workflow phases |
| Class | Object relationships | Domain models, Architecture |
| ER | Entity relationships | Data models, Schema design |
| Gantt | Timeline, scheduling | Roadmaps, Sprint planning |

Subgraphs for Grouping

Subgraphs partition complex diagrams into logical regions:

```mermaid
flowchart LR
    subgraph Input
        A[Source]
    end

    subgraph Processing
        B[Transform]
        C[Validate]
        B --> C
    end

    A --> B
    C --> D[Output]
```

ASDLC Usage

Mermaid serves as the process specification language in ASDLC, completing the specification triad:

| Language | Specifies | Example |
| --- | --- | --- |
| Gherkin | Behavior | Given/When/Then scenarios |
| YAML | Structure | Schemas, configuration |
| Mermaid | Process | Flowcharts, sequences |

Why Mermaid for Specs:

Text-based diagrams solve a critical problem in agentic development: visual documentation that agents can read, modify, and version-control. Unlike image-based diagrams that become stale context, Mermaid diagrams are:

Relationship to Patterns:

Anti-Patterns

| Anti-Pattern | Description |
| --- | --- |
| Box Soup | Too many nodes without grouping |
| Arrow Spaghetti | Excessive cross-connections |
| No Labels | Edges without descriptive text |
| Static Screenshots | Images instead of text diagrams |

[!TIP] Key practices: Group with subgraphs, label edges, use flowchart LR for process flows, limit to <15 nodes per diagram.

Model-Driven Development

An early 2000s software engineering paradigm that attempted 100% code generation from models, serving as a cautionary tale for modern spec-as-source AI hype.

Status: Live | Last Updated: 2026-03-03

Definition

Model-Driven Development (MDD)—often associated with Model-Driven Architecture (MDA)—was a software engineering movement prominent in the early 2000s. Its core ambition was to elevate the level of abstraction in software engineering.

In MDD, developers authored high-level visual models (such as UML diagrams) or textual Domain-Specific Languages (DSLs) instead of writing general-purpose code. Complex code generation tools were then tasked with translating these models into 100% of the underlying implementation code. The goal was to separate the functional business logic (the model) from the technical implementation details.

Why MDD Failed

Despite enormous industry hype, MDD failed to achieve mainstream enterprise adoption for several structural reasons:

1. The Abstraction Trap

The promise of MDD was that models would be simpler than code. In practice, writing a model that was precise enough to generate edge-case-handling production code required the model itself to become just as complex as the code it was replacing. Instead of reducing complexity, MDD merely shifted it from a well-understood programming language into a proprietary, clunky modeling language.

2. Big Upfront Design (BUFD)

MDD entrenched a rigid, waterfall-style methodology. It required massive upfront investment in creating complete models before any executable software emerged. This fundamental inflexibility clashed directly with the rise of Agile methodologies, which prioritized rapid iteration, immediate feedback, and working software over comprehensive documentation.

3. Tooling Inadequacy

MDD relied entirely on code generation tools. These tools were often expensive, proprietary, and poorly integrated into developer workflows. Crucially, they lacked essential developer experience (DX) features. When a generated application had a bug, deciphering whether the flaw was in the model, the code generator, or the execution environment was nearly impossible because debugging at the “model level” was severely limited.

The LLM Renaissance (and Risk)

The rise of Generative AI and Large Language Models (LLMs) has sparked a renewed interest in the core premise of MDD, often rebranded under terms like spec-as-source.

Because LLMs can parse natural language, they remove the need for rigid DSLs and complex, proprietary parsers. Developers can now write a natural language specification and ask an AI agent to generate the code.

However, trading MDD for LLMs introduces a dangerous new variable: non-determinism.

While MDD failed because its determinism was too rigid, LLMs struggle because they are inherently probabilistic. If a human only edits a natural language spec and expects the LLM to cleanly regenerate the entire codebase 1:1 every time without drifting or introducing novel bugs, they are treating the LLM like a magical compiler.

The ASDLC Stance

ASDLC views the history of MDD as a critical cautionary tale. The desire to never look at implementation code again is an anti-pattern.

We learn from MDD’s failures by adopting a spec-anchored philosophy:

  1. Specs are for Intent: We write Living Specs to define architectural boundaries, invariants, and accepted behavior.
  2. Code is for Logic: We retain human oversight over the deterministic implementation code.

Code is not an implementation detail to be abstracted away; it is the only medium capable of expressing logic deterministically. ASDLC uses agents to write code quickly guided by specs, not to hide the code entirely.

Read Next: Learn how ASDLC navigates these pitfalls in Spec-Driven Development.

OODA Loop

The Observe-Orient-Decide-Act decision cycle—a strategic model from military combat adapted for autonomous agent behavior in software development.

Status: Live | Last Updated: 2026-01-13

Definition

The OODA Loop—Observe, Orient, Decide, Act—is a strategic decision-making cycle originally developed by U.S. Air Force Colonel John Boyd for aerial combat. Boyd’s insight: the combatant who cycles through these phases faster than their opponent gains decisive advantage. The key isn’t raw speed—it’s tempo relative to environmental change.

Boyd’s less-quoted but crucial insight: Orient is everything. The Orient phase is where mental models, context, and prior experience shape how observations become decisions. A faster but poorly-oriented loop loses to a slower but well-oriented one.

In agentic software development, OODA provides the cognitive model for how autonomous agents should behave: continuously cycling through observation, interpretation, planning, and execution.

The Four Phases

  1. Observe — Gather information about the current state of the environment
  2. Orient — Interpret observations through mental models, context, and constraints
  3. Decide — Formulate a specific plan for action based on orientation
  4. Act — Execute the plan, producing changes that feed new observations

The loop is continuous. Each Act produces new state, triggering new Observe, and the cycle repeats.
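The cycle can be sketched as a loop skeleton. The environment and convergence condition below are deliberately toy-sized (a counter driven to a target), and all names are illustrative, not an API:

```python
from dataclasses import dataclass

@dataclass
class ToyEnv:
    """Toy environment: the 'codebase' is a counter we must drive to a target."""
    state: int = 0

    def observe(self) -> int:
        return self.state

    def act(self, delta: int) -> int:
        self.state += delta
        return self.state

def ooda_session(env: ToyEnv, target: int, max_cycles: int = 20) -> int:
    """Skeletal OODA loop: each phase is explicit, so failures are diagnosable."""
    for _ in range(max_cycles):
        observed = env.observe()     # Observe: current state of the environment
        gap = target - observed      # Orient: interpret observation against the goal
        if gap == 0:                 # Decide: converged, stop
            return observed
        step = 1 if gap > 0 else -1  # Decide: smallest safe action
        env.act(step)                # Act: produces new state for the next Observe
    raise RuntimeError("loop lost tempo: target not reached within budget")

print(ooda_session(ToyEnv(), 5))  # cycles until state == 5
```

Because each phase is a distinct line, a failure can be attributed to a phase: a wrong `observe` is instrumentation, a wrong `gap` is orientation, a wrong `step` is decision.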

Key Characteristics

Tempo, Not Raw Speed

The strategic value of OODA isn’t speed—it’s cycling faster than the environment changes. In software development, the “environment” is the codebase, requirements, and constraints. An agent that can cycle through OODA before context rot sets in converges on correct solutions.

Orient as the Critical Phase

For AI agents, Orient is the context window. The quality of orientation depends on:

This is why Context Engineering isn’t optional overhead. It’s engineering the Orient phase, which determines whether fast cycling produces progress or noise.

OODA vs. Single-Shot Interactions

Standard LLM interactions are Observe-Act: user provides input, model produces output. No explicit Orient or Decide phase. The model’s “orientation” is implicit in training and whatever context happens to be present.

Agentic workflows make OODA explicit:

| Phase | Single-Shot LLM | Agentic Workflow |
| --- | --- | --- |
| Observe | User prompt | Instrumented: read files, run tests, check logs |
| Orient | Implicit (training + context) | Engineered: Specs, Constitution, Context Gates |
| Decide | Implicit | Explicit: agent states plan before acting |
| Act | Generate response | Verified: external tools confirm success/failure |

This explicit structure enables debugging. When an agent fails, you can diagnose which phase broke down:

ASDLC Usage

In ASDLC, OODA explains why cyclic workflows outperform linear pipelines:

| OODA Phase | Agent Behavior | ASDLC Component |
| --- | --- | --- |
| Observe | Read codebase state, error logs, test results | File state, test output |
| Orient | Interpret against context and constraints | Context Gates, AGENTS.md |
| Decide | Formulate implementation plan | PBI decomposition |
| Act | Write code, run tests, commit | Micro-commits |

The Learning Loop is OODA with an explicit “Crystallize” step that improves future Orient phases. Where OODA cycles continuously, Learning Loop captures discoveries into machine-readable context for subsequent agent sessions.

Applied in:

Anti-Patterns

| Anti-Pattern | Description | Failure Mode |
| --- | --- | --- |
| Observe-Act | Skipping Orient/Decide. Classic vibe coding. | Works for simple tasks; fails at scale; no learning |
| Orient Paralysis | Over-engineering context, never acting | Analysis paralysis; no forward progress |
| Stale Orient | Not updating mental model when observations change | Context rot; agent operates on outdated assumptions |
| Observe Blindness | Not instrumenting observation of relevant state | Agent misses critical information (failed tests, error logs) |
| Act Without Verify | Not confirming action results before next cycle | Cascading errors; false confidence |

Request for Comments

A collaborative proposal document for significant changes that require team consensus before becoming formal decisions.

Status: Live | Last Updated: 2026-01-28

Definition

A Request for Comments (RFC) is a proposal document that solicits feedback on significant changes before they become formal decisions. Unlike an ADR which records a decision already made, an RFC opens a decision for collaborative input.

The term originates from the IETF (Internet Engineering Task Force), where RFCs have defined internet protocols since 1969. Modern software projects—Rust, React, Ember, Python—have adopted RFC processes for significant changes that affect many stakeholders.

Key Characteristics

Proposal-Oriented

RFCs propose; ADRs record. An RFC says “We should consider doing X” while an ADR says “We decided to do X.” The RFC process concludes with either acceptance (spawning ADRs) or rejection.

Collaborative

RFCs are designed for multi-stakeholder input. They include explicit comment periods and revision cycles. The goal is to surface concerns before committing to a direction.

Scope

RFCs typically cover changes that:

Single-component decisions usually don’t warrant an RFC—a direct ADR suffices.

Relationship to ADRs

| Dimension | RFC | ADR |
| --- | --- | --- |
| Purpose | Propose and gather feedback | Record a decision |
| Timing | Before decision | After decision |
| Mutability | Revised during comment period | Immutable once accepted |
| Output | One or more ADRs | Implementation guidance |

An RFC may spawn multiple ADRs. For example, “RFC: Migrate from Firebase to Supabase” might result in:

ASDLC Usage

In ASDLC, RFCs are appropriate for:

For routine architectural decisions within a single feature domain, a direct ADR is sufficient.

Applied in:

See also:

Spec-Driven Development

Methodology that defines specifications before implementation, treating specs as living authorities that code must fulfill.

Status: Live | Last Updated: 2026-01-18

Definition

Spec-Driven Development (SDD) is an umbrella term for methodologies that define specifications before implementation. The core inversion: instead of code serving as the source of documentation, the spec becomes the authority that code must fulfill.

SDD emerged as a response to documentation decay in software projects. Traditional approaches treated specs as planning artifacts that diverged from reality post-implementation. Modern SDD treats specs as living documents co-located with code.

Contrast: For the anti-pattern SDD addresses, see Vibe Coding.

Key Characteristics

Living Documentation

Specs are not “fire and forget” planning artifacts. They reside in the repository alongside code and evolve with every change to the feature. This addresses the classic problem of documentation decay.

Iterative Refinement

Kent Beck critiques SDD implementations that assume “you aren’t going to learn anything during implementation.” This is a valid concern—specs must evolve during implementation, not block it. The spec captures learnings so future sessions can act on them.

Determinism Over Vibes

Nick Tune argues that orchestration logic should be “mechanical based on simple rules” (code) rather than probabilistic (LLMs). Specs define the rigid boundaries; code enforces the workflow; LLMs handle only the implementation tasks where flexibility is required.

Visual Designs Are Not Specs

[!WARNING] The Figma Trap A beautiful mockup is not a specification; it is a suggestion. Mockups typically demonstrate the “happy path” but hide the edge cases, error states, and data consistency rules where production bugs live.

Never treat a visual design as a complete technical requirement.

Levels of SDD Adoption

Industry usage of the term SDD varies in maturity. The following levels describe how deeply a team relies on the specification:

  1. spec-first: A specification is written upfront and used to generate the initial code. Afterward, the spec is abandoned, and developers return to editing code directly.
  2. spec-anchored: The spec is maintained throughout the feature’s lifecycle inside the repository. It remains the source of truth for architectural intent and functional contracts. (This is the ASDLC target.)
  3. spec-as-source: Only the spec is ever edited by humans. The codebase is 100% generated by LLMs acting as compilers.

Anti-Patterns

Spec-as-Source

While treating the specification as the only source code (spec-as-source) sounds appealing, ASDLC regards it as a dangerous anti-pattern.

It is a regression to the failed paradigms of Model-Driven Development (MDD). MDD failed because models became as complex as code, yet remained inflexible. Replacing strict MDD code generators with LLMs introduces non-determinism. If you generate an entire system from a natural language spec, tiny changes in the spec (or an update to the underlying LLM) can cause widespread, unpredictable changes in the generated logic.

To maintain control, we must remain spec-anchored. We use specs to define intent and boundaries, but we retain deterministic code as the ultimate truth for logical execution.

ASDLC Usage

ASDLC implements Spec-Driven Development through:

See also:

The 4D Framework (Anthropic)

A cognitive model codifying four essential competencies—Delegation, Description, Discernment, and Diligence—for effective generative AI use.

Status: Live | Last Updated: 2026-01-13

Definition

The 4D Framework is a cognitive model for human-AI collaboration developed by Anthropic in partnership with Dr. Joseph Feller and Rick Dakan as part of the AI Fluency curriculum.

The framework codifies four essential competencies for leveraging generative AI effectively and responsibly:

  1. Delegation — The Strategy
  2. Description — The Prompt
  3. Discernment — The Review
  4. Diligence — The Liability

Unlike process models (e.g., Agile or Double Diamond) that dictate workflow timing, the 4D Framework specifies how to interact with AI systems. It positions the human not merely as a “prompter,” but as an Editor-in-Chief, accountable for strategic direction and risk management.

The Four Dimensions

Delegation (The Strategy)

Before engaging with the tool, the human operator must determine what, if anything, should be assigned to the AI. This is a strategic decision between Automation (offloading repetitive tasks) and Augmentation (leveraging AI as a thought partner).

Core Question: “Is this task ‘boilerplate’ with well-defined rules (High Delegation), or does it demand nuanced judgment, deep context, or ethical considerations (Low Delegation)?”

Description (The Prompt)

AI output quality is directly proportional to input quality. “Description” transcends prompt engineering hacks by emphasizing Context Transfer—delivering explicit goals, constraints, and data structures required for the task.

Core Question: “Have I specified the constraints, interface definitions, and success criteria needed for this task?”

Discernment (The Review)

This marks the transition from Creator to Editor. The human must rigorously assess AI output for accuracy, hallucinations, bias, and overall quality. Failing to apply discernment is a leading cause of “AI Technical Debt.”

Core Question: “If I authored this output, would it meet code review standards? Does it introduce fictitious libraries or violate design tokens?”

Diligence (The Liability)

The human user retains full accountability for outcomes. Diligence acknowledges that while AI accelerates execution, it never removes user responsibility for security, copyright, or ethical compliance.

Core Question: “Am I exposing PII in the context window? Am I deploying unvetted code to production?”

Key Characteristics

The Editor-in-Chief Mental Model

The 4D Framework repositions the human from “prompt writer” to “editorial director.” Just as a newspaper editor doesn’t write every article but maintains accountability for what gets published, the AI-fluent professional maintains responsibility for all AI-generated outputs.

Continuous Cycle

These four dimensions are not sequential steps but concurrent concerns. Every AI interaction requires simultaneous attention to all four:

Anti-Patterns

| Anti-Pattern | Description |
| --- | --- |
| Over-Delegation | Assigning strategic decisions or ethically sensitive tasks to AI |
| Vague Description | Using natural language prompts without context, constraints, or examples |
| Blind Acceptance | Copy-pasting AI output without verification |
| Liability Denial | Assuming AI-generated content is inherently trustworthy or legally defensible |

ASDLC Usage

Applied in: AGENTS.md Specification, Context Engineering, Context Gates

The 4D dimensions map to ASDLC constructs: Delegation → agent autonomy levels, Description → context engineering, Discernment → context gates, Diligence → guardrail protocols.

The Learning Loop

The iterative cycle between exploratory implementation and spec refinement, balancing vibe coding velocity with captured learnings.

Status: Live | Last Updated: 2026-01-26

Definition

The Learning Loop is the iterative cycle between exploratory implementation and constraint crystallization. It acknowledges that understanding emerges through building, while ensuring that understanding is captured for future agent sessions.

Kent Beck critiques spec-driven approaches that assume “you aren’t going to learn anything during implementation.” He’s right—discovery is essential. But pure vibe coding loses those discoveries. The next agent session starts from zero, re-discovering (or missing) the same constraints.

The Learning Loop preserves discoveries as machine-readable context, enabling compounding understanding across sessions.

The Cycle

  1. Explore — Vibe code to discover edge cases, performance characteristics, or API behaviors
  2. Learn — Identify constraints that weren’t obvious from requirements
  3. Crystallize — Update the Spec with discovered constraints
  4. Verify — Gate future implementations against the updated Spec
  5. Repeat

Each iteration builds on the last. The spec grows smarter, and agents inherit the learnings of every previous session.

OODA Foundation

The Learning Loop is an application of the OODA Loop to software development:

| Learning Loop Phase | OODA Equivalent |
| --- | --- |
| Explore | Observe + Act (gather information through building) |
| Learn | Orient (interpret what was discovered) |
| Crystallize | Decide (commit learnings to persistent format) |
| Verify | Observe (confirm crystallized constraints via gates) |

The key insight: in software development, Orient and Observe are interleaved. You often can’t observe relevant constraints until you’ve built something that reveals them. The Learning Loop makes this explicit by treating Explore as a legitimate phase rather than a deviation from the plan.

Key Characteristics

Not Waterfall

The Learning Loop explicitly rejects the waterfall assumption that all constraints can be known upfront. Specs are scaffolding that evolve, not stone tablets.

Not Pure Vibe Coding

The Learning Loop also rejects the vibe coding assumption that documentation is optional. Undocumented learnings are lost learnings—the next agent (or human) will repeat the same mistakes.

Machine-Readable Capture

Learnings must be captured in formats agents can consume: schemas, constraints in YAML, acceptance criteria in markdown. Natural language is acceptable but structured data is preferred.
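
As a sketch of what machine-readable capture can look like, the following hypothetical helper appends a discovered constraint to a spec’s structured section. The file name and fields are illustrative, not an ASDLC convention:

```python
import json
from pathlib import Path

def crystallize(spec_path: str, constraint: dict) -> None:
    """Append a discovered constraint to a spec's machine-readable section."""
    path = Path(spec_path)
    spec = json.loads(path.read_text()) if path.exists() else {"constraints": []}
    if constraint not in spec["constraints"]:  # idempotent across repeated sessions
        spec["constraints"].append(constraint)
    path.write_text(json.dumps(spec, indent=2))

# A learning from an Explore phase, captured as structured data:
crystallize("notification.spec.json", {
    "id": "latency-budget",
    "rule": "webhook delivery must complete within 100ms",
    "source": "load test during Explore",
})
```

Because the capture is structured rather than prose, a later agent session can load and enforce the same constraint without re-discovering it.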

“The real capability—our ability to respond to change—comes not from how fast we can produce code, but from how deeply we understand the system we are shaping.” — Unmesh Joshi

Automation: The Ralph Loop

The Learning Loop describes an iterative cycle that typically involves human judgment at each phase. The Ralph Loop automates this cycle for tasks with machine-verifiable completion criteria:

| Learning Loop Phase | Ralph Loop Implementation |
| --- | --- |
| Explore | Agent implements based on PBI/Spec |
| Learn | Agent reads error logs, test failures, build output |
| Crystallize | Agent updates progress.txt; commits to Git |
| Verify | External tools (Jest, tsc, Docker) confirm success |

When verification fails, Ralph automatically re-enters Explore with the learned context. The loop continues until external verification passes or iteration limit is reached.

Key difference: The Learning Loop expects human judgment in the Learn and Crystallize phases. The Ralph Loop requires that “learning” be expressible as observable state (error logs, test results) and “crystallization” be automatic (Git commits, progress files).

Ralph Loops work best when success criteria are machine-verifiable (tests pass, builds complete). For tasks requiring human judgment—ambiguous requirements, architectural decisions, product direction—the Learning Loop remains the appropriate model.
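
A minimal sketch of the mechanical skeleton, assuming the agent call and the verifier are pluggable (`build_step` stands in for the agent invocation; the Git-commit/progress-file step is omitted):

```python
import subprocess

def ralph_loop(build_step, verify_cmd, max_iterations=5):
    """Explore -> Learn -> Verify, terminated by external verification.

    build_step: callable receiving the last failure output (the learned context)
    verify_cmd: external verifier argv, e.g. a test runner or type checker
    """
    feedback = ""
    for _ in range(max_iterations):
        build_step(feedback)                       # Explore: agent implements
        result = subprocess.run(verify_cmd, capture_output=True, text=True)
        if result.returncode == 0:                 # Verify: external tool decides
            return True
        feedback = result.stdout + result.stderr   # Learn: errors become context
    return False                                   # iteration limit reached
```

The exit condition is deliberately external: the loop terminates on the verifier’s exit code, never on the agent’s own claim of success.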

ASDLC Usage

In ASDLC, the Learning Loop connects several core concepts:

Applied in:

Anti-Patterns

| Anti-Pattern | Description |
| --- | --- |
| Waterfall Specs | Writing exhaustive specs before any implementation, assuming no learning will occur |
| Ephemeral Vibe Coding | Generating code without ever crystallizing learnings into specs |
| Spec-as-Paperwork | Updating specs for compliance rather than genuine constraint capture |
| Post-Hoc Documentation | Writing specs after implementation is complete, losing the iterative benefit |

YAML

A human-readable data serialization language that serves as the structured specification format for configuration, schemas, and file structures in agentic workflows.

Status: Live | Last Updated: 2026-01-13

Definition

YAML (YAML Ain’t Markup Language) is a human-readable data serialization language designed for configuration files, data exchange, and structured documentation. In agentic development, YAML serves as the specification language for data structures, schemas, and file organization.

Where Gherkin specifies behavior (Given-When-Then), YAML specifies structure (keys, values, hierarchies). Both are human-readable formats that bridge the gap between human intent and machine execution.

Key Characteristics

Human-Readable Structure

YAML’s indentation-based syntax mirrors how humans naturally organize hierarchical information:

notification:
  channels:
    - websocket
    - email
    - sms
  constraints:
    latency_ms: 100
    retry_count: 3
  fallback:
    enabled: true
    order: [websocket, email, sms]

Schema-First Design

YAML enables schema-first development where data structures are defined before implementation:

# Schema definition in spec
user:
  id: string (UUID)
  email: string (email format)
  roles: array of enum [admin, user, guest]
  created_at: datetime (ISO 8601)

Agents can validate implementations against these schemas, catching type mismatches and missing fields before runtime.
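
One way this validation might look, sketched in Python with the schema hand-translated into checker functions. The field names follow the example above; a real pipeline would likely use a formal schema language such as JSON Schema:

```python
import uuid
from datetime import datetime

# The spec's schema, translated into field -> validator callables
USER_SCHEMA = {
    "id": lambda v: bool(uuid.UUID(v)),                      # string (UUID)
    "email": lambda v: "@" in v,                             # crude email check
    "roles": lambda v: set(v) <= {"admin", "user", "guest"}, # enum membership
    "created_at": lambda v: bool(datetime.fromisoformat(v)), # ISO 8601
}

def validate(record: dict, schema: dict) -> list:
    """Return a list of violations; an empty list means the record matches."""
    errors = [f"missing field: {k}" for k in schema if k not in record]
    for key, check in schema.items():
        if key in record:
            try:
                if not check(record[key]):
                    errors.append(f"invalid value for {key}")
            except (ValueError, TypeError):
                errors.append(f"invalid value for {key}")
    return errors
```

An agent can run such a check against generated fixtures or API responses and surface the violation list before any code reaches runtime.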

Configuration as Code

YAML configurations live in version control alongside code, enabling:

ASDLC Usage

YAML serves as the data structure specification language in ASDLC, completing the specification triad:

In Specs: All ASDLC articles use YAML frontmatter for structured metadata. The Spec pattern leverages YAML for schema definitions that agents validate against.

In AGENTS.md: The AGENTS.md Specification uses YAML for structured directives—project context, constraints, and preferred patterns.

Applied in:

AI Software Factory

An industrial-scale approach to software engineering. Explores the dichotomy between Safe ASDLC Factories (L3) and high-risk Dark Factories (L4).

Status: Experimental | Last Updated: 2026-03-09

Definition

The Software Factory is a concept inherited from DevOps and manufacturing that models software development as an industrial assembly line rather than a bespoke craft.

In the agentic era, an AI Software Factory uses autonomous agents to automate the “run the business” toil (technical debt, dependency updates, bug fixes, operational overhead). By industrializing these tasks, human capital is shifted toward high-level creative architecture, problem-solving, and system design.

The Dichotomy: L3 vs L4 Factories

As organizations attempt to eliminate human bottlenecks, two distinct operational modes have emerged:

1. The Safe Factory (The ASDLC Model)

This model operates at Level 3 (Conditional) Autonomy. Agents act as the high-throughput generation engine on the assembly line, but humans retain the ultimate verification controls.

Driven by rigorous Spec-Anchored Development, human engineers define the architecture and act as the final Acceptance Gate. Crucially, the human elevates from Code Auditor to Change Owner. By relying on automated Quality Gates and agentic Review Gates for line-by-line syntax and specification checks, the human focuses PR reviews on the structural footprint (“what files changed?”) and strategic fitness (“does this solve the problem safely?”). They approve state transitions and maintain complete Provenance over what enters production without becoming a bottleneck.

2. The Dark Factory (L4 Model)

In this model, “Code must not be written by humans. Code must not be reviewed by humans.” The lights are out because nobody needs to see.

Because deterministic human code review is eliminated entirely, Dark Factories must substitute it with Probabilistic Satisfaction. Quality is measured empirically: of all the observed trajectories through thousands of holdout test scenarios, what fraction satisfies the user?

To achieve this testing scale without exhausting API rate limits or incurring massive vendor costs, Dark Factories utilize Digital Twins—high-fidelity, in-memory clones of required third-party services (e.g., Slack, Stripe, Jira).

ASDLC Position & Governance Risks

ASDLC standardizes heavily around the L3 Safe Factory. We consider the L4 Dark Factory to be an experimental, high-risk frontier that introduces unpriced regulatory exposure.

While the technical hurdles of eliminating human review are actively being solved by Digital Twins and multi-agent synthesis, taking humans out of the code review loop entirely introduces severe Governance Risks:

  1. Silent Drift: Without constant file-level human intervention, the codebase keeps passing its tests while gradually deteriorating into an unmaintainable architectural state over months.
  2. The Liability Gap: If a silently agent-deployed module fails, it is legally unclear who is liable: the architects who wrote the spec or the AI provider who supplied the base model.
  3. The Disclosure Gap: Currently, no industry standard exists for auditing “agent-built software tested probabilistically against replicas.” Disclosing this to enterprise procurement officers is practically useless without a shared evaluation framework.
  4. The Contractual Gap: Vendors operating Dark Factories often still use “AS IS” limitation-of-liability boilerplate. A contract designed to disclaim human imperfection is inappropriately absorbing the risk of the complete absence of a human process, destroying trust.

Digital Twins

Virtual replicas of complex systems. In software engineering, these are behavioral clones of third-party services used for high-volume scenario testing.

Status: Experimental | Last Updated: 2026-03-09

Definition

A Digital Twin is a virtual representation that serves as the real-time digital counterpart of a physical object or process. Originating in aerospace engineering and popularized by IoT and manufacturing (e.g., simulating jet engines or assembly lines before deployment), the concept revolves around creating high-fidelity simulated replicas for experimentation and monitoring.

Key Characteristics

ASDLC Usage

In agentic software development, the “physical system” being replicated is often external software itself. ASDLC applies the Digital Twin concept to third-party services and integrations (e.g., Slack, Okta, Stripe, Jira).

As organizations move toward AI Software Factories, they replace human testing with thousands of automated, probabilistic test scenarios executed by agents.

The API Bottleneck

Running these massive volumes of agentic integration tests against real SaaS tools frequently triggers fatal issues:

  1. Rate Limits: Hitting 429 Too Many Requests instantly.
  2. Vendor Cost: Accumulating massive API usage bills.
  3. Abuse Flagging: Triggering the vendor’s fraud/abuse systems based on strange, high-volume automated behavior during fuzzing or agent-optimization runs.

The Agentic Solution

Instead of traditional mocking (which is often brittle and out-of-date), developers use AI agents to dynamically generate working, in-memory Digital Twin Universes (DTU).

If you provide a base model with an API Spec and documentation for Stripe, the agent can generate a lightweight, local web server that perfectly mirrors the Stripe API’s expected inputs, state changes, and outputs.

Agents are then pointed at the Digital Twin local endpoint instead of the production API. This enables infinite, zero-cost, zero-latency integration testing—a prerequisite for sustained Level 4 Autonomy where probabilistic evaluation is the only quality gate.

These Digital Twins live entirely in memory or local execution environments. They are functionally identical to the external service from the perspective of the application, but execute instantly without network overhead.
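
A toy illustration of the shape such a twin can take (the endpoints and ID format here are invented, not Stripe’s actual API): a stateful in-memory object that application code calls instead of a network client.

```python
import uuid

class PaymentTwin:
    """In-memory behavioral clone of a hypothetical payment API.

    Mirrors expected inputs, state changes, and outputs without network I/O,
    so agents can run thousands of scenarios at zero cost and zero latency.
    """
    def __init__(self):
        self.charges = {}

    def create_charge(self, amount: int, currency: str) -> dict:
        if amount <= 0:
            return {"error": "amount must be positive"}  # mimic API validation
        charge_id = f"ch_{uuid.uuid4().hex[:12]}"
        charge = {"id": charge_id, "amount": amount,
                  "currency": currency, "status": "succeeded"}
        self.charges[charge_id] = charge                 # state change, in memory
        return charge

    def refund(self, charge_id: str) -> dict:
        charge = self.charges.get(charge_id)
        if charge is None:
            return {"error": "no such charge"}
        charge["status"] = "refunded"
        return charge
```

In practice an agent would generate a twin like this from the vendor’s API spec and serve it behind a local endpoint, but the essential property is the same: identical inputs, state transitions, and outputs, with no rate limits or vendor cost.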

“With their own, independent clones of those services… their army of simulated testers could go wild. Their scenario tests became scripts for agents to constantly execute against the new systems as they were being built.” — Simon Willison

When combined with Holdout Scenarios, Digital Twins provide the bedrock for executing probabilistic assurance at a scale that replaces manual human testing.

Event Modeling

A system blueprinting method that centers on events as the primary source of truth, serving as a rigorous bridge between visual design and technical implementation.

Status: Experimental | Last Updated: 2026-01-01

Definition

Event Modeling is a method for designing information systems by mapping what happens over time. It creates a linear blueprint that serves as the single source of truth for Product, Design, and Engineering.

Unlike static diagrams (like ERDs or UML) that focus on structure, Event Modeling focuses on the narrative of the system. It visualizes the system as a film strip, showing exactly how a user’s action impacts the system state and what information is displayed back to them.

Core Components

An Event Model is composed of four distinct elements:

  1. Views (Wireframes): what the user sees at each step of the timeline
  2. Commands: the user’s intent to change the system
  3. Events: the immutable facts recorded when a command succeeds
  4. Read Models: the stored state projected back into the views

Why It Matters for AI

In modern software development, ambiguity is the enemy. While human engineers can infer intent from a loose visual mockup, AI models require explicit instructions.

Event Modeling forces implicit business rules to become explicit. By defining the exact data payload of every Command and the resulting state change of every Event, we provide AI agents with a deterministic roadmap. This ensures the generated code handles edge cases and data consistency correctly, rather than just “looking right” on the frontend.
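
As a minimal sketch of that explicitness (the domain and field names are invented for illustration), the command/event pair for an order submission can spell out the exact payload and the deterministic state change:

```python
from dataclasses import dataclass

# Command: the user's intent, with its exact payload spelled out
@dataclass(frozen=True)
class SubmitOrder:
    order_id: str
    items: tuple          # (sku, quantity) pairs
    total_cents: int

# Event: the immutable fact recorded if the command succeeds
@dataclass(frozen=True)
class OrderSubmitted:
    order_id: str
    total_cents: int

def handle(command: SubmitOrder) -> OrderSubmitted:
    """Deterministic state change: explicit rules, nothing left to inference."""
    if command.total_cents <= 0:
        raise ValueError("order total must be positive")  # explicit edge case
    return OrderSubmitted(command.order_id, command.total_cents)
```

Given definitions at this level of precision, an AI agent has no room to guess at payloads or edge cases; the model is the roadmap.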

Relationship to Requirements

Event Modeling acts as a bridge between Visual Design (what it looks like) and Technical Architecture (how it works).

It does not replace functional requirements; rather, it validates them. A feature is only considered “defined” when there is a complete path mapped from the user’s view, through the command, to the stored event, and back to the view. This “closed loop” guarantees that every pixel on the screen is backed by real data.

Feedback Loop Compression

How AI compresses the observe → validate → learn cycle, shifting the bottleneck from code production to code understanding.

Status: Experimental | Last Updated: 2026-01-26

Definition

Feedback Loop Compression is the phenomenon where AI collapses the time between deploying code and understanding its production behavior. For 20 years, DevOps attempted to connect developers with production consequences but failed—the loops were “long, lossy, and laggy.” AI has changed this.

“The bottleneck shifts from, ‘How fast can I write code?’ to, ‘How fast can I understand what’s happening and make good decisions about it?’” — Charity Majors

The compression is asymmetric: AI has made the Act phase (code generation) nearly free, while the Orient phase (understanding production state) remains the constraint. Feedback Loop Compression addresses this by making observation and validation as fast as generation.

The Shift in Constraints

| Era | Primary Bottleneck | Secondary Bottleneck |
| --- | --- | --- |
| Pre-DevOps | Deployment (ops owns production) | Feedback (weeks-to-months) |
| DevOps Era | Feedback loops (still too slow) | Code production |
| AI Era | Understanding & validation | Code production → near-zero |

The traditional workflow optimized for the wrong constraint:

Old: write code → test → review → merge → "hope it works!"
New: write code (AI) → deploy → observe → validate → learn → iterate

In the new model, every deploy is a learning opportunity. Shipping frequency becomes the heartbeat of feedback.

OODA Acceleration

Feedback Loop Compression is specifically about accelerating the OODA Loop:

| Phase | Before AI | After AI |
| --- | --- | --- |
| Observe | Ops tools, dashboards, manual inspection | Automated telemetry streaming to dev context |
| Orient | Domain expertise, manual triage | AI interprets traces, suggests root causes |
| Decide | Developer reasoning about fix | AI proposes solutions with verification plans |
| Act | Manual code changes | AI-generated patches, validated before merge |

The key insight: AI doesn’t just accelerate Act—it accelerates the entire cycle. An agent can observe production logs, orient against the codebase, decide on a fix, and act to implement it, all within a single interaction loop.
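
Schematically, such a loop can be expressed with each phase as a pluggable callable; `interpret` stands in for whatever model backend performs the Orient and Decide work (all names here are illustrative, not a prescribed API):

```python
def compressed_ooda(read_telemetry, interpret, apply_patch, verify) -> dict:
    """One pass of an AI-compressed OODA cycle over production feedback.

    read_telemetry: Observe  -- logs, traces, metrics
    interpret:      Orient + Decide -- AI proposes a root cause and a fix
    apply_patch:    Act      -- generated change
    verify:         external check before the change lands
    """
    observation = read_telemetry()
    diagnosis = interpret(observation)
    apply_patch(diagnosis["patch"])
    return {"diagnosis": diagnosis, "verified": verify()}
```

The point of the sketch is that all four phases sit inside one interaction loop, so the human reviews an already-diagnosed, already-verified proposal rather than raw telemetry.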

Implications for L3 Autonomy

At L3 (Conditional Autonomy), humans remain in the loop for judgment calls. Feedback Loop Compression doesn’t eliminate this—it makes each human decision more informed:

The compressed loop doesn’t bypass human oversight; it gives humans better information faster.

The “Nobody Understands It” Risk

“What happens when nobody wrote the code you just deployed, and nobody really understands it?” — Charity Majors

This is the dark side of compressed feedback loops. AI-generated code deployed at AI speed can outpace human understanding. ASDLC addresses this through:

Compressed loops without crystallized understanding lead to accumulated technical debt at AI speed.

ASDLC Usage

| Compression Enabler | ASDLC Response |
| --- | --- |
| Code generation → free | Focus shifts to Spec-Driven Development |
| Observation → automated | Ralph Loop reads logs, test output automatically |
| Orientation → AI-assisted | Context Engineering structures what AI sees |
| Validation → continuous | Context Gates enforce verification |

Applied in:

Anti-Patterns

| Anti-Pattern | Description |
| --- | --- |
| Shipping Blind | Compressing the Act phase without compressing Observe—deploying code without telemetry |
| Speed Over Understanding | Deploying faster than the team can comprehend; accumulated mystery code |
| Observation Without Orientation | Collecting telemetry without structuring it for AI comprehension |
| Lossy Loops | Fast cycles that don’t preserve learnings; next session rediscovers same constraints |

Product Requirement Prompt (PRP)

A structured methodology combining PRD, codebase context, and agent runbook—the minimum spec for production-ready AI code.

Status: Experimental | Last Updated: 2025-01-05

Definition

A Product Requirement Prompt (PRP) is a structured methodology that answers the question: “What’s the minimum viable specification an AI coding agent needs to plausibly ship production-ready code in one pass?”

As creator Rasmus Widing defines it: “A PRP is PRD + curated codebase intelligence + agent runbook.”

Unlike traditional PRDs (which exclude implementation details) or simple prompts (which lack structure), PRPs occupy the middle ground—a complete context packet that gives an agent everything it needs to execute autonomously within bounded scope.

The methodology emerged from practical engineering work in 2024 and has since become the foundation for agentic engineering training.

Key Characteristics

PRPs are built on three core principles:

  1. Plan before you prompt — Structure thinking before invoking AI
  2. Context is everything — Comprehensive documentation enables quality output
  3. Scope to what the model can reliably do in one pass — Bounded execution units

A complete PRP includes eight components:

| Component | Purpose |
| --- | --- |
| Goal | What needs building |
| Why | Business value and impact justification |
| Success Criteria | States that indicate completion (not activities) |
| Health Metrics | Non-regression constraints (what must not degrade) |
| Strategic Context | Trade-offs & priorities (from Product Vision) |
| All Needed Context | Documentation references, file paths, code snippets |
| Implementation Blueprint | Task breakdown and pseudocode |
| Validation Loop | Multi-level testing (syntax, unit, integration) |

Key Differentiators from Traditional PRDs

ASDLC Usage

PRP components map directly to ASDLC concepts—a case of convergent evolution in agentic development practices.

| PRP Component | ASDLC Equivalent |
| --- | --- |
| Goal | The Spec — Blueprint |
| Why | Product Thinking |
| Success Criteria | Context Gates |
| Health Metrics | The Spec — Non-Functional Reqs / Constraints |
| Strategic Context | Product Vision — Runtime Injection |
| All Needed Context | Context Engineering |
| Implementation Blueprint | The PBI |
| Validation Loop | Context Gates — Quality Gates |

In ASDLC terms, a PRP is equivalent to The Spec + The PBI + curated Context Engineering—bundled into a single artifact optimized for agent consumption.

ASDLC separates these concerns for reuse: multiple PBIs reference the same Spec, and context is curated per-task rather than duplicated. For simpler projects or rapid prototyping, the PRP’s unified format may be more practical. The methodologies are complementary—PRPs can be thought of as “collapsed ASDLC artifacts” for single-pass execution.

Applied in:

See also:

Product Thinking

The practice of engineers thinking about user outcomes, business context, and the 'why' before the 'how'—the core human skill in the AI era.

Status: Experimental | Last Updated: 2025-01-05

Definition

Product Thinking is the practice of engineers understanding and prioritizing user outcomes, business context, and the reasoning behind technical work (“why”) before focusing on implementation details (“how”).

Rather than waiting for fully-specified requirements and executing tasks mechanically, product-thinking engineers actively engage with the problem space. They ask:

This mindset originated in product management but has become essential for modern engineering teams, especially as AI increasingly handles implementation while humans must provide strategic judgment.

Key Characteristics

Outcome Orientation: Product-thinking engineers measure success by user and business outcomes, not just task completion. They question whether closing a ticket actually moved the product forward.

Context Awareness: They understand the broader system: user workflows, business constraints, competitive landscape, and technical debt. Code decisions are made with this context, not in isolation.

Tradeoff Evaluation: Every technical decision involves tradeoffs (speed vs maintainability, generality vs simplicity, build vs buy). Product-thinking engineers explicitly identify and evaluate these tradeoffs rather than defaulting to “best practice.”

Ownership Mindset: They take responsibility for outcomes, not just implementations. If a feature ships but users don’t adopt it, a product-thinking engineer investigates why, even if the code “worked as specified.”

Risk Recognition: They can look at technically correct code and identify product risks: “This will confuse users,” “This locks us into a vendor,” “This creates a support burden.” These risks are invisible to AI.

The AI Era Shift

Matt Watson (5x Founder/CTO, author of Product Driven) argues that vibe coders outperform average engineers not because of superior coding skill, but because they think about the product:

“A lot of engineers? They’re just waiting for requirements. That’s usually a leadership problem. For years, we rewarded engineers for staying in their lane, closing tickets, and not rocking the boat. Then we act surprised when they don’t think like owners.”

The traditional model:

  1. Product Manager writes requirements
  2. Engineer implements requirements
  3. Success = code matches spec

Why this fails in the AI era:

The new competitive advantage:

Watson’s conclusion: “Product thinking isn’t a bonus skill anymore. In an AI world, it’s the job.”

The Leadership Problem

Product thinking doesn’t emerge by accident. Watson identifies the structural cause:

Anti-patterns that kill product thinking:

What builds product thinking:

If every technical decision must flow through a product manager or architect, the organization has created a dependency on human bottlenecks that AI cannot solve.

Applications

Pre-AI Era: Product thinking was a differentiator for senior engineers and those in “full-stack” or startup environments. Most engineers could succeed by executing well-defined requirements.

AI Era: Product thinking becomes the baseline. As AI handles implementation, the human contribution shifts entirely to:

  1. Defining the problem worth solving
  2. Evaluating whether AI-generated solutions actually solve it
  3. Recognizing risks and tradeoffs the model cannot see

Where product thinking is essential:

ASDLC Usage

In ASDLC, product thinking is why Specs exist. The Spec is not bureaucratic overhead—it’s the forcing function that makes product thinking explicit and sharable.

The connection:

When an engineer writes a Spec, they’re forced to answer:

If they can’t answer these questions, they don’t understand the product problem yet. Vibe coding without this foundation produces code that works but solves the wrong problem.

The ASDLC position:

This is the “Instructor-in-the-Cockpit” model: the pilot (AI) flies the plane, but the instructor (human) decides where to fly and evaluates whether the flight is safe.

Applied in:

Best Practices

For Individual Engineers:

  1. Before writing code, write the “why” in plain English
  2. Question requirements that don’t explain user impact
  3. Propose alternatives when you see tradeoff mismatches
  4. Treat AI-generated code skeptically: Does it solve the right problem?

For Engineering Leaders:

  1. Share business context, even when it feels like “too much detail”
  2. Reward engineers who challenge bad requirements, not just those who ship fast
  3. Make “why” documentation non-optional (use Specs or equivalent)
  4. Measure outcomes (user adoption, retention, error rates) not just velocity (story points)

For Organizations:

  1. Flatten decision-making: trust engineers to own tradeoffs in their domain
  2. Train product thinking explicitly (it’s not intuitive for engineers trained to “just code”)
  3. Create feedback loops: engineers see how their code impacts users
  4. Recognize that AI scales implementation, not judgment—invest in the latter

Anti-Patterns

“Just Build It” Culture: Engineers discouraged from asking “why” or proposing alternatives. Leads to technically correct code that solves the wrong problem.

Context Hoarding: Product managers or architects hold all context and dole out tasks. Creates dependency bottleneck and prevents engineers from exercising judgment.

Velocity Worship: Success measured by tickets closed, not problems solved. Optimizes for speed of wrong solutions.

“Stay In Your Lane” Enforcement: Engineers punished for thinking beyond their assigned component. Prevents system-level thinking required for good product decisions.

See also:

Production Readiness Gap

The distance between a working generative AI demo and a secure, scalable production system.

Status: Experimental | Last Updated: 2026-01-26

Definition

The Production Readiness Gap is the distance between “demo works” and “runs securely in production at scale.” This gap represents the validation work required when transitioning Vibe Coded prototypes to production systems.

The gap encompasses:

The Fundamental Asymmetry

Crossing the Production Readiness Gap requires capabilities that LLMs currently lack without structural support:

| Demo Requirements | Production Requirements |
| --- | --- |
| Local correctness (this function works) | Global correctness (system behaves consistently) |
| Happy path | All edge cases, error states, failure modes |
| Works once | Works reliably under load, over time |
| Developer understands it | Team maintains it for years |
| Acceptable cost for testing | Sustainable unit economics at scale |

“You can’t ship ‘90% correct’ to enterprise customers. You can’t have authentication that works ‘most of the time’ or data integrity that’s ‘pretty good.’” — Dan Cripe

The “Missing Incentive” Test

A useful heuristic for evaluating AI capability claims: Are domain experts doing it?

If autonomous agents could spin up production SaaS with small teams, experienced engineers would be doing it en masse. They’re not. The people claiming it’s possible are typically:

  1. Building personal productivity tools (valid, but not enterprise SaaS)
  2. Running demos that haven’t hit production
  3. Not disclosing how much human intervention (L2/L3) is actually happening

Observability as a Production Requirement

The Production Readiness Gap isn’t just about security, performance, and maintainability—it’s about verifiability in production. If you can’t observe what your code is doing after deployment, you can’t validate that it works.

“The bottleneck shifts from, ‘How fast can I write code?’ to, ‘How fast can I understand what’s happening and make good decisions about it?’” — Charity Majors

AI has made code generation nearly free. The constraint has shifted to understanding and validating what that code does in production. This reframes production readiness:

| Old Constraint | New Constraint |
| --- | --- |
| Writing code | Understanding code |
| Testing before deploy | Validating after deploy |
| Hope it works | Observe that it works |

Without observability, you’re “shipping blind”—deploying code that nobody fully understands, with no feedback loop to validate success. See Feedback Loop Compression for how AI enables tighter observe → validate → learn cycles.
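The observe → validate loop starts with structured events that can be queried after deploy. A minimal Python sketch; the `emit_event` helper, event names, and fields are illustrative assumptions, not a prescribed schema:

```python
import json
import time

def emit_event(sink, name: str, **fields) -> None:
    """Emit one structured event so production behavior can be
    queried, not guessed. Fields here are illustrative only."""
    record = {"event": name, "ts": time.time(), **fields}
    sink(json.dumps(record, sort_keys=True))

# Usage: wrap a code path with events tagged by deploy version,
# so post-deploy validation can compare behavior across releases.
events = []
emit_event(events.append, "checkout.started",
           deploy_version="v1.4.2", user_tier="pro")
emit_event(events.append, "checkout.completed",
           deploy_version="v1.4.2", latency_ms=87)
```

Querying these events after deploy (error rates per `deploy_version`, latency distributions) is what turns “hope it works” into “observe that it works.”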

ASDLC Usage

Applied in:

Provenance

The chain of custody and intent behind software artifacts, distinguishing high-value engineered systems from 'slop'.

Status: Experimental | Last Updated: 2026-02-15

Definition

Provenance in the Agentic SDLC is the traceable chain of human intent and verification behind every artifact.

As AI reduces the cost of generating code to near-zero, the value of software shifts from the volume of lines produced to its accountability. Code that appears magically (“vibe coding”) without clear direction has low provenance and acts as a liability. Code that results from specific human intent, articulated in a Spec and verified by a Gate, has high provenance.

The Theory of Value

The “Code is Cheap” philosophy fundamentally alters how we value software engineering activities:

  1. Code is Cheap: LLMs provide an effectively infinite supply of syntax.
  2. Attention is Finite: Human bandwidth to verify and steer is the bottleneck.
  3. Provenance is Value: We value what we can trust. Trust comes from knowing who steered the agent and how it was verified.

“When one gets that big pull request (PR) on an open source repository, irrespective of its quality, if it is handwritten by a human, there is an intrinsic value and empathy for the human time and effort that is likely ascribed to it… That is what makes that code ‘expensive’ and not cheap.” — Kailash Nadh

In an agentic system, we cannot rely on “effort” as a proxy for value. We must rely on provenance—the audit trail that proves a human intended for this code to exist and verified that it serves that intent.

The Spec as “Expensive Talk”

Linus Torvalds famously said, “Talk is cheap. Show me the code.”

In the AI era, the aphorism inverts. Code is cheap. Show me the talk.

“The Talk” is the Spec—the high-fidelity articulation of requirements, constraints, and architecture. Generating 10,000 lines of code is trivial; articulating exactly what those 10,000 lines should do is the hard, high-value work.

ASDLC Usage

Provenance is enforced through three mechanisms:

  1. Intent Provenance (The Spec): Every change must trace back to a defined PBI or Spec. No “random acts of coding.”
  2. Verification Provenance (Context Gates): Every state transition is gated by a verifiable check (e.g., “Verified by architect-agent using checklist-v1”).
  3. Audit Provenance (Identity & Tracking): The granular chain of custody showing who did what.
    • Micro-Commits: Granular, step-by-step reasoning rather than a single giant AI slop PR.
    • Identity Separation: When orchestrating autonomous factories, models must operate under distinct, cryptographically isolated credentials (e.g., unique API tokens per agent persona). This ensures that every timeline comment is explicitly attributed to a specific model’s reasoning pathway, aiding in deterministic compliance tracking rather than blending multiple actors into a generic bot-admin account.
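The three mechanisms above can be modeled as a single chain-of-custody record per change. A minimal Python sketch; the class and field names (`ProvenanceRecord`, `pbi_id`, `gate`, `agent_identity`) are hypothetical illustrations, not an ASDLC API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProvenanceRecord:
    """Illustrative chain of custody for one change.
    All field names here are hypothetical."""
    commit_sha: str       # Micro-commit being attributed
    pbi_id: str           # Intent provenance: traces to a PBI/Spec
    gate: str             # Verification provenance: the check that passed
    agent_identity: str   # Audit provenance: distinct credential per persona

    def is_traceable(self) -> bool:
        # High provenance requires every link in the chain to be present.
        return all([self.commit_sha, self.pbi_id,
                    self.gate, self.agent_identity])

record = ProvenanceRecord(
    commit_sha="a1b2c3d",
    pbi_id="PBI-142",
    gate="architect-agent/checklist-v1",
    agent_identity="builder-agent-token-7",
)
```

An empty link anywhere in the chain marks the artifact as low-provenance: code that exists without a traceable “who, why, and how verified.”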

Applied in:

Vibe Coding

Natural language code generation without formal specs—powerful for prototyping, problematic for production systems.

Status: Experimental | Last Updated: 2025-01-05

Definition

Vibe Coding is the practice of generating code directly from natural language prompts without formal specifications, schemas, or contracts. Coined by Andrej Karpathy, the term describes an AI-assisted development mode where engineers describe desired functionality conversationally (“make this faster,” “add a login button”), and the LLM produces implementation code.

This approach represents a fundamental shift: instead of writing specifications that constrain implementation, developers describe intent and trust the model to infer the details. The result is rapid iteration—code appears almost as fast as you can articulate what you want.

While vibe coding accelerates prototyping and exploration, it inverts traditional software engineering rigor: the specification emerges after the code, if at all.

The Seduction of Speed

The productivity gains from vibe coding are undeniable:

This velocity is seductive. When a feature that previously took three days can be scaffolded in thirty minutes, the economic pressure to adopt vibe coding becomes overwhelming.

The feedback loop is immediate: describe the behavior, see the code, run it, iterate. For throwaway scripts, MVPs, and rapid exploration, this workflow is transformative.

The Failure Modes

The velocity advantage of vibe coding collapses when code must be maintained, extended, or integrated into production systems:

Technical Debt Accumulation

Forrester Research predicts that by 2026, 75% of technology leaders will face moderate-to-severe technical debt directly attributable to AI-generated code. The mechanism is straightforward: code generated from vague prompts encodes vague assumptions.

When specifications exist only in the prompt history (or the engineer’s head), future maintainers inherit code without contracts. They must reverse-engineer intent from implementation—the exact problem formal specifications solve.

Copy-Paste Culture

2024 marked the first year on record in which copy-pasted code exceeded refactored code. This is a direct symptom of vibe coding: when generating fresh code is faster than understanding existing code, engineers default to regeneration over refactoring.

Legacy Code in Record Time

As Codurance notes, speed without craftsmanship leads to “Legacy Code in record time.” When AI generates code faster than a human can understand it, the codebase immediately becomes “legacy”—code that developers are afraid to touch because they don’t understand its underlying intent or guarantees.

The result is systemic duplication. The same logic appears in fifteen places with fifteen slightly different implementations, none validated against a shared contract.

Silent Drift

LLMs are probabilistic. When generating code from vibes, they make assumptions:

These assumptions are never documented. The code passes tests (if tests exist), but violates implicit architectural contracts. Over time, the system drifts toward inconsistency—different modules make different assumptions about the same concepts.

Boris Cherny (Principal Engineer, Anthropic; creator of Claude Code) warns: “You want maintainable code sometimes. You want to be very thoughtful about every line sometimes.”

“Speed is seductive. Maintainability is survival.”
— Boris Cherny, The Peterman Podcast (December 2025)

[!NOTE] The 100 Million Token Lesson

Dan Cripe, a 25-year enterprise software veteran, documented spending 100 million tokens on a frontier model attempting to fix its own architectural mistakes—not syntax errors, but fundamental design pattern violations. His diagnosis: “LLMs are pattern matchers, not architects. They generate code that looks like the code they were trained on: code written to solve an immediate problem, not code designed to be maintainable as part of a larger system.”

Vibe Coded Into a Corner

Anthropic’s internal research found that engineers who spend more time on Claude-assisted tasks often do so because they “vibe code themselves into a corner”—generating code without specs until debugging and cleanup overhead exceeds the initial velocity gains.

“When producing output is so easy and fast, it gets harder and harder to actually take the time to learn something.” — Anthropic engineer

This creates a debt spiral: vibe coding is fast until it isn’t, and by then the context needed to fix issues was never documented.

Regression to the Mean

Without deterministic constraints, LLMs trend toward generic solutions. Vibe coding produces code that works but lacks the specific optimizations, domain constraints, and architectural decisions that distinguish production systems from prototypes.

The model doesn’t know that “user IDs must never be logged” or “this cache must invalidate within 100ms.” These constraints exist in specifications, not prompts.
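Such constraints only bind when they are encoded as machine-checkable rules rather than left in prompt history. A minimal sketch of turning “user IDs must never be logged” into an executable check; the `log_purchase` function and its log format are hypothetical:

```python
import hashlib
import logging

logger = logging.getLogger("checkout")

def log_purchase(user_id: str, amount: float) -> str:
    """Spec constraint: raw user IDs must never appear in logs.
    Log an opaque, deterministic reference instead of the raw ID."""
    ref = hashlib.sha256(user_id.encode()).hexdigest()[:8]
    message = f"purchase ref={ref} amount={amount:.2f}"
    logger.info(message)
    return message

# A spec-derived assertion catches regressions a prompt never would:
line = log_purchase("user-8675309", 19.99)
```

The constraint now lives in the codebase as a test, so any future regeneration (human or AI) that leaks the raw ID fails immediately.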

Applications

Vibe coding is particularly effective in specific contexts:

Rapid Prototyping: When validating product hypotheses, speed of iteration outweighs code quality. Vibe coding enables designers and product managers to generate functional prototypes without deep programming knowledge.

Throwaway Scripts: One-off data migrations, analysis scripts, and temporary tooling benefit from vibe coding’s velocity. Since the code has no maintenance burden, formal specifications are unnecessary overhead.

Learning and Exploration: When experimenting with new APIs, frameworks, or architectural patterns, vibe coding provides immediate feedback. The goal is understanding, not production-ready code.

Greenfield MVPs: Early-stage startups building minimum viable products often prioritize speed-to-market over maintainability. Vibe coding accelerates this phase, though technical debt must be managed during the transition to production.

ASDLC Usage

In ASDLC, vibe coding is recognized as a legitimate operational mode for bounded contexts (exploration, prototyping, throwaway code). However, for production systems, ASDLC mandates a transition to deterministic development.

The ASDLC position:

Applied in:

See also: