ASDLC.io
Article Compendium

Alphabetic Listing for Programmatic Access

Generated: 2026-01-19

This is an alphabetically sorted compilation of all ASDLC articles (Concepts, Patterns, Practices) on a single page. This resource is optimized for bulk download, scraping, or feeding to LLMs for comprehensive analysis.

Unlike the Field Manual, which is organized for readability and follows a curated structure, this compendium is intentionally unstructured—a raw alphabetic listing for programmatic consumption.

Table of Contents

Concepts

Patterns

Practices

Concepts (A-Z)

Agentic SDLC

Framework for industrializing software development where agents serve as the logistics layer while humans design, govern, and optimize the flow.

Status: Live | Last Updated: 2026-01-01

Definition

The Agentic Software Development Life Cycle (ASDLC) is a framework for industrializing software engineering. It represents the shift from craft-based development (individual artisans, manual tooling, implicit knowledge) to industrial-scale production (standardized processes, agent orchestration, deterministic protocols).

“Agentic architecture is the conveyor belt for knowledge work.” — Ville Takanen

ASDLC is not about “AI coding assistants” that make developers 10% faster. It’s about building the software factory—systems where agents serve as the architecture of labor while humans design, govern, and optimize the flow.

The Industrial Thesis

Agents do not replace humans; they industrialize execution.

Just as robotic arms automate welding without replacing manufacturing expertise, agents automate high-friction parts of knowledge work (logistics, syntax, verification) while humans focus on intent, architecture, and governance.

In this model:

The Cybernetic Model

ASDLC operates at L3 Conditional Autonomy—a “Fighter Jet” model where the Agent acts as the Pilot executing maneuvers, and the Human acts as the Instructor-in-the-Cockpit.

Key Insight: Compute is cheap, but novelty and correctness are expensive. Agents naturally drift toward the “average” solution (Regression to the Mean). The Instructor’s role is not to write code, but to define failure boundaries (Determinism) and inject strategic intent (Steering) that guides agents out of mediocrity.

The Cybernetic Loop

The lifecycle replaces the linear CI/CD pipeline with a high-frequency feedback loop:

Mission Definition: The Instructor defines the “Objective Packet” (Intent + Constraints). This is the core of Context Engineering.

Generation (The Maneuver): The Agent autonomously maps context—often using the Model Context Protocol (MCP) to fetch live data—and executes the task.

Verification (The Sim): Automated Gates check for technical correctness (deterministic), while the Agent’s Constitution steers semantic intent (probabilistic).

Course Correction (HITL): The Instructor intervenes on technically correct but “generic” solutions to enforce architectural novelty.

Strategic Pillars

Factory Architecture (Orchestration)

Projects structured with agents as connective tissue, moving from monolithic context windows to discrete, specialized stations (Planning, Spec-Definition, Implementation, Review).

Standardized Parts (Determinism)

Schema-First Development where agents fulfill contracts, not guesses. AGENTS.md, specs/, and strict linting serve as the “jigs” and “molds” that constrain agent output.

Quality Control (Governance)

Automated, rigorous inspection through Probabilistic Unit Tests and Human-in-the-Loop (HITL) gates. Trust the process, not just the output.

ASDLC Usage

Full project vision: /docs/vision.md

Applied in: Specs, AGENTS.md Specification, Context Gates, Model Routing

Behavior-Driven Development

A collaborative specification methodology that defines system behavior in natural language scenarios, bridging business intent and machine-verifiable acceptance criteria.

Status: Live | Last Updated: 2026-01-13

Definition

Behavior-Driven Development (BDD) is a collaborative specification methodology that defines system behavior in natural language scenarios. It synthesizes Test-Driven Development (TDD) and Acceptance Test-Driven Development (ATDD), emphasizing the “Five Whys” principle: every user story should trace to a business outcome.

The key evolution from testing to BDD is the shift from “test” to “specification.” Tests verify correctness; specifications define expected behavior. In agentic workflows, this distinction matters because agents need to understand what behavior is expected, not just what code to write.

Key Characteristics

From Tests to Specifications of Behavior

| Aspect | Unit Testing (TDD) | Behavior-Driven Development |
| --- | --- | --- |
| Primary Focus | Correctness of code at unit level | System behavior from user perspective |
| Language | Code-based (Python, Java, etc.) | Natural language (Gherkin) |
| Stakeholders | Developers | Developers, QA, Business Analysts, POs |
| Signal | Pass/Fail on logic | Alignment with business objectives |
| Agent Role | Minimal (code generation) | Central (agent interprets and executes behavior) |

The Three Roles in BDD

BDD emphasizes collaboration between three perspectives:

  1. Business — Defines the “what” and “why” (business value, user outcomes)
  2. Development — Defines the “how” (implementation approach)
  3. Quality — Defines the “proof” (verification criteria)

In agentic development, the AI agent often handles Development while Business and Quality remain human-defined. BDD provides the structured handoff format.

BDD in the Probabilistic Era

Traditional BDD was designed for deterministic systems: given specific inputs, expect specific outputs. Agentic systems are probabilistic—LLM outputs vary based on context, temperature, and emergent behavior.

BDD adapts to this by:

ASDLC Usage

BDD’s value in agentic development is semantic anchoring. When an agent is given a Gherkin scenario, it receives a “specification of behavior” that:

This is why BDD scenarios belong in Specs, not just test suites. They’re not just verification artifacts—they’re functional blueprints that guide agent reasoning.

Implementation via the Spec Pattern:

| BDD Component | Spec Implementation |
| --- | --- |
| Feature description | Spec Context section |
| Business rules | Blueprint constraints |
| Acceptance scenarios | Contract section (Gherkin scenarios) |

Applied in:

Context Engineering

Context Engineering is the practice of structuring information to optimize LLM comprehension and output quality.

Status: Live | Last Updated: 2026-01-12

Definition

Context Engineering is the systematic approach to designing and structuring the input context provided to Large Language Models (LLMs) to maximize their effectiveness, accuracy, and reliability in generating outputs.

The practice emerged from the recognition that LLMs operate on explicit information only—they cannot intuit missing business logic or infer unstated constraints. Context Engineering addresses this by making tacit knowledge explicit, machine-readable, and version-controlled.

While ASDLC focuses on software development, Context Engineering is domain-agnostic:

Anywhere agents operate, context is the constraint that turns raw intelligence into specific value.

Martin Fowler observes: “As I listen to people who are serious with AI-assisted programming, the crucial thing I hear is managing context.”

Anthropic’s research confirms this. Engineers cite the cold start problem as the biggest blocker:

“There is a lot of intrinsic information that I just have about how my team’s code base works that Claude will not have by default… I could spend time trying to iterate on the perfect prompt [but] I’m just going to go and do it myself.”

Context Engineering solves cold start by making tacit knowledge explicit, machine-readable, and version-controlled so agents can act on it without prompt iteration.

Key Characteristics

The Requirements Gap

“Prompt Engineering” is often a misnomer. It is simply Requirements Engineering applied to a non-human entity that cannot intuit missing business logic. Human developers ask clarifying questions when requirements are vague (“What happens if the payment fails?”). AI models build something based on probability. Errors generally surface only when the system breaks in production.

Core Attributes

  1. Version Controlled: Context exists as a software asset that lives in the repo, is diffed in PRs, and is subject to peer review.
  2. Standardized: Formatted to be readable by any agent (Cursor, Windsurf, Devin, GitHub Copilot).
  3. Iterative: Continuously refined based on agent failure modes and tacit information discovered by Human-in-the-loop (HITL) workflows.
  4. Schema-First: Data structures defined before requesting content generation to ensure type safety and validation.
  5. Hierarchical: Information organized by importance—critical instructions first, references second, examples last.

ASDLC Usage

In ASDLC, context is treated as version-controlled code, not ephemeral prompts.

Context vs Guardrails:

A distinction exists between Guardrails (Safety) and Context (Utility). Currently, many AGENTS.md files contain defensive instructions like “Do not delete files outside this directory” or “Do not output raw secrets.” This is likely a transitional state. OpenAI, Anthropic, Google, and platform wrappers are racing to bake these safety constraints directly into the inference layer. Soon, telling an agent “Don’t leak API keys” will be as redundant as telling a compiler “Optimize for speed.”

Relationship to Patterns:

Applied in:

[!NOTE] Research Validation (InfiAgent, 2026): File-centric state management outperforms compressed long-context prompts. Replacing persistent file state with accumulated conversation history dropped task completion from 80/80 to 27.7/80 average, even with Claude 4.5 Sonnet. This validates treating context as a reconstructed view of authoritative file state, not as conversation memory.

Coverage Metric

Coverage measures task completion reliability: the proportion of required work units an agent successfully completes in a long-horizon task.

Status: Draft | Last Updated: 2026-01-10

Definition

Coverage measures task completion reliability: the proportion of required work units an agent successfully completes in a long-horizon task.

Unlike quality metrics (correctness, style, performance), coverage answers: “Did it finish the job?”

Key Characteristics

Formula

Coverage = (Completed Units / Total Required Units) × 100

Where “units” are task-appropriate:

Why Coverage Matters

Quality metrics assume the agent attempted the work. But long-horizon agents often fail silently:

Coverage catches these failures that quality metrics miss.

Measurement

Report three values across multiple runs:

High variance (large gap between max and min) indicates unreliable architecture, even if max is perfect.
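
A minimal TypeScript sketch of this reporting, assuming the three reported values are max, mean, and min and that each run records how many required units it completed (the run structure here is illustrative):

```typescript
interface RunResult {
  completedUnits: number;     // units the agent actually finished in this run
  totalRequiredUnits: number; // units the task required
}

/** Coverage for a single run: (Completed Units / Total Required Units) × 100 */
function coverage(run: RunResult): number {
  return (run.completedUnits / run.totalRequiredUnits) * 100;
}

/** Report max, mean, and min coverage across multiple runs of the same task. */
function coverageReport(runs: RunResult[]): { max: number; mean: number; min: number } {
  const values = runs.map(coverage);
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  return { max: Math.max(...values), mean, min: Math.min(...values) };
}

// Example: three runs of a 40-file migration task.
const report = coverageReport([
  { completedUnits: 40, totalRequiredUnits: 40 },
  { completedUnits: 31, totalRequiredUnits: 40 },
  { completedUnits: 22, totalRequiredUnits: 40 },
]);
// A large gap between report.max and report.min signals unreliable architecture,
// even if report.max is 100.
```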

ASDLC Usage

Coverage is particularly relevant for:

Consider adding coverage assertions to Quality Gates for batch operations.

Event Modeling

A system blueprinting method that centers on events as the primary source of truth, serving as a rigorous bridge between visual design and technical implementation.

Status: Experimental | Last Updated: 2026-01-01

Definition

Event Modeling is a method for designing information systems by mapping what happens over time. It creates a linear blueprint that serves as the single source of truth for Product, Design, and Engineering.

Unlike static diagrams (like ERDs or UML) that focus on structure, Event Modeling focuses on the narrative of the system. It visualizes the system as a film strip, showing exactly how a user’s action impacts the system state and what information is displayed back to them.

Core Components

An Event Model is composed of four distinct elements:

Why It Matters for AI

In modern software development, ambiguity is the enemy. While human engineers can infer intent from a loose visual mockup, AI models require explicit instructions.

Event Modeling forces implicit business rules to become explicit. By defining the exact data payload of every Command and the resulting state change of every Event, we provide AI agents with a deterministic roadmap. This ensures the generated code handles edge cases and data consistency correctly, rather than just “looking right” on the frontend.

Relationship to Requirements

Event Modeling acts as a bridge between Visual Design (what it looks like) and Technical Architecture (how it works).

It does not replace functional requirements; rather, it validates them. A feature is only considered “defined” when there is a complete path mapped from the user’s view, through the command, to the stored event, and back to the view. This “closed loop” guarantees that every pixel on the screen is backed by real data.

Gherkin

A structured, domain-specific language using Given-When-Then syntax to define behavioral specifications that are both human-readable and machine-actionable.

Status: Live | Last Updated: 2026-01-13

Definition

Gherkin is a structured, domain-specific language using Given-When-Then syntax to define behavioral specifications in plain text. While Behavior-Driven Development provides the methodology, Gherkin provides the concrete syntax.

Gherkin’s effectiveness for LLM agents stems from its properties: human-readable without technical jargon, machine-parseable with predictable structure, and aligned between technical and non-technical stakeholders. Each keyword defines a phase of reasoning that prevents agents from conflating setup, action, and verification into an undifferentiated blob.

The Given-When-Then Structure

Gherkin scenarios follow a consistent three-part structure:

Feature: User Authentication
  As a registered user
  I want to log into the system
  So that I can access my personalized dashboard

  Scenario: Successful login with valid credentials
    Given a registered user with email "user@example.com"
    And the user has password "SecurePass123"
    When the user submits login credentials
    Then the user should be redirected to the dashboard
    And a session token should be created

Keyword Semantics

| Keyword | Traditional BDD | Agentic Translation |
| --- | --- | --- |
| Given | Preconditions or initial state | Context setting, memory retrieval, environment setup |
| When | The trigger event or user action | Task execution, tool invocation, decision step |
| Then | The observable outcome | Verification criteria, alignment check, evidence-of-done |
| And/But | Additional conditions within a step | Logical constraints, secondary validation parameters |
| Feature | High-level description of functionality | Functional blueprint, overall agentic goal |
| Background | Steps common to all scenarios | Pre-test fixtures, global environment variables |

ASDLC Usage

Gherkin isn’t just a testing syntax—it’s a semantic constraint language for agent behavior.

When an agent reads a Gherkin scenario:

This partitioning prevents “context bleed” where agents conflate setup, action, and verification.

In Specs: The Spec Contract section uses Gherkin scenarios:

## Contract

### Scenarios

#### Happy Path
Given a valid API key
When the user requests /api/notifications
Then the response returns within 100ms
And the payload contains the user's notifications

Applied in:

Guardrails

Why we deprecated the term 'Guardrails' in favor of strict separation between deterministic Context Gates and probabilistic Agent Constitutions.

Status: Deprecated | Last Updated: 2026-01-01

⚠️ Deprecated: This concept has been superseded by Context Gates and Agent Constitution.

The Ambiguity Problem

In the broader AI industry, “Guardrails” has become a “suitcase word”—a single term packed with too many conflicting meanings. It conflates architectural firewalls (hard rules) with prompt engineering (soft influence).

This ambiguity leads to fragile systems where engineers try to fix logic errors with prompt tuning (which is unreliable) or restrict creativity with rigid code blocks (which is stifling).

Standard Definitions

Broadly, industry implementations of “Guardrails” typically fall into two buckets:

  1. Input/Output Filtering: Deterministic systems that intercept and block messages based on policy (e.g., NVIDIA NeMo).
  2. Behavioral Constraint: Probabilistic techniques (prompting/tuning) to prevent the model from deviating from its persona.

The ASDLC Interpretation

To resolve this ambiguity, we have deprecated “Guardrails” in favor of strictly separating the concept into two distinct mechanisms: The Brakes and The Driver.

1. Context Gates (The Brakes)

These are deterministic validation layers. Just as car brakes function regardless of what the driver “thinks,” Gates trigger regardless of the LLM’s intent.

2. Agent Constitution (The Driver)

These are probabilistic steering instructions. They are the training and rules the “driver” (LLM) carries in its head to make good decisions.

Comparison of Controls

| Feature | Context Gates | Agent Constitution |
| --- | --- | --- |
| Nature | Deterministic (Binary) | Probabilistic (Semantic) |
| Location | External (Firewall/Code) | Internal (Context Window) |
| Goal | Correctness (Prevent errors) | Alignment (Steer intent) |
| Failure Mode | Exception / Rejection | Hallucination / Bad Style |
| Analogy | The Brakes | The Driver’s Training |
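
As a hedged illustration of the split, here is a TypeScript sketch with made-up checks and rules standing in for real project policy:

```typescript
// Made-up checks and rules; real gates and constitutions are project-specific.
interface GateResult {
  passed: boolean;
  violations: string[];
}

// Context Gate ("the brakes"): deterministic and binary, enforced outside the model.
// It triggers regardless of what the LLM "intended".
function runContextGate(diff: { touchedFiles: string[]; testsPassed: boolean }): GateResult {
  const violations: string[] = [];
  if (!diff.testsPassed) violations.push("Test suite failed");
  if (diff.touchedFiles.some((file) => file.startsWith("secrets/"))) {
    violations.push("Modified a protected path: secrets/");
  }
  return { passed: violations.length === 0, violations };
}

// Agent Constitution ("the driver's training"): probabilistic steering that lives
// inside the context window. It influences output but cannot guarantee it.
const constitution = `
Prefer small, composable functions.
Flag any new dependency for human review instead of adding it silently.
`;

// The gate blocks bad changes after generation; the constitution shapes
// generation itself. They fail differently: rejection vs. bad style.
const verdict = runContextGate({ touchedFiles: ["src/auth.ts"], testsPassed: true });
console.log(verdict.passed, constitution.trim());
```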

Superseding Concepts

This concept has been superseded by:

See also:

Levels of Autonomy

SAE-inspired taxonomy for AI agent autonomy in software development, from L1 (assistive) to L5 (full), standardized at L3 conditional autonomy.

Status: Live | Last Updated: 2026-01-09

Definition

The Levels of Autonomy scale categorizes AI systems based on their operational independence in software development contexts. Inspired by the SAE J3016 automotive standard, it provides a shared vocabulary for discussing human oversight requirements.

The scale identifies where the Context Gate (the boundary of human oversight) must be placed for each level. Under this taxonomy, autonomy is not a measure of intelligence—it is a measure of operational risk and required human involvement.

The Scale

| Level | Designation | Description | Human Role | Failure Mode |
| --- | --- | --- | --- | --- |
| L1 | Assistive | Autocomplete, Chatbots. Zero state retention. | Driver. Hands on wheel 100% of time. | Distraction / Minor Syntax Errors |
| L2 | Task-Based | “Fix this function.” Single-file context. | Reviewer. Checks output before commit. | Logic bugs within a single file. |
| L3 | Conditional | “Implement this feature.” Multi-file orchestration. | Instructor. Defines constraints & intervenes on “drift.” | Regression to the Mean (Mediocrity). |
| L4 | High | “Manage this backlog.” Self-directed planning. | Auditor. Post-hoc analysis. | Silent Failure. Strategic drift over time. |
| L5 | Full | “Run this company.” | Consumer. Passive beneficiary. | Existential alignment drift. |

Analogy: The Self-Driving Standard (SAE)

The software autonomy scale maps directly to SAE J3016, the automotive standard for autonomous vehicles. This clarifies “Human-in-the-Loop” requirements using familiar terminology.

| ASDLC Level | SAE Equivalent | The “Steering Wheel” Metaphor |
| --- | --- | --- |
| L1 | L1 (Driver Assist) | Hands On, Feet On. AI nudges the wheel (Lane Keep) or gas (Cruise), but Human drives. |
| L2 | L2 (Partial) | Hands On (mostly). AI handles steering and speed in bursts, but Human monitors constantly. |
| L3 | L3 (Conditional) | Hands Off, Eyes On. AI executes the maneuver (The Drive). Human is the Instructor ready to grab the wheel immediately. |
| L4 | L4 (High) | Mind Off. Sleeping in the back seat within a geo-fenced area. Dangerous if the “fence” (Context) breaks. |
| L5 | L5 (Full) | No Steering Wheel. The vehicle has no manual controls. |

ASDLC Usage

ASDLC standardizes practices for Level 3 (Conditional Autonomy) in software engineering. While the industry frequently promotes L5 as the ultimate goal, this perspective is often counterproductive given current tooling maturity. L3 is established as the sensible default.

[!WARNING] Level 4 Autonomy Risks

At L4, agents operate for days without human intervention but lack the strategic foresight needed to maintain system integrity. This results in Silent Drift—the codebase continues to function technically but gradually deteriorates into an unmanageable state.

Mitigation strategies exist (Advanced Context Gates, architectural health monitoring), but these solutions require further validation.

[!NOTE] Empirical Support for L3

Anthropic’s 2025 internal study of 132 engineers validates L3 as the practical ceiling:

  • Engineers fully delegate only 0-20% of work
  • Average 4.1 human turns per Claude Code session
  • High-level design and “taste” decisions remain exclusively human-owned
  • The “paradox of supervision”—effective oversight requires skills that may atrophy with AI use

Applied in:

Mermaid

A text-based diagramming language that renders flowcharts, sequences, and architectures from markdown, enabling version-controlled visual specifications.

Status: Live | Last Updated: 2026-01-13

Definition

Mermaid is a text-based diagramming language that renders flowcharts, sequence diagrams, and architecture visualizations from markdown-style code blocks. In agentic development, Mermaid serves as the specification language for processes, workflows, and system relationships.

Where Gherkin specifies behavior and YAML specifies structure, Mermaid specifies process—how components interact, how data flows, and how state transitions occur.

Key Characteristics

Text-Based Diagrams

Mermaid diagrams are defined in plain text, making them:

flowchart LR
    A[Input] --> B[Process]
    B --> C[Output]

Diagram Types

| Type | Use Case | ASDLC Application |
| --- | --- | --- |
| Flowchart | Process flows, decision trees | Feature Assembly, Context Gates |
| Sequence | API interactions, message flows | Service contracts, Integration specs |
| State | State machines, lifecycle | Component state, Workflow phases |
| Class | Object relationships | Domain models, Architecture |
| ER | Entity relationships | Data models, Schema design |
| Gantt | Timeline, scheduling | Roadmaps, Sprint planning |

Subgraphs for Grouping

Subgraphs partition complex diagrams into logical regions:

flowchart LR
    subgraph Input
        A[Source]
    end
    
    subgraph Processing
        B[Transform]
        C[Validate]
        B --> C
    end
    
    A --> B
    C --> D[Output]

ASDLC Usage

Mermaid serves as the process specification language in ASDLC, completing the specification triad:

| Language | Specifies | Example |
| --- | --- | --- |
| Gherkin | Behavior | Given/When/Then scenarios |
| YAML | Structure | Schemas, configuration |
| Mermaid | Process | Flowcharts, sequences |

Why Mermaid for Specs:

Text-based diagrams solve a critical problem in agentic development: visual documentation that agents can read, modify, and version-control. Unlike image-based diagrams that become stale context, Mermaid diagrams are:

Relationship to Patterns:

Anti-Patterns

| Anti-Pattern | Description |
| --- | --- |
| Box Soup | Too many nodes without grouping |
| Arrow Spaghetti | Excessive cross-connections |
| No Labels | Edges without descriptive text |
| Static Screenshots | Images instead of text diagrams |

[!TIP] Key practices: Group with subgraphs, label edges, use flowchart LR for process flows, limit to <15 nodes per diagram.

Model Context Protocol (MCP)

Open standard for connecting AI agents to data sources and tools, enabling real-time 'just-in-time' context vs. stale vector databases.

Status: Draft | Last Updated: 2025-11-25

The Model Context Protocol (MCP) is an open standard that functions as a universal connector between AI assistants and external systems. It standardizes how AI models interact with data repositories and business tools, effectively replacing fragmented, custom integrations with a single, unified protocol.

The “USB-C” for Artificial Intelligence

Think of MCP as a USB-C port for AI applications.

How It Works

Technologically, MCP operates on a Client-Host-Server architecture:

From Static to “Just-in-Time” RAG

While MCP is a critical enabler for Retrieval-Augmented Generation (RAG), it represents a fundamental shift in how agents access knowledge.

| Feature | Traditional RAG | MCP (Dynamic RAG) |
| --- | --- | --- |
| Data Source | Pre-indexed Vector Databases | Live “Resources” & “Tools” |
| Freshness | Snapshots (Can become stale) | Real-time (Source of Truth) |
| Mechanism | Semantic Search | Direct Query / Function Execution |

By allowing the model to query a live SQL database or read the current state of a git repository at the exact moment of inference, MCP enables “Just-in-Time” intelligence. This removes the reliance on stale data dumps and allows agents to act on the absolute latest state of the world.
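
A schematic TypeScript sketch of this contrast follows; the interfaces and URIs below are illustrative stand-ins, not the actual MCP SDK or wire protocol:

```typescript
// Illustrative interfaces only -- not the real MCP SDK.
interface VectorIndex {
  search(query: string): Promise<string[]>; // returns pre-indexed snapshots
}

interface LiveResource {
  read(uri: string): Promise<string>; // reads current state at call time
}

interface LiveTool {
  call(name: string, args: Record<string, unknown>): Promise<string>;
}

// Traditional RAG: the agent reasons over whatever was indexed last.
async function staleContext(index: VectorIndex, question: string): Promise<string[]> {
  return index.search(question); // freshness limited by the last indexing run
}

// MCP-style "just-in-time" retrieval: the agent queries the source of truth
// at the moment of inference.
async function liveContext(repo: LiveResource, db: LiveTool): Promise<string[]> {
  const branchState = await repo.read("git://current-branch/status");
  const openOrders = await db.call("sql.query", {
    statement: "SELECT count(*) FROM orders WHERE status = 'open'",
  });
  return [branchState, openOrders];
}
```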

OODA Loop

The Observe-Orient-Decide-Act decision cycle—a strategic model from military combat adapted for autonomous agent behavior in software development.

Status: Live | Last Updated: 2026-01-13

Definition

The OODA Loop—Observe, Orient, Decide, Act—is a strategic decision-making cycle originally developed by U.S. Air Force Colonel John Boyd for aerial combat. Boyd’s insight: the combatant who cycles through these phases faster than their opponent gains decisive advantage. The key isn’t raw speed—it’s tempo relative to environmental change.

Boyd’s less-quoted but crucial insight: Orient is everything. The Orient phase is where mental models, context, and prior experience shape how observations become decisions. A faster but poorly-oriented loop loses to a slower but well-oriented one.

In agentic software development, OODA provides the cognitive model for how autonomous agents should behave: continuously cycling through observation, interpretation, planning, and execution.

The Four Phases

  1. Observe — Gather information about the current state of the environment
  2. Orient — Interpret observations through mental models, context, and constraints
  3. Decide — Formulate a specific plan for action based on orientation
  4. Act — Execute the plan, producing changes that feed new observations

The loop is continuous. Each Act produces new state, triggering new Observe, and the cycle repeats.
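
A minimal sketch of the explicit cycle in TypeScript, with placeholder helpers standing in for real observation, planning, and verification tooling:

```typescript
interface Observation { failingTests: string[]; errorLogs: string[] }
interface Plan { steps: string[] }

// Placeholder helpers; real implementations would call test runners, an LLM, and VCS tooling.
async function observe(): Promise<Observation> {
  return { failingTests: ["auth.spec.ts"], errorLogs: [] };              // e.g. parse test-runner output
}
function orient(obs: Observation, context: string): string {
  return `Failing: ${obs.failingTests.join(", ")} | Constraints: ${context}`; // interpret against Specs/constraints
}
async function decide(orientation: string): Promise<Plan> {
  return { steps: [`Address: ${orientation}`] };                          // agent states a plan before acting
}
async function act(plan: Plan): Promise<void> {
  console.log("Executing:", plan.steps);                                  // write code, run tests, commit
}

// Each Act produces new state, which feeds the next Observe.
async function oodaLoop(context: string, maxCycles = 10): Promise<void> {
  for (let cycle = 0; cycle < maxCycles; cycle++) {
    const obs = await observe();
    if (obs.failingTests.length === 0 && obs.errorLogs.length === 0) return; // environment is stable
    const orientation = orient(obs, context); // the engineered Orient phase decides if fast cycling helps
    const plan = await decide(orientation);
    await act(plan);
  }
}
```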

Key Characteristics

Tempo, Not Raw Speed

The strategic value of OODA isn’t speed—it’s cycling faster than the environment changes. In software development, the “environment” is the codebase, requirements, and constraints. An agent that can cycle through OODA before context rot sets in converges on correct solutions.

Orient as the Critical Phase

For AI agents, Orient is the context window. The quality of orientation depends on:

This is why Context Engineering isn’t optional overhead. It’s engineering the Orient phase, which determines whether fast cycling produces progress or noise.

OODA vs. Single-Shot Interactions

Standard LLM interactions are Observe-Act: user provides input, model produces output. No explicit Orient or Decide phase. The model’s “orientation” is implicit in training and whatever context happens to be present.

Agentic workflows make OODA explicit:

| Phase | Single-Shot LLM | Agentic Workflow |
| --- | --- | --- |
| Observe | User prompt | Instrumented: read files, run tests, check logs |
| Orient | Implicit (training + context) | Engineered: Specs, Constitution, Context Gates |
| Decide | Implicit | Explicit: agent states plan before acting |
| Act | Generate response | Verified: external tools confirm success/failure |

This explicit structure enables debugging. When an agent fails, you can diagnose which phase broke down:

ASDLC Usage

In ASDLC, OODA explains why cyclic workflows outperform linear pipelines:

| OODA Phase | Agent Behavior | ASDLC Component |
| --- | --- | --- |
| Observe | Read codebase state, error logs, test results | File state, test output |
| Orient | Interpret against context and constraints | Context Gates, AGENTS.md |
| Decide | Formulate implementation plan | PBI decomposition |
| Act | Write code, run tests, commit | Micro-commits |

The Learning Loop is OODA with an explicit “Crystallize” step that improves future Orient phases. Where OODA cycles continuously, Learning Loop captures discoveries into machine-readable context for subsequent agent sessions.

Applied in:

Anti-Patterns

| Anti-Pattern | Description | Failure Mode |
| --- | --- | --- |
| Observe-Act | Skipping Orient/Decide. Classic vibe coding. | Works for simple tasks; fails at scale; no learning |
| Orient Paralysis | Over-engineering context, never acting | Analysis paralysis; no forward progress |
| Stale Orient | Not updating mental model when observations change | Context rot; agent operates on outdated assumptions |
| Observe Blindness | Not instrumenting observation of relevant state | Agent misses critical information (failed tests, error logs) |
| Act Without Verify | Not confirming action results before next cycle | Cascading errors; false confidence |

Product Requirement Prompt (PRP)

A structured methodology combining PRD, codebase context, and agent runbook—the minimum spec for production-ready AI code.

Status: Experimental | Last Updated: 2025-01-05

Definition

A Product Requirement Prompt (PRP) is a structured methodology that answers the question: “What’s the minimum viable specification an AI coding agent needs to plausibly ship production-ready code in one pass?”

As creator Rasmus Widing defines it: “A PRP is PRD + curated codebase intelligence + agent runbook.”

Unlike traditional PRDs (which exclude implementation details) or simple prompts (which lack structure), PRPs occupy the middle ground—a complete context packet that gives an agent everything it needs to execute autonomously within bounded scope.

The methodology emerged from practical engineering work in 2024 and has since become the foundation for agentic engineering training.

Key Characteristics

PRPs are built on three core principles:

  1. Plan before you prompt — Structure thinking before invoking AI
  2. Context is everything — Comprehensive documentation enables quality output
  3. Scope to what the model can reliably do in one pass — Bounded execution units

A complete PRP includes the following components:

| Component | Purpose |
| --- | --- |
| Goal | What needs building |
| Why | Business value and impact justification |
| Success Criteria | States that indicate completion (not activities) |
| Health Metrics | Non-regression constraints (what must not degrade) |
| Strategic Context | Trade-offs & priorities (from Product Vision) |
| All Needed Context | Documentation references, file paths, code snippets |
| Implementation Blueprint | Task breakdown and pseudocode |
| Validation Loop | Multi-level testing (syntax, unit, integration) |

Key Differentiators from Traditional PRDs

ASDLC Usage

PRP components map directly to ASDLC concepts—a case of convergent evolution in agentic development practices.

| PRP Component | ASDLC Equivalent |
| --- | --- |
| Goal | The Spec — Blueprint |
| Why | Product Thinking |
| Success Criteria | Context Gates |
| Health Metrics | The Spec — Non-Functional Reqs / Constraints |
| Strategic Context | Product Vision — Runtime Injection |
| All Needed Context | Context Engineering |
| Implementation Blueprint | The PBI |
| Validation Loop | Context Gates — Quality Gates |

In ASDLC terms, a PRP is equivalent to The Spec + The PBI + curated Context Engineering—bundled into a single artifact optimized for agent consumption.

ASDLC separates these concerns for reuse: multiple PBIs reference the same Spec, and context is curated per-task rather than duplicated. For simpler projects or rapid prototyping, the PRP’s unified format may be more practical. The methodologies are complementary—PRPs can be thought of as “collapsed ASDLC artifacts” for single-pass execution.

Applied in:

See also:

Product Thinking

The practice of engineers thinking about user outcomes, business context, and the 'why' before the 'how'—the core human skill in the AI era.

Status: Experimental | Last Updated: 2025-01-05

Definition

Product Thinking is the practice of engineers understanding and prioritizing user outcomes, business context, and the reasoning behind technical work (“why”) before focusing on implementation details (“how”).

Rather than waiting for fully-specified requirements and executing tasks mechanically, product-thinking engineers actively engage with the problem space. They ask:

This mindset originated in product management but has become essential for modern engineering teams, especially as AI increasingly handles implementation while humans must provide strategic judgment.

Key Characteristics

Outcome Orientation: Product-thinking engineers measure success by user and business outcomes, not just task completion. They question whether closing a ticket actually moved the product forward.

Context Awareness: They understand the broader system: user workflows, business constraints, competitive landscape, and technical debt. Code decisions are made with this context, not in isolation.

Tradeoff Evaluation: Every technical decision involves tradeoffs (speed vs maintainability, generality vs simplicity, build vs buy). Product-thinking engineers explicitly identify and evaluate these tradeoffs rather than defaulting to “best practice.”

Ownership Mindset: They take responsibility for outcomes, not just implementations. If a feature ships but users don’t adopt it, a product-thinking engineer investigates why, even if the code “worked as specified.”

Risk Recognition: They can look at technically correct code and identify product risks: “This will confuse users,” “This locks us into a vendor,” “This creates a support burden.” These risks are invisible to AI.

The AI Era Shift

Matt Watson (5x Founder/CTO, author of Product Driven) argues that vibe coders outperform average engineers not because of superior coding skill, but because they think about the product:

“A lot of engineers? They’re just waiting for requirements. That’s usually a leadership problem. For years, we rewarded engineers for staying in their lane, closing tickets, and not rocking the boat. Then we act surprised when they don’t think like owners.”

The traditional model:

  1. Product Manager writes requirements
  2. Engineer implements requirements
  3. Success = code matches spec

Why this fails in the AI era:

The new competitive advantage:

Watson’s conclusion: “Product thinking isn’t a bonus skill anymore. In an AI world, it’s the job.”

The Leadership Problem

Product thinking doesn’t emerge by accident. Watson identifies the structural cause:

Anti-patterns that kill product thinking:

What builds product thinking:

If every technical decision must flow through a product manager or architect, the organization has created a dependency on human bottlenecks that AI cannot solve.

Applications

Pre-AI Era: Product thinking was a differentiator for senior engineers and those in “full-stack” or startup environments. Most engineers could succeed by executing well-defined requirements.

AI Era: Product thinking becomes the baseline. As AI handles implementation, the human contribution shifts entirely to:

  1. Defining the problem worth solving
  2. Evaluating whether AI-generated solutions actually solve it
  3. Recognizing risks and tradeoffs the model cannot see

Where product thinking is essential:

ASDLC Usage

In ASDLC, product thinking is why Specs exist. The Spec is not bureaucratic overhead—it’s the forcing function that makes product thinking explicit and sharable.

The connection:

When an engineer writes a Spec, they’re forced to answer:

If they can’t answer these questions, they don’t understand the product problem yet. Vibe coding without this foundation produces code that works but solves the wrong problem.

The ASDLC position:

This is the “Instructor-in-the-Cockpit” model: the pilot (AI) flies the plane, but the instructor (human) decides where to fly and evaluates whether the flight is safe.

Applied in:

Best Practices

For Individual Engineers:

  1. Before writing code, write the “why” in plain English
  2. Question requirements that don’t explain user impact
  3. Propose alternatives when you see tradeoff mismatches
  4. Treat AI-generated code skeptically: Does it solve the right problem?

For Engineering Leaders:

  1. Share business context, even when it feels like “too much detail”
  2. Reward engineers who challenge bad requirements, not just those who ship fast
  3. Make “why” documentation non-optional (use Specs or equivalent)
  4. Measure outcomes (user adoption, retention, error rates) not just velocity (story points)

For Organizations:

  1. Flatten decision-making: trust engineers to own tradeoffs in their domain
  2. Train product thinking explicitly (it’s not intuitive for engineers trained to “just code”)
  3. Create feedback loops: engineers see how their code impacts users
  4. Recognize that AI scales implementation, not judgment—invest in the latter

Anti-Patterns

“Just Build It” Culture: Engineers discouraged from asking “why” or proposing alternatives. Leads to technically correct code that solves the wrong problem.

Context Hoarding: Product managers or architects hold all context and dole out tasks. Creates dependency bottleneck and prevents engineers from exercising judgment.

Velocity Worship: Success measured by tickets closed, not problems solved. Optimizes for speed of wrong solutions.

“Stay In Your Lane” Enforcement: Engineers punished for thinking beyond their assigned component. Prevents system-level thinking required for good product decisions.

See also:

Spec-Driven Development

Methodology that defines specifications before implementation, treating specs as living authorities that code must fulfill.

Status: Live | Last Updated: 2026-01-18

Definition

Spec-Driven Development (SDD) is an umbrella term for methodologies that define specifications before implementation. The core inversion: instead of code serving as the source of documentation, the spec becomes the authority that code must fulfill.

SDD emerged as a response to documentation decay in software projects. Traditional approaches treated specs as planning artifacts that diverged from reality post-implementation. Modern SDD treats specs as living documents co-located with code.

Contrast: For the anti-pattern SDD addresses, see Vibe Coding.

Key Characteristics

Living Documentation

Specs are not “fire and forget” planning artifacts. They reside in the repository alongside code and evolve with every change to the feature. This addresses the classic problem of documentation decay.

Iterative Refinement

Kent Beck critiques SDD implementations that assume “you aren’t going to learn anything during implementation.” This is a valid concern—specs must evolve during implementation, not block it. The spec captures learnings so future sessions can act on them.

Determinism Over Vibes

Nick Tune argues that orchestration logic should be “mechanical based on simple rules” (code) rather than probabilistic (LLMs). Specs define the rigid boundaries; code enforces the workflow; LLMs handle only the implementation tasks where flexibility is required.

Visual Designs Are Not Specs

[!WARNING] The Figma Trap A beautiful mockup is not a specification; it is a suggestion. Mockups typically demonstrate the “happy path” but hide the edge cases, error states, and data consistency rules where production bugs live.

Never treat a visual design as a complete technical requirement.

ASDLC Usage

ASDLC implements Spec-Driven Development through:

See also:

The 4D Framework (Anthropic)

A cognitive model codifying four essential competencies—Delegation, Description, Discernment, and Diligence—for effective generative AI use.

Status: Live | Last Updated: 2026-01-13

Definition

The 4D Framework is a cognitive model for human-AI collaboration developed by Anthropic in partnership with Dr. Joseph Feller and Rick Dakan as part of the AI Fluency curriculum.

The framework codifies four essential competencies for leveraging generative AI effectively and responsibly:

  1. Delegation — The Strategy
  2. Description — The Prompt
  3. Discernment — The Review
  4. Diligence — The Liability

Unlike process models (e.g., Agile or Double Diamond) that dictate workflow timing, the 4D Framework specifies how to interact with AI systems. It positions the human not merely as a “prompter,” but as an Editor-in-Chief, accountable for strategic direction and risk management.

The Four Dimensions

Delegation (The Strategy)

Before engaging with the tool, the human operator must determine what, if anything, should be assigned to the AI. This is a strategic decision between Automation (offloading repetitive tasks) and Augmentation (leveraging AI as a thought partner).

Core Question: “Is this task ‘boilerplate’ with well-defined rules (High Delegation), or does it demand nuanced judgment, deep context, or ethical considerations (Low Delegation)?”

Description (The Prompt)

AI output quality is directly proportional to input quality. “Description” transcends prompt engineering hacks by emphasizing Context Transfer—delivering explicit goals, constraints, and data structures required for the task.

Core Question: “Have I specified the constraints, interface definitions, and success criteria needed for this task?”

Discernment (The Review)

This marks the transition from Creator to Editor. The human must rigorously assess AI output for accuracy, hallucinations, bias, and overall quality. Failing to apply discernment is a leading cause of “AI Technical Debt.”

Core Question: “If I authored this output, would it meet code review standards? Does it introduce fictitious libraries or violate design tokens?”

Diligence (The Liability)

The human user retains full accountability for outcomes. Diligence acknowledges that while AI accelerates execution, it never removes user responsibility for security, copyright, or ethical compliance.

Core Question: “Am I exposing PII in the context window? Am I deploying unvetted code to production?”

Key Characteristics

The Editor-in-Chief Mental Model

The 4D Framework repositions the human from “prompt writer” to “editorial director.” Just as a newspaper editor doesn’t write every article but maintains accountability for what gets published, the AI-fluent professional maintains responsibility for all AI-generated outputs.

Continuous Cycle

These four dimensions are not sequential steps but concurrent concerns. Every AI interaction requires simultaneous attention to all four:

Anti-Patterns

| Anti-Pattern | Description |
| --- | --- |
| Over-Delegation | Assigning strategic decisions or ethically sensitive tasks to AI |
| Vague Description | Using natural language prompts without context, constraints, or examples |
| Blind Acceptance | Copy-pasting AI output without verification |
| Liability Denial | Assuming AI-generated content is inherently trustworthy or legally defensible |

ASDLC Usage

Applied in: AGENTS.md Specification, Context Engineering, Context Gates

The 4D dimensions map to ASDLC constructs: Delegation → agent autonomy levels, Description → context engineering, Discernment → context gates, Diligence → guardrail protocols.

The Learning Loop

The iterative cycle between exploratory implementation and spec refinement, balancing vibe coding velocity with captured learnings.

Status: Live | Last Updated: 2026-01-12

Definition

The Learning Loop is the iterative cycle between exploratory implementation and constraint crystallization. It acknowledges that understanding emerges through building, while ensuring that understanding is captured for future agent sessions.

Kent Beck critiques spec-driven approaches that assume “you aren’t going to learn anything during implementation.” He’s right—discovery is essential. But pure vibe coding loses those discoveries. The next agent session starts from zero, re-discovering (or missing) the same constraints.

The Learning Loop preserves discoveries as machine-readable context, enabling compounding understanding across sessions.

The Cycle

  1. Explore — Vibe code to discover edge cases, performance characteristics, or API behaviors
  2. Learn — Identify constraints that weren’t obvious from requirements
  3. Crystallize — Update the Spec with discovered constraints
  4. Verify — Gate future implementations against the updated Spec
  5. Repeat

Each iteration builds on the last. The spec grows smarter, and agents inherit the learnings of every previous session.

OODA Foundation

The Learning Loop is an application of the OODA Loop to software development:

| Learning Loop Phase | OODA Equivalent |
| --- | --- |
| Explore | Observe + Act (gather information through building) |
| Learn | Orient (interpret what was discovered) |
| Crystallize | Decide (commit learnings to persistent format) |
| Verify | Observe (confirm crystallized constraints via gates) |

The key insight: in software development, Orient and Observe are interleaved. You often can’t observe relevant constraints until you’ve built something that reveals them. The Learning Loop makes this explicit by treating Explore as a legitimate phase rather than a deviation from the plan.

Key Characteristics

Not Waterfall

The Learning Loop explicitly rejects the waterfall assumption that all constraints can be known upfront. Specs are scaffolding that evolve, not stone tablets.

Not Pure Vibe Coding

The Learning Loop also rejects the vibe coding assumption that documentation is optional. Undocumented learnings are lost learnings—the next agent (or human) will repeat the same mistakes.

Machine-Readable Capture

Learnings must be captured in formats agents can consume: schemas, constraints in YAML, acceptance criteria in markdown. Natural language is acceptable but structured data is preferred.

“The real capability—our ability to respond to change—comes not from how fast we can produce code, but from how deeply we understand the system we are shaping.” — Unmesh Joshi

Automation: The Ralph Loop

The Learning Loop describes an iterative cycle that typically involves human judgment at each phase. The Ralph Loop automates this cycle for tasks with machine-verifiable completion criteria:

| Learning Loop Phase | Ralph Loop Implementation |
| --- | --- |
| Explore | Agent implements based on PBI/Spec |
| Learn | Agent reads error logs, test failures, build output |
| Crystallize | Agent updates progress.txt; commits to Git |
| Verify | External tools (Jest, tsc, Docker) confirm success |

When verification fails, Ralph automatically re-enters Explore with the learned context. The loop continues until external verification passes or iteration limit is reached.

Key difference: The Learning Loop expects human judgment in the Learn and Crystallize phases. The Ralph Loop requires that “learning” be expressible as observable state (error logs, test results) and “crystallization” be automatic (Git commits, progress files).

Ralph Loops work best when success criteria are machine-verifiable (tests pass, builds complete). For tasks requiring human judgment—ambiguous requirements, architectural decisions, product direction—the Learning Loop remains the appropriate model.
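
A minimal sketch of such a loop in TypeScript, assuming a hypothetical runAgent step and an external verify step (test runner, compiler, or build):

```typescript
interface Verification { passed: boolean; logs: string }

// runAgent: asks the agent to implement/fix against the PBI plus accumulated feedback.
// verify: runs external tooling (tests, compiler, build); the agent never grades itself.
async function ralphLoop(
  runAgent: (feedback: string) => Promise<void>,
  verify: () => Promise<Verification>,
  maxIterations = 5,
): Promise<boolean> {
  let feedback = ""; // starts empty; later iterations carry observed error logs forward
  for (let i = 0; i < maxIterations; i++) {
    await runAgent(feedback);       // Explore: implement based on PBI/Spec plus feedback
    const result = await verify();  // Verify: external tools confirm success or failure
    if (result.passed) return true; // machine-verifiable completion is the stop condition
    feedback = result.logs;         // Learn: re-enter Explore with the observed state
  }
  return false;                     // iteration limit reached; escalate to a human
}
```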

ASDLC Usage

In ASDLC, the Learning Loop connects several core concepts:

Applied in:

Anti-Patterns

| Anti-Pattern | Description |
| --- | --- |
| Waterfall Specs | Writing exhaustive specs before any implementation, assuming no learning will occur |
| Ephemeral Vibe Coding | Generating code without ever crystallizing learnings into specs |
| Spec-as-Paperwork | Updating specs for compliance rather than genuine constraint capture |
| Post-Hoc Documentation | Writing specs after implementation is complete, losing the iterative benefit |

Vibe Coding

Natural language code generation without formal specs—powerful for prototyping, problematic for production systems.

Status: Experimental | Last Updated: 2025-01-05

Definition

Vibe Coding is the practice of generating code directly from natural language prompts without formal specifications, schemas, or contracts. Coined by Andrej Karpathy, the term describes an AI-assisted development mode where engineers describe desired functionality conversationally (“make this faster,” “add a login button”), and the LLM produces implementation code.

This approach represents a fundamental shift: instead of writing specifications that constrain implementation, developers describe intent and trust the model to infer the details. The result is rapid iteration—code appears almost as fast as you can articulate what you want.

While vibe coding accelerates prototyping and exploration, it inverts traditional software engineering rigor: the specification emerges after the code, if at all.

The Seduction of Speed

The productivity gains from vibe coding are undeniable:

This velocity is seductive. When a feature that previously took three days can be scaffolded in thirty minutes, the economic pressure to adopt vibe coding becomes overwhelming.

The feedback loop is immediate: describe the behavior, see the code, run it, iterate. For throwaway scripts, MVPs, and rapid exploration, this workflow is transformative.

The Failure Modes

The velocity advantage of vibe coding collapses when code must be maintained, extended, or integrated into production systems:

Technical Debt Accumulation

Forrester Research predicts that by 2026, 75% of technology leaders will face moderate-to-severe technical debt directly attributable to AI-generated code. The mechanism is straightforward: code generated from vague prompts encodes vague assumptions.

When specifications exist only in the prompt history (or the engineer’s head), future maintainers inherit code without contracts. They must reverse-engineer intent from implementation—the exact problem formal specifications solve.

Copy-Paste Culture

2024 marked the first year in industry history where copy-pasted code exceeded refactored code. This is a direct symptom of vibe coding: when generating fresh code is faster than understanding existing code, engineers default to regeneration over refactoring.

The result is systemic duplication. The same logic appears in fifteen places with fifteen slightly different implementations, none validated against a shared contract.

Silent Drift

LLMs are probabilistic. When generating code from vibes, they make assumptions:

These assumptions are never documented. The code passes tests (if tests exist), but violates implicit architectural contracts. Over time, the system drifts toward inconsistency—different modules make different assumptions about the same concepts.

Boris Cherny (Principal Engineer, Anthropic; creator of Claude Code) warns: “You want maintainable code sometimes. You want to be very thoughtful about every line sometimes.”

“Speed is seductive. Maintainability is survival.”
— Boris Cherny, The Peterman Podcast (December 2025)

Vibe Coded Into a Corner

Anthropic’s internal research found that engineers who spend more time on Claude-assisted tasks often do so because they “vibe code themselves into a corner”—generating code without specs until debugging and cleanup overhead exceeds the initial velocity gains.

“When producing output is so easy and fast, it gets harder and harder to actually take the time to learn something.” — Anthropic engineer

This creates a debt spiral: vibe coding is fast until it isn’t, and by then the context needed to fix issues was never documented.

Regression to the Mean

Without deterministic constraints, LLMs trend toward generic solutions. Vibe coding produces code that works but lacks the specific optimizations, domain constraints, and architectural decisions that distinguish production systems from prototypes.

The model doesn’t know that “user IDs must never be logged” or “this cache must invalidate within 100ms.” These constraints exist in specifications, not prompts.

Applications

Vibe coding is particularly effective in specific contexts:

Rapid Prototyping: When validating product hypotheses, speed of iteration outweighs code quality. Vibe coding enables designers and product managers to generate functional prototypes without deep programming knowledge.

Throwaway Scripts: One-off data migrations, analysis scripts, and temporary tooling benefit from vibe coding’s velocity. Since the code has no maintenance burden, formal specifications are unnecessary overhead.

Learning and Exploration: When experimenting with new APIs, frameworks, or architectural patterns, vibe coding provides immediate feedback. The goal is understanding, not production-ready code.

Greenfield MVPs: Early-stage startups building minimum viable products often prioritize speed-to-market over maintainability. Vibe coding accelerates this phase, though technical debt must be managed during the transition to production.

ASDLC Usage

In ASDLC, vibe coding is recognized as a legitimate operational mode for bounded contexts (exploration, prototyping, throwaway code). However, for production systems, ASDLC mandates a transition to deterministic development.

The ASDLC position:

Applied in:

See also:

YAML

A human-readable data serialization language that serves as the structured specification format for configuration, schemas, and file structures in agentic workflows.

Status: Live | Last Updated: 2026-01-13

Definition

YAML (YAML Ain’t Markup Language) is a human-readable data serialization language designed for configuration files, data exchange, and structured documentation. In agentic development, YAML serves as the specification language for data structures, schemas, and file organization.

Where Gherkin specifies behavior (Given-When-Then), YAML specifies structure (keys, values, hierarchies). Both are human-readable formats that bridge the gap between human intent and machine execution.

Key Characteristics

Human-Readable Structure

YAML’s indentation-based syntax mirrors how humans naturally organize hierarchical information:

notification:
  channels:
    - websocket
    - email
    - sms
  constraints:
    latency_ms: 100
    retry_count: 3
  fallback:
    enabled: true
    order: [websocket, email, sms]

Schema-First Design

YAML enables schema-first development where data structures are defined before implementation:

# Schema definition in spec
user:
  id: string (UUID)
  email: string (email format)
  roles: array of enum [admin, user, guest]
  created_at: datetime (ISO 8601)

Agents can validate implementations against these schemas, catching type mismatches and missing fields before runtime.
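
As one hedged illustration, the same schema could be mirrored as a runtime validator in TypeScript (using the zod library here; the mapping from the YAML annotations is an assumption of this sketch):

```typescript
import { z } from "zod";

// Runtime validator mirroring the YAML schema above (illustrative mapping).
const UserSchema = z.object({
  id: z.string().uuid(),             // string (UUID)
  email: z.string().email(),         // string (email format)
  roles: z.array(z.enum(["admin", "user", "guest"])),
  created_at: z.string().datetime(), // datetime (ISO 8601)
});

type User = z.infer<typeof UserSchema>;

// An agent (or a gate) can reject generated payloads before runtime use.
const result = UserSchema.safeParse({
  id: "not-a-uuid",
  email: "user@example.com",
  roles: ["admin"],
  created_at: "2026-01-13T10:00:00Z",
});

if (!result.success) {
  console.error(result.error.issues); // e.g. the id field fails the UUID check
}
```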

Configuration as Code

YAML configurations live in version control alongside code, enabling:

ASDLC Usage

YAML serves as the data structure specification language in ASDLC, completing the specification triad alongside Gherkin (behavior) and Mermaid (process).

In Specs: All ASDLC articles use YAML frontmatter for structured metadata. The Spec pattern leverages YAML for schema definitions that agents validate against.

In AGENTS.md: The AGENTS.md Specification uses YAML for structured directives—project context, constraints, and preferred patterns.

Applied in:

Patterns (A-Z)

Adversarial Code Review

Consensus verification pattern using a secondary Critic Agent to review Builder Agent output against the Spec.

Status: Experimental | Last Updated: 2026-01-09

Definition

Adversarial Code Review is a verification pattern where a distinct AI session—the Critic Agent—reviews code produced by the Builder Agent against the Spec before human review.

This extends the Critic (Hostile Agent) pattern from the design phase into the implementation phase, creating a verification checkpoint that breaks the “echo chamber” where a model validates its own output.

The Builder Agent (optimized for speed and syntax) generates code. The Critic Agent (optimized for reasoning and logic) attempts to reject it based on spec violations.

The Problem: Self-Validation Ineffectiveness

LLMs are probabilistic text generators trained to be helpful. When asked “Check your work,” a model that just generated code will often:

Hallucinate correctness — Confidently affirm that buggy logic is correct because it matches the plausible pattern in training data.

Double down on errors — Explain why the bug is actually a feature, reinforcing the original mistake.

Share context blindness — Miss gaps because it operates within the same context window and reasoning path that produced the original output.

If the same computational session writes and reviews code, the “review” provides minimal independent validation.

The Solution: Separated Roles

To create effective verification, separate the generation and critique roles:

The Builder — Optimizes for implementation throughput (e.g., Gemini 3 Flash, Claude Haiku 4.5). Generates code from the PBI and Spec.

The Critic — Optimizes for logical consistency and constraint satisfaction (e.g., Gemini 3 Deep Think, DeepSeek V3.2). Validates code against Spec contracts without rewriting.

The Critic does not generate alternative implementations. It acts as a gatekeeper, producing either PASS or a list of spec violations that must be addressed.

The Workflow

1. Build Phase

The Builder Agent implements the PBI according to the Spec.

Output: Code changes, implementation notes.

Example: “Updated auth.ts to support OAuth login flow.”

2. Context Swap (Fresh Eyes)

Critical: Start a new AI session or chat thread for critique. This clears conversation drift and forces the Critic to evaluate only the artifacts (Spec + Diff), not the Builder’s reasoning process.

If using the same model, close the current chat and open a fresh session. If using Model Routing, switch to a High Reasoning model.

3. Critique Phase

Feed the Spec and the code diff to the Critic Agent with adversarial framing:

System Prompt:

You are a rigorous Code Reviewer validating implementation against contracts.

Input:
- Spec: specs/auth-system.md
- Code Changes: src/auth.ts (diff)

Task:
Compare the code strictly against the Spec's Blueprint (constraints) and Contract (quality criteria).

Identify:
1. Spec violations (missing requirements, violated constraints)
2. Security issues (injection vulnerabilities, auth bypasses)
3. Edge cases not handled (error paths, race conditions)
4. Anti-patterns explicitly forbidden in the Spec

Output Format:
- PASS (if no violations)
- For each violation, provide:
  1. Violation Description (what contract was broken)
  2. Impact Analysis (why this matters: performance, security, maintainability)
  3. Remediation Path (ordered list of fixes, prefer standard patterns, escalate if needed)
  4. Test Requirements (what tests would prevent regression)

This transforms critique from "reject" to "here's how to fix it."

4. Verdict

If PASS: Code moves to human Acceptance Gate (L3 review for strategic fit).

If FAIL: Violations are fed back to Builder as a new task: “Address these spec violations before proceeding.”

This creates a Context Gate between code generation and human review.
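The loop can be sketched as follows; the builder and critic callbacks stand in for whatever agent tooling you use, and the three-round cap is an assumption:

// Hypothetical wrappers around your Builder and Critic sessions; names and shapes are illustrative.
type Builder = (task: string, spec: string, feedback: string[]) => Promise<string>; // returns a diff
type Critic = (spec: string, diff: string) => Promise<{ verdict: "PASS" | "FAIL"; violations: string[] }>;

async function adversarialReview(
  builder: Builder,
  critic: Critic,
  spec: string,
  task: string,
  maxRounds = 3
): Promise<string> {
  let feedback: string[] = [];
  for (let round = 0; round < maxRounds; round++) {
    // Builder works from the PBI/Spec plus any violations from the previous round.
    const diff = await builder(task, spec, feedback);
    // Critic runs in a fresh session and sees only the artifacts (Spec + diff).
    const review = await critic(spec, diff);
    if (review.verdict === "PASS") return diff; // hand off to the human Acceptance Gate
    feedback = review.violations;               // FAIL: violations become the Builder's next task
  }
  throw new Error("Critic never passed the diff; escalate to a human reviewer");
}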

Relationship to Context Gates

Adversarial Code Review implements a Review Gate as defined in Context Gates:

Quality Gates (deterministic) — Verify syntax, compilation, linting, test passage.

Review Gates (probabilistic, adversarial) — Verify semantic correctness, spec compliance, architectural consistency. This is where Adversarial Code Review operates.

Acceptance Gates (subjective, HITL) — Verify strategic fit and product vision alignment.

The Critic sits between automated tooling and human review, catching issues that compilers miss but that don’t require human strategic judgment.

Integration with Model Routing

Use Model Routing to assign models by capability profile:

| Role | Model Profile | Rationale |
| --- | --- | --- |
| Builder | High Throughput | Fast code generation with strong syntax knowledge |
| Critic | High Reasoning | Deep logic evaluation, constraint satisfaction, edge case discovery |

This leverages the strengths of each model class: speed for generation, reasoning depth for validation.

Strategic Value

Reduces L3 Cognitive Load — Human reviewers focus on “Is this the right product?” rather than catching spec deviations or missing error handling.

Catches Regression to Mediocrity — Coding models gravitate toward average solutions. The Critic enforces novelty and architectural intent from the Spec.

Enforces Spec Quality — If the Critic can’t determine whether code is correct, the Spec is ambiguous. This surfaces specification gaps.

Prevents Silent Failures — The Critic catches implementation shortcuts (skipped validation, missing edge cases) that pass tests but violate contracts.

Validated in Practice

Case Study: Claudio Lassala (January 2026)

A production implementation validated this pattern’s effectiveness:

Context: A user story required filtering audit logs by date range. The Builder Agent implemented the requirement, tests passed, and the code compiled without errors.

Issue Detected: The Critic Agent identified a silent performance violation:

// Implementation passed all Quality Gates but violated architectural constraint
var logs = await repository.LoadAll(); // Loads entire table into memory
return logs.Where(log => log.Date > startDate); // Filters in-memory

Critic Output:

VIOLATION: Performance - Data Access Pattern

Spec requires database-level filtering for datasets exceeding 1k records.
Implementation loads full table then filters in-memory.

Impact: Works with small data; fails at scale (10k+ records cause memory issues)

Remediation Path:
1. Push filtering to database: repository.FindWhere(x => x.Date > startDate)
2. If ORM doesn't support this filter, use raw SQL
3. Add performance test with 10k+ mock records to prevent regression
4. Document the constraint in the repository interface

Key Learnings:

  1. Silent Performance Risks — Code that passes all tests can still violate architectural constraints. The Critic caught the LoadAll().Filter() anti-pattern before production.

  2. Iterative Refinement — The Critic initially flagged “missing E2E tests,” which were actually present but structured differently. The team updated the Critic’s instructions to recognize the project’s test architecture, demonstrating the pattern’s adaptability.

  3. Tone Calibration — Using “Approve with suggestions” framing prevented blocking valid code while surfacing genuine risks. The Critic didn’t reject the PR—it flagged optimization opportunities with clear remediation paths.

This validates the pattern’s core thesis: adversarial review catches architectural violations that pass deterministic checks but violate semantic contracts.

Example: The Silent Performance Bug

Spec Contract: “All database retries must use exponential backoff to prevent thundering herd during outages.”

Builder Output: Clean code with a simple retry loop using fixed 1-second delays. Tests pass.

// src/db.ts
async function queryWithRetry(sql: string) {
  for (let i = 0; i < 5; i++) {
    try {
      return await db.query(sql);
    } catch (err) {
      await sleep(1000); // Fixed delay
    }
  }
}

Critic Response:

VIOLATION: src/db.ts Line 45

Spec requires exponential backoff. Implementation uses constant sleep(1000).

Impact: During database outages, this will cause thundering herd problems
as all clients retry simultaneously.

Required: Implement delay = baseDelay * (2 ** attemptNumber)

Without the Critic, a human skimming the PR might miss the constant delay. The automated tests wouldn’t catch it (the code works). The Critic, reading against the contract, identifies the violation.
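A compliant version might look like the following sketch. It reuses the db and sleep helpers assumed in the Builder's snippet; the base delay and jitter are illustrative choices, and only the exponential growth is mandated by the contract:

// src/db.ts (sketch): same retry loop, now with exponential backoff per the Spec.
async function queryWithRetry(sql: string, maxAttempts = 5, baseDelayMs = 250) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await db.query(sql);
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // retries exhausted: surface the error
      const delay = baseDelayMs * 2 ** attempt;   // 250ms, 500ms, 1s, 2s, ...
      await sleep(delay + Math.random() * 100);   // small jitter de-synchronizes clients
    }
  }
}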

Implementation Constraints

Not Automated (Yet) — As of December 2025, this requires manual orchestration. Engineers must manually switch sessions/models and feed context to the Critic.

Context Window Limits — Large diffs may exceed even Massive Context models. Use Context Gates filtering to provide only changed files + relevant Spec sections.

Critic Needs Clear Contracts — The Critic can only enforce what’s documented in the Spec. Vague specs produce vague critiques.

Model Capability Variance — Not all “reasoning” models perform equally at code review. Validate your model’s performance on representative examples.

Relationship to Agent Constitution

The Agent Constitution defines behavioral directives for agents. For Adversarial Code Review:

Builder Constitution: “Implement the Spec’s contracts. Prioritize clarity and correctness over cleverness.”

Critic Constitution: “You are skeptical. Your job is to reject code that violates the Spec, even if it ‘works.’ Favor false positives over false negatives.”

This frames the Critic’s role as adversarial by design—it’s explicitly told to be rigorous and skeptical, counterbalancing the Builder’s helpfulness bias.

Future Automation Potential

This pattern is currently manual but has clear automation paths:

CI/CD Integration — Run Critic automatically on PR creation, posting violations as review comments.

IDE Integration — Real-time critique as code is written, similar to linting but spec-aware.

Multi-Agent Orchestration — Automated handoff between Builder and Critic until PASS is achieved.

Programmatic Orchestration (Workflow as Code)

To scale this pattern, move from manual prompt-pasting to code-based orchestration (e.g., using the Claude Code SDK).

Convention-Based Loading: Store reviewer agent prompts in a standard directory (e.g., .claude/agents/) and load them dynamically:

// Load the specific reviewer agent prompt from the conventional directory
import fs from 'node:fs/promises';

const reviewerPrompt = await fs.readFile(`.claude/agents/${agentName}.md`, 'utf-8');

// Spawn the Critic subagent via the SDK, constraining output to a structured review schema
const reviewResult = await claude.query({
  prompt: reviewerPrompt,
  context: { spec, diff },
  outputFormat: { type: 'json_schema', schema: ReviewSchema }
});

This allows you to treat Critic Agents as standardized, version-controlled functions in your build pipeline.

As agent orchestration tooling matures, this pattern may move from Experimental to Standard.

See also:

Agent Constitution

Persistent, high-level directives that shape agent behavior and decision-making before action.

Status: Live | Last Updated: 2026-01-19

Definition

An Agent Constitution is a set of high-level principles or “Prime Directives” injected into an agent’s system prompt to align its intent and behavior with system goals.

The concept originates from Anthropic’s Constitutional AI research, which proposed training models to be “Helpful, Honest, and Harmless” (HHH) using a written constitution rather than human labels alone. In the ASDLC, we adapt this alignment technique to System Prompt Engineering—using the Constitution to define the “Superego” of our coding agents.

The Problem: Infinite Flexibility

Without a Constitution, an Agent is purely probabilistic. It will optimize for being “helpful” to the immediate prompt user, often sacrificing long-term system integrity.

If a prompt says “Implement this fast,” a helpful agent might skip tests. A Constitutional Agent would refuse: “I cannot skip tests because Principle #3 forbids merging unverified code.”

The Solution: Proactive Behavioral Alignment

The Constitution shapes agent behavior before action occurs—unlike reactive mechanisms (tests, gates) that catch problems after the fact.

The Driver Training Analogy

To understand the difference between a Constitution and other control mechanisms, consider the analogy of driving a car: the Constitution is the driver training that shapes decisions before the driver ever acts, while gates and tests are the guardrails that catch a bad maneuver after it has begun.

The “Orient” Phase

In the OODA Loop (Observe-Orient-Decide-Act), the Constitution lives squarely in the Orient phase.

When an agent Observes the world (reads code, sees a user request), the Constitution acts as a filter for how it interprets those observations.

Taxonomy: Steering vs. Hard Constraints

It is critical to distinguish what the Constitution can enforce (Steering) from what it must rely on external systems to enforce (Hard).

Steering Constraints (Soft)

These live in the System Prompt or AGENTS.md. They influence the model’s reasoning, tone, and risk preference.

Hard Constraints (Orchestration)

These live in the Runtime Environment (Hooks, API limits, Docker containers). They physically prevent the agent from taking restricted actions.

The Agent Constitution is primarily about Steering Constraints that govern behavior, while Context Gates and Workflow as Code implement the Hard Constraints.

Anatomy of a Constitution

Research into effective system prompts suggests a constitution should have four distinct components:

1. Identity (The Persona)

Who is the agent? This prunes the search space of the model (e.g., “You are a Senior Rust Engineer” vs “You are a poetic assistant”).

2. The Mission (Objectives)

What is the agent trying to achieve?

3. The Boundaries (Negative Constraints)

What must the agent never do? These are “Soft Gates”—instructions to avoid bad paths before hitting the hard Context Gates.

4. The Process (Step-by-Step)

How should the agent think? This enforces Chain-of-Thought reasoning.

Constitution vs. Spec

A common failure mode is mixing functional requirements with behavioral guidelines. Separation is critical:

| Feature | Agent Constitution | The Spec |
| --- | --- | --- |
| Scope | Global / Persona-wide | Local / Task-specific |
| Lifespan | Persistent (Project Lifecycle) | Ephemeral (Feature Lifecycle) |
| Content | Values, Style, Ethics, Safety | Logic, Data Structures, Routes |
| Example | "Prioritize Type Safety over Brevity." | "User id must be a UUID." |

Self-Correction Loop

One of the most powerful applications of a Constitution is the Critique-and-Refine loop (derived from Anthropic’s Supervised Learning phase):

  1. Draft: Agent generates a response to the user’s task.
  2. Critique: Agent (or a separate Critic agent) compares the draft against the Constitution.
  3. Refine: Agent rewrites the draft to address the critique.

This allows the agent to fix violations (e.g., “I used any type, but the Constitution forbids it”) before the user ever sees the code.
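The loop can be sketched with a single generic completion function; complete here is a hypothetical stand-in for whatever LLM call your stack provides, and the prompts are illustrative:

// `complete` is a hypothetical stand-in for your LLM call; prompts and round count are illustrative.
type Complete = (system: string, user: string) => Promise<string>;

async function constitutionalDraft(complete: Complete, constitution: string, task: string, rounds = 2): Promise<string> {
  // 1. Draft: generate a first response under the Constitution.
  let draft = await complete(constitution, task);
  for (let i = 0; i < rounds; i++) {
    // 2. Critique: compare the draft against the Constitution only.
    const critique = await complete(
      "You are a Critic. List every way the draft violates the constitution. Reply NONE if it is clean.",
      `Constitution:\n${constitution}\n\nDraft:\n${draft}`
    );
    if (critique.trim() === "NONE") break;
    // 3. Refine: rewrite the draft to address the critique.
    draft = await complete(
      constitution,
      `Task:\n${task}\n\nPrevious draft:\n${draft}\n\nFix these violations:\n${critique}`
    );
  }
  return draft;
}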

Persona-Specific Constitutions

Defining different Constitutions for different roles enables Adversarial Code Review.

1. The Builder (Optimist)

“Your goal is to be helpful and productive. Write code that solves the user’s problem. If the spec is slightly vague, make a reasonable guess to keep momentum going. Prioritize clean, readable implementation.”

2. The Critic (Pessimist)

“Your goal is to be a skeptical gatekeeper. Assume the code is broken or insecure until proven otherwise. Do not be helpful; be accurate. If the spec is vague, reject the code and demand clarification. Prioritize correctness and edge-case handling.”

By running the same prompt through these two different Constitutions, you generate a dialectic process that uncovers issues a single “neutral” agent would miss.

Implementation

1. Documentation

The industry standard for documenting your Agent Constitution is AGENTS.md. This file lives in your repository root and serves as the source of truth for your agents.

2. Injection

Inject the Constitution into the System Prompt of your LLM interaction.
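A minimal sketch, assuming the constitution lives in AGENTS.md and an OpenAI-style chat client; the model name and task prompt are illustrative:

// Sketch: inject AGENTS.md as the system prompt. Model choice and SDK are assumptions.
import { readFile } from "node:fs/promises";
import OpenAI from "openai";

const constitution = await readFile("AGENTS.md", "utf-8");
const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-5", // illustrative; use whatever your Model Routing policy selects
  messages: [
    { role: "system", content: constitution }, // persistent directives
    { role: "user", content: "Implement PBI-203: add SMS fallback to notifications." },
  ],
});
console.log(response.choices[0].message.content);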

3. Tuning

Constitutions must be tuned. If they are too strict, the agent becomes paralyzed, refusing to code because “it might be insecure” or halting on every minor ambiguity. If too loose, the agent reverts to default helpfulness and ignores the principles whenever they are inconvenient.

The “Be Good” Trap: Avoid vague directives like “Write good code”; they give the agent no concrete behavior to enforce or refuse.

Relationship to Other Patterns

Constitutional Review — The pattern for using a Critic agent to review code specifically against the Agent Constitution.

Context Gates — The deterministic checks that back up the probabilistic Constitution. Hard Constraints implemented via orchestration.

Adversarial Code Review — Uses persona-specific Constitutions (Builder vs Critic) to create dialectic review processes.

The Spec — Defines task-specific requirements, while the Constitution defines global behavioral guidelines.

AGENTS.md Specification — The practice for documenting and maintaining your Agent Constitution.

Workflow as Code — Implements Hard Constraints programmatically, complementing the Constitution’s Steering Constraints.

See also:

Agentic Double Diamond

Transforming the classic design thinking framework into a computational pipeline.

Status: Draft | Last Updated: 2025-12-13


Summary

The Agentic Double Diamond transforms the classic design thinking framework from a workshop-based activity into a computational pipeline. Instead of producing static artifacts (PDFs, slide decks, sticky notes) for human interpretation, this pattern uses agents to ingest raw data and output structured, machine-readable “Context Feeds” (Vectors, JSON, Gherkin) that drive downstream Coder and QA agents.

The Context

In a traditional SDLC, the “Double Diamond” (Discover, Define, Develop, Deliver) is often a bottleneck of unstructured data.

  1. Lossy Handoffs: Insights from the Discover phase are summarized into PowerPoint decks, losing the raw fidelity needed for edge-case testing.

  2. Static Deliverables: Deliver produces Figma files or flat specs. An AI Coding Agent cannot “look” at a Figma file and understand the intent behind a hover state or a complex validation rule without explicit text description.

  3. The “Gap of Silence”: Once design is handed off, the “User Voice” is silent until UAT.

In the Agentic SDLC, we treat Design not as drawing screens, but as Context Engineering. The goal is to build the “Truth” that the build agents will execute.

The Pattern

The Cybernetic Double Diamond reimagines the two diamonds as Context Furnaces:

Phase 1: Discover (The Sensor Network)

Traditional: User interviews, market research, sticky notes on a wall. Agentic: Massive automated ingestion and pattern matching.

The Workflow: Instead of a manual research sprint, we deploy a Sensor Network of Harvester Agents.

Output Artifact: research_vectors.json

A vector store containing weighted pain points, frequency analysis, and raw user quotes linked by semantic relevance.

Phase 2: Define (The Simulator)

Traditional: Static Personas (PDFs), Journey Maps. Agentic: Active User Simulators and Living Requirements.

The Workflow: We use the data from Phase 1 to fine-tune Synthetic User Agents (see Agent Personas).

Output Artifact: persona_definition.yaml & problem_graph.json

A serialized definition of the user that QA agents can later use to “test” the software, and a knowledge graph linking business goals to user pain points.

Phase 3: Develop (The Generative Studio)

Traditional: Manual Wireframing, Prototyping. Agentic: Multi-modal generation and adversarial simulation.

The Workflow:

Output Artifact: design_tokens.json & behavioral_prototype.js

Design-as-Code. Figma designs are instantly converted to JSON tokens and React component scaffolds.

Phase 4: Deliver (The Context Feed)

Traditional: Handoff meetings, Jira tickets. Agentic: Compilation of the “Blueprints” for the Agentic SDLC.

The Workflow: This phase is purely about Packaging. The goal is to create a “Feature Manifest” that the Coder Agents can consume without hallucination.

Output Artifact: feature_manifest.zip

A package containing the “Truth” for the build agents:

  1. requirements.md (The narrative)

  2. acceptance_criteria.feature (The test logic)

  3. mockup_context.json (The visual specs)

Artifact Example: The Feature Manifest

When the “Deliver” phase is complete, the Design Agent commits a manifest to the repository. This triggers the Coder Agent.

manifests/feature-one-click-checkout/requirements.md

# Feature: One-Click Checkout
## Insight Source
- Linked to Insight ID: #INS-882 (Users abandon cart due to form fatigue)
- Priority Score: 9.2 (Calculated by Impact/Effort Agent)

## Synthetic User Validation
- Persona "Sarah" Acceptance Rate: 95%
- Persona "Mike" Acceptance Rate: 88% (Concern: "Where is the receipt?")

manifests/feature-one-click-checkout/acceptance_criteria.feature

Feature: One Click Checkout
  Scenario: User has stored payment
    GIVEN user_id IS "valid"
    AND payment_method IS "stored"
    WHEN button "Buy Now" is_clicked
    THEN system MUST process_transaction WITHIN 2000ms
    AND system MUST NOT show "Confirmation Modal"

Benefits

  1. Zero Translation Loss: The “Spec” is code before the code is written.

  2. Adversarial Resilience: Designs are “tested” by Synthetic Users before development begins.

  3. Living Context: The logic is traceable back to the raw research vector (e.g., “Why is this button red?” -> “Because 400 support tickets complained about visibility”).

Constitutional Review

Verification pattern that validates implementation against both functional requirements (Spec) and architectural values (Constitution).

Status: Experimental | Last Updated: 2026-01-09

Definition

Constitutional Review is a verification pattern that validates code against two distinct contracts:

  1. The Spec (functional requirements) — Does it do what was asked?
  2. The Constitution (architectural values) — Does it do it the right way?

This pattern extends Adversarial Code Review by adding a second validation layer. Code can pass all tests and satisfy the Spec’s functional requirements while still violating the project’s architectural principles documented in the Agent Constitution.

The Problem: Technically Correct But Architecturally Wrong

Standard verification catches functional bugs:

But code can pass all these checks and still violate architectural constraints:

Example: The Performance Violation

// Spec requirement: "Filter audit logs by date range"
async function getAuditLogs(startDate: Date) {
  const logs = await db.auditLogs.findAll(); // ❌ Loads entire table
  return logs.filter(log => log.date > startDate); // ❌ Filters in memory
}

Quality Gates: ✅ Tests pass (small dataset)
Spec Compliance: ✅ Returns filtered logs
Constitutional Review: ❌ Violates “push filtering to database layer”

The code is functionally correct but architecturally unsound. It works fine with 100 records but fails catastrophically at 10,000+.
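A constitutionally sound version pushes the predicate into the data layer. The repository interface below is a hypothetical sketch, not a specific ORM API:

// Sketch: the repository translates the predicate to SQL (WHERE date > $1),
// so only matching rows ever leave the database.
interface AuditLog { id: string; date: Date; action: string; }
interface AuditLogRepository {
  findWhere(filter: { dateAfter: Date }): Promise<AuditLog[]>; // hypothetical method
}

async function getAuditLogs(repo: AuditLogRepository, startDate: Date): Promise<AuditLog[]> {
  // Satisfies the Constitution's "push filtering to the database layer" principle.
  return repo.findWhere({ dateAfter: startDate });
}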

The Solution: Dual-Contract Validation

Constitutional Review solves this by validating against two sources of truth:

Traditional Review (Functional)

Constitutional Review (Architectural)

The Critic Agent validates against BOTH contracts:

  1. Functional correctness (from the Spec)
  2. Architectural consistency (from the Constitution)

Anatomy

Constitutional Review consists of three key components:

The Dual-Contract Input

Spec Contract — Defines functional requirements, API contracts, and data schemas. Answers “what should it do?”

Constitution Contract — Defines architectural patterns, performance constraints, and security rules. Answers “how should it work?”

Both contracts are fed to the Critic Agent for validation.

The Critic Agent

A secondary AI session (ideally using a reasoning-optimized model) that:

This extends the Adversarial Code Review Critic with constitutional awareness.

The Violation Report

When constitutional violations are detected, the Critic produces:

  1. Violation Description — What constitutional principle was violated
  2. Impact Analysis — Why this matters at scale (performance, security, maintainability)
  3. Remediation Path — Ordered steps to fix (prefer standard patterns, escalate if needed)
  4. Test Requirements — What tests would prevent regression

This transforms review from rejection to guidance.

Relationship to Other Patterns

Adversarial Code Review — The base pattern that Constitutional Review extends. Adds the Constitution as a second validation contract.

Agent Constitution — The source of architectural truth. Defines the “driver training” that shapes initial behavior; Constitutional Review verifies the training was followed.

The Spec — The source of functional truth. Constitutional Review validates against both Spec and Constitution.

Context Gates — Constitutional Review implements a specialized Review Gate that validates architectural consistency.

Feedback Loop: Constitution shapes behavior → Constitutional Review catches violations → Violations inform Constitution updates (if principles aren’t clear enough).

Integration with Context Gates

Constitutional Review implements a specialized Review Gate that sits between Quality Gates and Acceptance Gates:

| Gate Type | Question | Validated By |
| --- | --- | --- |
| Quality Gates | Does it compile and pass tests? | Toolchain (deterministic) |
| Spec Review Gate | Does it implement requirements? | Critic Agent (probabilistic) |
| Constitutional Review Gate | Does it follow principles? | Critic Agent (probabilistic) |
| Acceptance Gate | Is it the right solution? | Human (subjective) |

The Constitutional Review Gate catches architectural violations that pass functional verification.

Strategic Value

Catches “Regression to Mediocrity” — LLMs are trained on average code from the internet. Without constitutional constraints, they gravitate toward common but suboptimal patterns.

Enforces Institutional Knowledge — Architectural decisions (performance patterns, security rules, error handling strategies) are documented once in the Constitution and verified on every implementation.

Surfaces Specification Gaps — If the Critic can’t determine whether code violates constitutional principles, the Constitution needs clarification. This improves the entire system.

Reduces L3 Review Burden — Human reviewers focus on strategic fit (“Is this the right feature?”) rather than catching architectural violations (“Why are you loading the entire table?”).

Prevents Silent Failures — Code that “works” but violates architectural principles (like the LoadAll().Filter() anti-pattern) is caught before production.

Validated in Practice

Case Study: Claudio Lassala (January 2026)

A production implementation caught a constitutional violation that passed all other gates:

Context: User story required filtering audit logs by date range. Builder Agent implemented the requirement, tests passed, code compiled without errors.

Code Behavior:

Gate Results:

Critic Output: Provided specific remediation path:

  1. Push filter to database query layer
  2. If ORM doesn’t support pattern, use raw SQL
  3. Add performance test with 10k+ records
  4. Document constraint in repository interface

Impact: Silent performance bug caught before production. The code worked perfectly in development (small dataset) but would have failed catastrophically at scale.

See full case study in Adversarial Code Review.

Implementing Practice

For step-by-step implementation guidance, see:

See also:

Context Gates

Architectural checkpoints that filter input context and validate output artifacts between phases of work to prevent cognitive overload and ensure system integrity.

Status: Experimental | Last Updated: 2026-01-18

Definition

Context Gates are architectural checkpoints that sit between phases of agentic work. They serve a dual mandate: filtering the input context to prevent cognitive overload, and validating the output artifacts to ensure system integrity.

Unlike “Guardrails,” which conflate prompt engineering with hard constraints, Context Gates are distinct, structural barriers that enforce contracts between agent sessions and phases.

The Problem: Context Pollution and Unvalidated Outputs

Without architectural checkpoints, agentic systems suffer from two critical failures:

Context Pollution — Agents accumulate massive conversation histories (observations, tool outputs, internal monologues, errors). When transitioning between sessions or tasks, feeding the entire context creates cognitive overload. Signal-to-noise ratio drops, and agents lose focus on the current objective—Nick Tune describes this as the agent becoming “tipsy wobbling from side-to-side.”

Unvalidated Outputs — Code that passes automated tests can still violate semantic contracts (spec requirements, architectural constraints, security policies). Without probabilistic validation layers, implementation shortcuts and silent failures slip through to production.

Why Existing Approaches Fail:

The Solution: Dual-Mandate Checkpoint Architecture

Context Gates solve this by creating two distinct checkpoint types:

Input Gates — Filter and compress context entering an agent session, ensuring only relevant information is presented. This prevents cognitive overload and maintains task focus.

Output Gates — Validate artifacts leaving an agent session through three tiers of verification: deterministic checks, probabilistic review, and human acceptance.

The key insight: Context must be controlled at the boundaries, not throughout execution. Agents work freely within their session, but transitions enforce strict contracts.

Anatomy

Context Gates consist of two primary structures, each with distinct sub-components:

Input Gates

Input Gates control what context enters an agent session.

Summary Gates (Cross-Session Transfer)

When transitioning work between agent sessions, Summary Gates compress conversation history into essential state.

Examples:

Context Filtering (Within-Session)

During multi-step tasks within a single session, Context Filtering determines what historical information is relevant to the current sub-task.

Output Gates

Output Gates validate artifacts before they progress to the next phase. Three tiers enforce different types of correctness:

Quality Gates (Deterministic)

Binary, automated checks enforced by the toolchain.

Examples:

Review Gates (Probabilistic, Adversarial)

LLM-assisted validation of semantic correctness and contract compliance.

Examples:

Output Format: When violations are detected, Review Gates provide actionable feedback:

  1. Violation Description — What contract was broken
  2. Impact Analysis — Why this matters (performance, security, maintainability)
  3. Remediation Path — Ordered list of fixes (prefer standard patterns, escalate if needed)
  4. Test Requirements — What tests would prevent regression

This transforms Review Gates from “reject” mechanisms into “guide to resolution” checkpoints.
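One way to make that feedback machine-consumable is to constrain the Critic to a typed verdict. The field names below mirror the four-part format above but are an illustrative sketch, not a mandated schema:

// Sketch of a structured Review Gate verdict.
interface ReviewViolation {
  description: string;        // 1. what contract was broken
  impact: string;             // 2. why it matters (performance, security, maintainability)
  remediation: string[];      // 3. ordered list of fixes
  testRequirements: string[]; // 4. tests that would prevent regression
}

interface ReviewGateResult {
  verdict: "PASS" | "FAIL";
  violations: ReviewViolation[]; // empty when the verdict is PASS
}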

Acceptance Gates (Human-in-the-Loop)

Subjective checks requiring human strategic judgment.

Examples:

Workflow Enforcement (Denial Gates)

Mechanisms that actively block agents from bypassing the defined process.

Examples:

Gate Taxonomy

| Feature | Summary Gates (Input) | Context Filtering (Input) | Quality Gates (Output) | Review Gates (Output) | Acceptance Gates (Output) |
| --- | --- | --- | --- | --- | --- |
| Function | Session handoff | Within-session filtering | Code validity | Spec compliance | Strategic fit |
| Goal | Clean session transfer | Maintain focus | Prevent broken code | Enforce contracts | Prevent bad product |
| Mechanism | LLM Summarization | Semantic Search | Compilers / Tests | LLM Critique | Human Review |
| Nature | Compression | Filtering | Deterministic | Probabilistic | Subjective |
| Outcome | Condensed context | Clean context window | Valid compilation | Spec compliance | Approved release |

Relationship to Other Patterns

Adversarial Code Review — Implements the Review Gate tier of Output Gates. Uses a Critic Agent to validate code against the Spec’s contracts.

Constitutional Review — Extends Review Gates by validating against both the Spec (functional) and the Agent Constitution (architectural values).

Model Routing — Works with Context Gates to assign appropriate model capabilities to different gate types (throughput models for generation, reasoning models for Review Gates).

The Spec — Provides the contract that Review Gates validate against.

Agent Constitution — Provides architectural constraints that Constitutional Review validates against.

Ralph Loop — Applies Context Gates at iteration boundaries, using context rotation and progress files to prevent cognitive overload across autonomous loops.

Feature Assembly — The practice that uses all three Output Gates (Quality, Review, Acceptance) in the verification pipeline.

Workflow as Code — The practice for implementing gate enforcement programmatically rather than via prompt instructions.

Strategic Value

Prevents Context Overload — Agents receive only relevant information, maintaining task focus and reducing token usage.

Catches Semantic Violations — Review Gates detect contract violations that pass deterministic checks (performance anti-patterns, security gaps, missing edge cases).

Reduces Human Review Burden — Quality and Review Gates filter out obvious errors, letting humans focus on strategic fit rather than technical correctness.

Enforces Architectural Consistency — Constitutional Review (via Review Gates) ensures code follows project principles, not just internet-average patterns.

Creates Clear Contracts — Each gate type has explicit pass/fail criteria, making verification deterministic where possible and explicit where probabilistic.

See also:

Experience Modeling

A foundational phase of the Agentic Software Development Life Cycle (ASDLC) focused on creating the Experience Model—an organized design system that agents must follow.

Status: Proposed | Last Updated: 2025-12-02

Definition

Experience Modeling is a foundational phase of the Agentic Software Development Life Cycle (ASDLC). During this phase, we do not focus on building features; instead, we create the Experience Model—an organized design system that agents must follow. Just as we model data schemas for the backend, we also need to model the Experience Schema for the frontend. The Design System serves as the queryable model that the LLM uses to orchestrate the user interface.

Context Gates

An explicit context gate is implemented between the Experience Modeling and Feature Assembly phases. This methodology will significantly reduce Design Drift, which is the gradual divergence of a product’s actual codebase from its intended design specifications caused by the accumulation of micro-inconsistencies generated by AI.

%% caption: Context Gating for Design System Integrity
flowchart LR
  A[[...]] --> |CONTEXT| C
  C[EXPERIENCE MODELING] --> D
  D{GATE} --> E
  E[FEATURE ASSEMBLY]
    E --> |DEFECT/REQUIREMENT SIGNAL| C
    E --> |RELEASE| G
  G[[...]]
Context Gating for Design System Integrity

Quality

The quality gate is considered satisfied only when the Design System successfully compiles into a standalone, testable artifact. This artifact can vary from a complete enterprise Storybook to a single .astro or .html reference sheet, as long as it is generated through a custom build process rather than being maintained manually.

A quality gate between Experience Modeling and Feature Assembly might verify the following:

  1. Token Strictness: The build pipeline fails if any “raw” values (such as hex codes or magic numbers) are detected by the linter, thereby enforcing the semantic token architecture.
  2. Schema Parity: The automated documentation (llms.txt) must strictly match the exported component type signatures.
  3. Build Success: The visual artifact must build in isolation. If the reference sheet or catalog cannot be generated, the Experience Model is deemed broken and unfit for agent consumption.

Acceptance

Type: Human-in-the-Loop

Verification is conducted on a live, interactive artifact, ensuring that components are not just static images but functional units. The ‘System Architect’ or ‘Design Technologist’ validates the Behavioral Contract of the system.

The reviewer confirms that experience elements function as expected: buttons manage interaction states (hover, focus, disabled), inputs correctly handle data entry, and layout containers adapt to spatial constraints.

Recommendations

To ensure consistent results, the Experience Model should be set to Read-Only during the Feature Assembly phase. Feature Agents utilize the design system without making any modifications. While building a feature, it is strictly prohibited for an agent to alter the core component definitions to accommodate a specific use case.

We recommend implementing the “Read-Only” state using one of two patterns, depending on the size of your project.

Pattern A: Hard Isolation

Context: Large Teams, Enterprise, or Production Systems.

In this approach, the Design System is treated like a third-party library, similar to React or Tailwind. It resides in a separate repository and is built and published to a package registry (such as NPM or NuGet).

Why This Works: The Feature Agent cannot alter the component source code because that code is not present in the project it is working on. It only interacts with the compiled exports.

| Ecosystem | Artifact | Registry | Feature Agent PoV |
| --- | --- | --- | --- |
| TypeScript (Web) | NPM Package (Compiled) | NPM / GitHub Packages | `import { Button } from '@org/design-system';` |
| Python (Data/AI) | Wheel (.whl) (Compiled Lib) | Private PyPI / Artifactory | `from org_core.schemas import UserIntent` |
| Unity (Games) | UPM Package | Unity Scoped Registry | `using Org.Mechanics.Input;` |

Pattern B: Toolchain Enforcement

Context: Monorepos, Rapid Iteration, or Single-Team Projects

In this approach, the Design System coexists within the same repository as the application code but is safeguarded by mechanical write barriers. We do not rely on the Agent’s willingness to follow instructions (for example, the prompts found in agents.md). Instead, we use the version control system and build pipeline to automatically reject unauthorized modifications.

Why This Works: The method shifts the enforcement from Prompt Space (probabilistic) to Commit Space (deterministic). If a Feature Agent attempts to modify the design system files while building a feature, the pre-commit hooks or CI pipeline will trigger a hard failure, preventing the code from entering the history.
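As an illustrative sketch of such a write barrier, a small script run from a pre-commit hook (for example via Husky) can reject staged changes under a protected path; the path and the wiring are assumptions:

// scripts/check-design-system.ts (sketch): fail the commit if protected files are staged.
import { execSync } from "node:child_process";

const PROTECTED_PREFIX = "src/design-system/"; // illustrative protected path

const staged = execSync("git diff --cached --name-only", { encoding: "utf-8" })
  .split("\n")
  .filter(Boolean);

const violations = staged.filter((file) => file.startsWith(PROTECTED_PREFIX));

if (violations.length > 0) {
  console.error("Design System is read-only during Feature Assembly. Blocked files:");
  violations.forEach((f) => console.error(`  - ${f}`));
  process.exit(1); // deterministic rejection in Commit Space
}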

A Layered Defense?

While mechanical barriers provide a hard guarantee, we enhance efficiency by combining them with Agent Constitution (e.g., .cursorrules, agents.md).

The Gate Mechanism (Toolchain): This prevents corruption in case the Agent makes a mistake.

The Agent Constitution (Context): This helps prevent the Agent from making mistakes in the first place.

This combination reduces waste cycles generating code that will ultimately be rejected by the compiler or linter. However, it is important to note that the context rules are never the sole line of defense.

| Ecosystem | Enforcement Mechanism | Implementation Pattern |
| --- | --- | --- |
| TypeScript (Web) | Husky / Lint-Staged | A pre-commit hook scans staged files. If src/design-system/** is modified, the hook fails and blocks the commit. |
| Python (Data/AI) | Pre-Commit Framework | A local hook (.pre-commit-config.yaml) validates that src/core/ remains read-only for standard feature branches. |
| Unity (Games) | Asset Post-Processors | An OnPreprocessAsset script in the Editor instantly reverts changes to the /_Core folder if the Editor is not in “Architect Mode”. |

External Attention

External Attention offloads document processing to isolated sub-agents, returning only extracted answers to the main agent's context window.

Status: Draft | Last Updated: 2026-01-10

Definition

External Attention is an architectural pattern for offloading document processing to isolated sub-agents rather than injecting documents into the main agent’s context window. The sub-agent queries the document and returns only the extracted answer.

This pattern addresses a fundamental tension in agentic systems: agents often need information from large documents (PDFs, codebases, research papers), but loading those documents directly into context degrades performance on the primary task.

The Problem: Context Bloat from Large Documents

When agents need information from large documents, the naive approach loads the document into context. This creates:

The Solution: Query, Don’t Load

Instead of:

Context = [Task Instructions] + [Full Document] + [Recent Actions]

Use:

Context = [Task Instructions] + [Query Result] + [Recent Actions]

Where Query Result comes from a specialized sub-agent that:

  1. Receives the document + specific question
  2. Extracts only the relevant answer
  3. Returns a bounded response to the main agent

The key insight: isolation preserves focus. The main agent’s context remains clean while the sub-agent handles the messy work of document comprehension.

Anatomy

External Attention consists of four components:

Document Ingestion Tool

A tool interface that accepts a document reference and a query. The main agent sees only the tool signature, not the document contents.

answer = answer_from_pdf(
    document="research-paper.pdf",
    query="What is the reported accuracy on benchmark X?"
)

Sub-Agent Context

An isolated context window where the full document is loaded alongside the query. This context is invisible to the main agent—it exists only for the duration of the tool call.

Query Processor

The sub-agent logic that:

Bounded Response Contract

The interface guaranteeing that only the extracted answer (not the full document) returns to the main agent. This is the critical boundary that prevents context pollution.
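A sketch of that boundary is shown below; readPdfText and the isolated completion call are hypothetical helpers, and the point is that only the bounded answer string crosses back to the caller:

// Hypothetical helpers: readPdfText extracts document text, and the completion callback
// runs inside a fresh sub-agent context that is discarded after the call.
type CompleteInIsolation = (system: string, user: string) => Promise<string>;

async function answerFromPdf(
  complete: CompleteInIsolation,
  readPdfText: (path: string) => Promise<string>,
  documentPath: string,
  query: string
): Promise<string> {
  const fullText = await readPdfText(documentPath); // loaded only inside the sub-agent context
  const answer = await complete(
    "You answer questions about the provided document. Reply with the extracted answer only.",
    `Document:\n${fullText}\n\nQuestion: ${query}`
  );
  return answer.slice(0, 2000); // bounded response: the full document never reaches the main agent
}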

Relationship to Other Patterns

Model Routing — External Attention is a form of model routing where document processing routes to a specialized “reader” agent.

Context Gates — The tool boundary acts as a Context Gate, filtering document contents to only relevant extractions.

Levels of Autonomy — The document-processing sub-agent is an L1 Atomic Agent with a single responsibility.

Practice: External Document Processing — Implementation guidance TBD.

When to Use

When Not to Use

Industry Validation

The InfiAgent framework (Yu et al., 2026) demonstrates this pattern at scale: an 80-paper literature review task where the main agent maintains bounded context by delegating all document reading to answer_from_pdf tools. The approach enabled 80/80 paper coverage where baseline agents failed.

Model Routing

Strategic assignment of LLM models to SDLC phases based on reasoning capability versus execution speed.

Status: Experimental | Last Updated: 2026-01-13

Definition

Model Routing is the strategic assignment of different Large Language Models (LLMs) to different phases of the software development lifecycle based on their capability profile.

Different computational tasks have different performance characteristics. Model Routing matches model capabilities to task requirements: reasoning depth during design phases and speed with large context windows during implementation phases.

This is a tool selection strategy, not a delegation strategy. Engineers remain accountable for output quality while selecting the appropriate computational tool for each phase.

The Problem: Single-Model Inefficiency

Using one model for all phases creates a mismatch between computational capability and task requirements.

High-speed models struggle with architectural decisions requiring deep constraint satisfaction. Reasoning models are too slow for high-volume implementation tasks. Models with massive context windows are expensive when you only need to process small, focused changes.

Each model class optimizes for different performance characteristics. Using the wrong one wastes either quality (insufficient reasoning) or resources (excessive capability for simple tasks).

The Solution: Capability-Based Assignment

We categorize models into three capability profiles aligned with Agentic SDLC phases:

| Capability Profile | Optimization | Primary Use Cases | Model Examples |
| --- | --- | --- | --- |
| High Reasoning | Deep logic, high latency, “System 2” thinking | Writing Specs, architectural decisions, logic debugging, security analysis | Gemini 3 Deep Think, DeepSeek V3.2, OpenAI o3-pro |
| High Throughput | Speed, low latency, real-time execution | Code generation, refactoring, unit tests, UI implementation | Gemini 3 Flash, Llama 4 Scout, Claude Haiku 4.5 |
| Massive Context | Repository-scale context (500k-5M tokens) | Documentation analysis, codebase navigation, legacy system understanding | Gemini 3 Pro (5M tokens), Claude 4.5 Sonnet (500k), GPT-5 (RAG-native) |

Model examples current as of December 27, 2025. The LLM landscape evolves rapidly—validate capabilities and availability before implementation.

Relationship to Levels of Autonomy

Levels of Autonomy define human oversight requirements. Model Routing complements this by matching computational capability to task characteristics:

This ensures that the computational tool’s capability profile matches the task’s computational requirements and the degree of human verification needed.

See also:

Product Vision

A structured vision document that transmits product taste and point-of-view to agents, preventing convergence toward generic outputs.

Status: Live | Last Updated: 2026-01-13

Definition

A Product Vision is a structured artifact that captures the taste, personality, and point-of-view that makes a product this product rather than generic software. It transmits product intuition to agents who otherwise default to bland, safe, interchangeable outputs.

Traditional vision documents are written for humans—investors, executives, new hires. In ASDLC, the Product Vision is structured for agent consumption, providing the context needed to make opinionated decisions aligned with product identity.

The Problem: Vibe Convergence

Agents trained on the entire internet converge toward the mean. Ask for a landing page, you get the same hero section everyone else gets. Ask for onboarding, you get the same three-step wizard. Ask for error copy, you get “Oops! Something went wrong.”

This isn’t a bug in the model. It’s the model doing exactly what it’s trained to do: produce the statistically average response. The average is safe. The average is forgettable.

The symptoms:

The Agent Constitution tells agents how to behave. The Spec tells agents what to build. Neither tells agents who we are.

The Solution: Structured Taste Transmission

The Product Vision bridges this gap by making product identity explicit and agent-consumable. Rather than hoping agents infer taste from scattered references, the vision provides a structured context packet that shapes output quality.

The key insight: agents don’t need complete documentation—they need curated opinions. A Product Vision isn’t comprehensive; it’s opinionated. It tells agents which tradeoffs to make when specs are ambiguous.

Anatomy

A Product Vision consists of five components, each serving a distinct purpose in shaping agent output.

1. The Actual Humans

Not “users” or “customers”—real people with context, constraints, and taste of their own. This gives agents a person to design for, not an abstraction.

When choosing between “simple onboarding wizard” and “power-user defaults with optional setup,” agents need basis for judgment. Abstract personas don’t provide this; descriptions of actual humans do.

2. Point of View

Opinions. Actual stances on tradeoffs that reasonable people might disagree with.

These aren’t requirements—they’re taste. They tell agents which direction to lean when specs are ambiguous:

3. Taste References

Concrete examples of products that feel right, and products that don’t. Agents can reference these patterns directly: “Make this feel more like Linear’s approach to lists, less like Jira’s.”

References provide calibration. Instead of describing “clean” in abstract terms, point to products that embody it—and products that don’t.

4. Voice and Language

How the product speaks. Not brand guidelines—actual examples of tone.

This includes:

5. Decision Heuristics

When agents face ambiguous choices, what should they optimize for? These are tie-breakers—the rules that resolve conflicts between equally valid approaches.

Placement in Context Hierarchy

Product Vision sits between the Constitution and the Specs:

| Tier | Artifact | Purpose |
| --- | --- | --- |
| Constitution | AGENTS.md | How agents behave (rules, constraints) |
| Vision | VISION.md or inline | Who the product is (taste, voice, POV) |
| Specs | /plans/*.md | What to build (contracts, criteria) |
| Reference | /docs/ | Full documentation, API specs, guides |

The Constitution shapes behavior. The Vision shapes judgment. The Specs shape output.

Not every project needs a separate VISION.md. For smaller products or early-stage teams, the vision can live as a preamble in AGENTS.md. For complex products with detailed voice guidelines and taste references, a separate file prevents crowding out operational context.

See Product Vision Authoring for guidance on the inline vs. separate decision, templates, and maintenance practices.

Validated in Practice

Industry Validation

Marty Cagan (Silicon Valley Product Group) In the AI era, Cagan argues that product vision is more critical than ever. As AI lowers the cost of building features, differentiation shifts from “ability to ship” to “ability to solve value risks.” Without a strong vision, AI teams build “features that work” rather than “products that matter.”

“It will be easier to build features, but harder to build the right features.” — Marty Cagan

Lenny Rachitsky (Product Sense) Rachitsky defines “product sense” as the ability to consistently craft products with intended impact. VISION.md is essentially codified product sense—explicitly documenting the intuition that senior PMs use to steer teams, so that agents (who lack intuition) can simulate it.

The Scientific Basis: Countering Regression to the Mean

LLMs are probabilistic engines trained to predict the most likely next token. By definition, “most likely” means “most average.”

Without external constraint, an agent will always drift toward the mean (Regression to the Mean). A Product Vision acts as a forcing function, artificially skewing the probability distribution toward specific, non-average choices (e.g., “playful” over “professional,” “dense” over “simple”).

Anti-Patterns

The Generic Vision

“User-centric design. Quality and reliability. Innovation and creativity.”

This says nothing. Every company claims these values. A Product Vision without opinions is just corporate filler that agents will (correctly) ignore.

The Aspirational Vision

Describing the product you wish you had, not the product you’re building. If your vision says “minimal and focused” but your product has 47 settings screens, agents will be confused by the contradiction.

The Ignored Vision

Creating the document once and never referencing it in specs or prompts. The artifact exists but agents never see it in context.

The Aesthetic-Only Vision

All visual preferences, no product opinion. “We like blue and sans-serif fonts” isn’t vision—it’s a style guide. Vision captures judgment, not just appearance.

Relationship to Other Patterns

Agent Constitution — The Constitution defines behavioral rules (what agents must/must not do). The Vision defines taste (what agents should prefer when rules don’t dictate). Constitution is constraints; Vision is guidance.

The Spec — Specs define feature contracts. The Vision influences how those contracts are fulfilled. Specs reference Vision for design rationale: “Per VISION.md: ‘Settings are failure; good defaults are success.’”

Context Engineering — The Vision is a structured context asset. It follows Context Engineering principles: curated, opinionated, agent-optimized.

Product Vision Authoring — Step-by-step guide for creating and maintaining a Product Vision, including templates, inline vs. separate file decisions, and diagnostic guidance.

AGENTS.md Specification — Defines the file format for agent constitutions, including how to incorporate vision as a preamble or reference.

Living Specs — Specs can reference vision for design rationale. The “same-commit rule” applies: if vision changes, affected specs should acknowledge the shift.

Agent Personas — Different personas may need different vision depth. A copywriting agent needs full voice guidance; a database migration agent needs minimal product context.

See also:

Ralph Loop

Persistence pattern enabling autonomous agent iteration until external verification passes, treating failure as feedback rather than termination.

Status: Live | Last Updated: 2026-01-12

Definition

The Ralph Loop—named by Geoffrey Huntley after the persistently confused but undeterred Simpsons character Ralph Wiggum—is a persistence pattern that turns AI coding agents into autonomous, self-correcting workers.

The pattern operationalizes the OODA Loop for terminal-based agents and automates the Learning Loop with machine-verifiable completion criteria. It enables sustained L3-L4 autonomy—“AFK coding” where the developer initiates and returns to find committed changes.

flowchart LR
    subgraph Input
        PBI["PBI / Spec"]
    end
    
    subgraph "Human-in-the-Loop (L1-L2)"
        DEV["Dev + Copilot"]
        E2E["E2E Tests"]
        DEV --> E2E
    end
    
    subgraph "Ralph Loop (L3-L4)"
        AGENT["Agent Iteration"]
        VERIFY["External Verification"]
        AGENT --> VERIFY
        VERIFY -->|"Fail"| AGENT
    end
    
    subgraph Output
        REVIEW["Adversarial Review"]
        MERGE["Merge"]
        REVIEW --> MERGE
    end
    
    PBI --> DEV
    PBI --> AGENT
    E2E --> REVIEW
    VERIFY -->|"Pass"| REVIEW

Both lanes start from the same well-structured PBI/Spec and converge at Adversarial Review. The Ralph Loop lane operates autonomously, with human oversight at review boundaries rather than every iteration.

The Problem: Human-in-the-Loop Bottleneck

Traditional AI-assisted development creates a productivity ceiling: the human reviews every output before proceeding. This makes the human the slow component in an otherwise high-speed system.

The naive solution—trusting the agent’s self-assessment—fails because LLMs confidently approve their own broken code. Research demonstrates that self-correction is only reliable with objective external feedback. Without it, the agent becomes a “mimicry engine” that hallucinates success.

| Aspect | Traditional AI Interaction | Failure Mode |
| --- | --- | --- |
| Execution Model | Single-pass (one-shot) | Limited by human availability |
| Failure Response | Process termination or manual re-prompt | Blocks on human attention |
| Verification | Human review of every output | Human becomes bottleneck |

The Solution: External Verification Loop

The Ralph Loop inverts the quality control model: instead of treating LLM failures as terminal states requiring human intervention, it engineers failure as diagnostic data. The agent iterates until external verification (not self-assessment) confirms success.

Core insight: Define the “finish line” through machine-verifiable tests, then let the agent iterate toward that finish line autonomously. Iteration beats perfection.

| Aspect | Traditional AI | Ralph Loop |
| --- | --- | --- |
| Execution Model | Single-pass | Continuous multi-cycle |
| Failure Response | Manual re-prompt | Automatic feedback injection |
| Persistence Layer | Context window | File system + Git history |
| Verification | Human review | External tooling (Docker, Jest, tsc) |
| Objective | Immediate correctness | Eventual convergence |

Anatomy

1. Stop Hooks and Exit Interception

The agent attempts to exit when it believes it’s done. A Stop hook intercepts the exit and evaluates current state against success criteria. If the agent hasn’t produced a specific “completion promise” (e.g., <promise>DONE</promise>), the hook blocks exit and re-injects the original prompt.

This creates a self-referential loop: the agent confronts its previous work, analyzes why the task remains incomplete, and attempts a new approach.
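The outer shell of the loop can be sketched as follows; runAgent, the verification callback, and the promise token are assumptions standing in for your agent CLI and external test tooling:

// Sketch of a Ralph-style outer loop with an iteration cap.
type RunAgent = (prompt: string) => Promise<string>; // returns the agent's final message
type RunVerification = () => Promise<boolean>;       // e.g. docker build + test suite

async function ralphLoop(runAgent: RunAgent, verify: RunVerification, prompt: string, maxIterations = 50) {
  for (let i = 0; i < maxIterations; i++) {
    const output = await runAgent(prompt);
    const claimedDone = output.includes("<promise>DONE</promise>"); // completion promise (illustrative)
    const verified = await verify();                                 // external judge, not self-assessment
    if (claimedDone && verified) return;                             // exit only when both agree
    // Otherwise block the exit and run another iteration against the same prompt;
    // the codebase and Git history carry state forward, not the conversation.
  }
  throw new Error("Iteration cap reached without passing external verification");
}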

2. External Verification (Generator/Judge Separation)

The agent is not considered finished when it believes it’s done—only when external verification confirms success:

| Evaluation Type | Agent Logic | External Tooling |
| --- | --- | --- |
| Self-Assessment | “I believe this is correct” | None (Subjective) |
| External Verification | “I will run docker build” | Docker Engine (Objective) |
| Exit Decision | LLM decides to stop | System stops because tests pass |

This is the architectural enforcement of Generator/Judge separation from Adversarial Code Review, but mechanized.

3. Git as Persistent Memory

Context windows rot, but Git history persists. Each iteration commits changes, so subsequent iterations “see” modifications from previous attempts. The codebase becomes the source of truth, not the conversation.

Git also enables easy rollback if an iteration degrades quality.

4. Context Rotation and Progress Files

Context rot: Accumulation of error logs and irrelevant history degrades LLM reasoning.

Solution: At 60-80% context capacity, trigger forced rotation to fresh context. Essential state carries over via structured progress files.

This is the functional equivalent of free() for LLM memory—applied Context Engineering.
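A minimal sketch of what such a progress file might carry; the field names are illustrative, not a prescribed format:

// Illustrative shape for a progress file written between iterations.
interface ProgressFile {
  objective: string;   // the original task, restated
  completed: string[]; // work finished in previous iterations
  remaining: string[]; // known next steps
  decisions: string[]; // choices already made, so fresh contexts do not relitigate them
  blockers: string[];  // failing tests or errors still to be resolved
}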

5. Convergence Through Iteration

The probability of successful completion P(C) is a function of iterations n:

P(C) = 1 - (1 - p_success)^n

As n increases (often up to 50 iterations), the probability of resolving even complex bugs approaches 1.
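For illustration with assumed numbers: if each iteration independently succeeds with p_success = 0.3, ten iterations yield P(C) = 1 - 0.7^10 ≈ 0.97.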

OODA Loop Mapping

The Ralph Loop is OODA mechanized:

| OODA Phase | Ralph Loop Implementation |
| --- | --- |
| Observe | Read codebase state, error logs, failed builds |
| Orient | Marshal context, interpret errors, read progress file |
| Decide | Formulate specific plan for next iteration |
| Act | Modify files, run tests, commit changes |

The cycle repeats until external verification passes.

Relationship to Other Patterns

Context Gates — Context rotation + progress files = state filtering between iterations. Ralph Loops are Context Gates applied to the iteration boundary.

Adversarial Code Review — Ralph architecturally enforces Generator/Judge separation. External tooling is the “Judge” that prevents self-assessment failure.

The Spec — Completion promises require machine-verifiable success criteria. Well-structured Specs with Gherkin scenarios are ideal Ralph inputs.

Workflow as Code — The practice for implementing Ralph Loops using typed step abstractions rather than prompt-based orchestration. Provides deterministic control flow with the agent invoked only for probabilistic tasks.

Anti-Patterns

| Anti-Pattern | Description | Failure Mode |
| --- | --- | --- |
| Vague Prompts | “Improve this codebase” without specific criteria | Divergence; endless superficial changes |
| No External Verification | Relying on agent self-assessment | Self-Assessment Trap; hallucinates success |
| No Iteration Caps | Running without max iterations limit | Infinite loops; runaway API costs |
| No Sandbox Isolation | Agent has access to sensitive host files | Security breach; SSH keys, cookies exposed |
| No Context Rotation | Letting context window fill without rotation | Context rot; degraded reasoning |
| No Progress Files | Fresh iterations re-discover completed work | Wasted tokens; repeated mistakes |

Guardrails

| Risk | Mitigation |
| --- | --- |
| Infinite Looping | Hard iteration caps (20-50 iterations) |
| Context Rot | Periodic rotation at 60-80% capacity |
| Security Breach | Sandbox isolation (Docker, WSL) |
| Token Waste | Exact completion promise requirements |
| Logic Drift | Frequent Git commits each iteration |
| Cost Overrun | API cost tracking per session |

See also:

Specs

Living documents that serve as the permanent source of truth for features, solving the context amnesia problem in agentic development.

Status: Live | Last Updated: 2026-01-13

Definition

A Spec is the permanent source of truth for a feature. It defines how the system works (Design) and how we know it works (Quality).

Unlike traditional tech specs or PRDs that are “fire and forget,” specs are living documents. They reside in the repository alongside the code and evolve with every change to the feature.

The Problem: Context Amnesia

Agents do not have long-term memory. They cannot recall Jira tickets from six months ago or Slack conversations about architectural decisions. When an agent is tasked with modifying a feature, it needs immediate access to:

Without specs, agents reverse-engineer intent from code comments and commit messages—a process prone to hallucination and architectural drift.

Traditional documentation fails because:

Specs solve this by making documentation a first-class citizen in the codebase, subject to the same version control and review processes as the code itself.

State vs Delta

This is the core distinction that makes agentic development work at scale.

| Dimension | The Spec | The PBI |
| :--- | :--- | :--- |
| Purpose | Define the State (how it works) | Define the Delta (what changes) |
| Lifespan | Permanent (lives with the code) | Transient (closed after merge) |
| Scope | Feature-level rules | Task-level instructions |
| Audience | Architects, Agents (Reference) | Agents, Developers (Execution) |

The Spec defines the current state of the system:

The PBI defines the change:

The PBI references the Spec for context and updates the Spec when it changes contracts.

Why Separation Matters

Sprint 1: PBI-101 "Build notification system"
  → Creates /plans/notifications/spec.md
  → Spec defines: "Deliver within 100ms via WebSocket"

Sprint 3: PBI-203 "Add SMS fallback"
  → Updates spec.md with new transport rules
  → PBI-203 is closed, but the spec persists

Sprint 8: PBI-420 "Refactor notification queue"
  → Agent reads spec.md, sees all rules still apply
  → Refactoring preserves all documented contracts

Without this separation, the agent in Sprint 8 has no visibility into decisions made in Sprint 1.

The Assembly Model

Specs serve as the context source for Feature Assembly. Multiple PBIs reference the same spec, and the spec’s contracts are verified at quality gates.

flowchart LR
  A[/spec.md/]

  B[\pbi-101.md\]
  C[\pbi-203.md\]
  D[\pbi-420.md\]

  B1[[FEATURE ASSEMBLY]]
  C1[[FEATURE ASSEMBLY]]
  D1[[FEATURE ASSEMBLY]]

  E{GATE}

  F[[MIGRATION]]

  A --> B
  A --> C
  A --> D

  B --> B1
  C --> C1
  D --> D1

  B1 --> E
  C1 --> E
  D1 --> E

  A --> |Context|E

  E --> F

Anatomy

Every spec consists of two parts:

Blueprint (Design)

Defines implementation constraints that prevent agents from hallucinating invalid architectures.

Contract (Quality)

Defines verification rules that exist independently of any specific task.

The Contract section implements Behavior-Driven Development principles: scenarios define what behavior is expected without dictating how to implement it. This allows agents to interpret intent dynamically while providing clear verification criteria.
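For example, a minimal Gherkin-style scenario (illustrative, reusing the 100ms WebSocket contract from the notification example above):

Scenario: Deliver in-app notification
  Given a user with an active WebSocket connection
  When a notification is created for that user
  Then the notification is delivered within 100ms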

For detailed structure, examples, and templates, see the Living Specs Practice Guide.

Relationship to Other Patterns

The PBI — PBIs are the transient execution units (Delta) that reference specs for context. When a PBI changes contracts, it updates the spec in the same commit.

Feature Assembly — Specs define the acceptance criteria verified during assembly. The diagram above shows this flow.

Experience Modeling — Experience models capture user journeys; specs capture the technical contracts that implement those journeys.

Context Engineering — Specs are structured context assets optimized for agent consumption, with predictable sections (Blueprint, Contract) for efficient extraction.

Behavior-Driven Development — BDD provides the methodology for the Contract section. Gherkin scenarios serve as “specifications of behavior” that guide agent reasoning and define acceptance criteria.

Iterative Spec Refinement

Kent Beck critiques spec-driven approaches that assume “you aren’t going to learn anything during implementation.” This is valid—specs are not waterfall artifacts.

The refinement cycle:

  1. Initial Spec — Capture known constraints (API contracts, quality targets, anti-patterns)
  2. Implementation Discovery — Agent or human encounters edge cases, performance issues, or missing requirements
  3. Spec Update — New constraints committed alongside the code that revealed them
  4. Verification — Gate validates implementation against updated spec
  5. Repeat

This is the Learning Loop applied to specs: the spec doesn’t prevent learning—it captures learnings so agents can act on them in future sessions.

“Large Language Models give us great leverage—but they only work if we focus on learning and understanding.” — Unmesh Joshi, via Martin Fowler

Industry Validation

The Spec pattern has emerged independently across the industry under different names. Notably, Rasmus Widing’s Product Requirement Prompt (PRP) methodology defines the same structure: Goal + Why + Success Criteria + Context + Implementation Blueprint + Validation Loop.

His core principles—“Plan before you prompt,” “Context is everything,” “Scope to what the model can reliably do”—mirror ASDLC’s Spec-Driven Development philosophy.

See Product Requirement Prompts for the full mapping and Industry Alignment for convergent frameworks.

See also:

The PBI

A transient execution unit that defines the delta (change) while pointing to permanent context (The Spec), optimized for agent consumption.

Status: Live | Last Updated: 2026-01-13

Definition

The Product Backlog Item (PBI) is the unit of execution in the ASDLC. While The Spec defines the State (how the system works), the PBI defines the Delta (the specific change to be made).

In an AI-native workflow, the PBI transforms from a “User Story” (negotiable conversation) into a Prompt (strict directive). The AI has flexibility in how code is written, but the PBI enforces strict boundaries on what is delivered.

The Problem: Ambiguous Work Items

Traditional user stories (“As a user, I want…”) are designed for human negotiation. They assume ongoing dialogue, implicit context, and shared understanding built over time.

Agents don’t negotiate. They execute. A vague story becomes a hallucinated implementation.

What fails without structured PBIs:

The Solution: Pointer, Not Container

The PBI acts as a pointer to permanent context, not a container for the full design. It defines the delta while referencing The Spec for the state.

| Dimension | The Spec | The PBI |
| :--- | :--- | :--- |
| Purpose | Define the State (how it works) | Define the Delta (what changes) |
| Lifespan | Permanent (lives with the code) | Transient (closed after merge) |
| Scope | Feature-level rules | Task-level instructions |
| Audience | Architects, Agents (Reference) | Agents, Developers (Execution) |

Anatomy

An effective PBI consists of four parts:

1. The Directive

What to do, with explicit scope boundaries. Not a request—a constrained instruction.

2. The Context Pointer

Reference to the permanent spec. Prevents the PBI from becoming a stale copy of design decisions that live elsewhere.

3. The Verification Pointer

Link to success criteria defined in the spec’s Contract section. The agent knows exactly what “done” looks like.

4. The Refinement Rule

Protocol for when reality diverges from the spec. Does the agent stop? Update the spec? Flag for human review?

Bounded Agency

Because AI is probabilistic, it requires freedom to explore the “How” (implementation details, syntax choices). However, to prevent hallucination, we bound this freedom with non-negotiable constraints.

Negotiable (The Path): Code structure, variable naming, internal logic flow, refactoring approaches.

Non-Negotiable (The Guardrails): Steps defined in the PBI, outcome metrics in the Spec, documented anti-patterns, architectural boundaries.

The PBI is not a request for conversation—it’s a constrained optimization problem.

Atomicity & Concurrency

In swarm execution (multiple agents working in parallel), each PBI must be:

Atomic: The PBI delivers a complete, working increment. No partial states. If the agent stops mid-task, either the full change lands or nothing does.

Self-Testable: Verification criteria must be executable without other pending PBIs completing first. If PBI-102 requires PBI-101’s code to test, PBI-102 is not self-testable.

Isolated: Changes target distinct files/modules. Two concurrent PBIs modifying the same file create merge conflicts and non-deterministic outcomes.

Dependency Declaration

When a PBI requires another to complete first, the dependency is declared explicitly in the PBI structure—not discovered at merge time.
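For example, a minimal dependency block inside a PBI (the same structure appears in the PBI Authoring template later in this compendium):

## Dependencies
- Blocked by: PBI-101 (creates the base schema)
- Must merge before: PBI-103 (extends this endpoint)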

Relationship to Other Patterns

The Spec — The permanent source of truth that PBIs reference. The Spec defines state; the PBI defines delta.

PBI Authoring — The practice for writing effective PBIs, including templates and lifecycle.

See also:

Practices (A-Z)

Agent Personas

A guide on how to add multiple personas to an AGENTS.md file, with examples.

Status: Live | Last Updated: 2026-01-13

Definition

Defining clear personas for your agents is crucial for ensuring they understand their role, trigger constraints, and goals. This guide demonstrates how to structure multiple personas within your AGENTS.md file.

Personas are a context engineering practice—they scope agent work by defining boundaries and focus, not by role-playing. When combined with Model Routing, personas can also specify which computational tool (LLM) to use for each type of work.

For the full specification of the AGENTS.md file, see the AGENTS.md Specification.

When to Use

Use this practice when:

Skip this practice when:

How to Add Multiple Personas

You can define multiple personas by specifying triggers, goals, and guidelines for each. This allows different agents (or the same agent in different contexts) to adopt specific behaviors suited for the task at hand.

Example: Our Internal Personas

Below are the personas we use, serving as a template for your own AGENTS.md.

### 1.1. Lead Developer / Astro Architect (@Lead)
**Trigger:** When asked about system design, specs, or planning.
* **Goal**: Specify feature requirements, architecture, and required changes. Analyze the project state and plan next steps.
* **Guidelines**
  - **Schema Design:** When creating new content types, immediately define the Zod schema in `src/content/config.ts`.
  - **Routing:** Use Astro's file-based routing. For dynamic docs, use `[...slug].astro` and `getStaticPaths()`.
  - **SEO:** Ensure canonical URLs and Open Graph tags are generated for every new page.
  - **Dev Performance:** Focus on tangible, deliverable outcomes.
  - **Spec-driven development:** Always produce clear, concise specifications before handing off to implementation agents.
  - **Planned iterations:** Break down large tasks into manageable PBIs with clear acceptance criteria.

### 1.2. Designer / User Experience Lead (@Designer)
**Trigger:** When asked about UI/UX, design systems, or visual consistency.
* **Goal**: Ensure the design system can be effectively utilized by agents and humans alike.
* **Guidelines**
  - **Design Tokens:** Tokens must be set in `src/styles/tokens.css`. No hardcoded colors or fonts.
  - **Component Consistency:** All components must adhere to the design system documented in `src/pages/resources/design-system.astro`. 
  - **Accessibility:** Ensure all components meet WCAG 2.1 AA standards.
  - **Documentation:** Update the Design System page with any new components or styles introduced.
  - **Experience Modeling Allowed:** Design system components are protected by a commit rule; use the [EM] tag to override it.
  
### 1.3. Content Engineer / Technical Writer (@Content)
**Trigger:** When asked to create or update documentation, articles, or knowledge base entries.
* **Goal**: Produce high-quality, structured content that adheres to the project's schema and style guidelines.
* **Guidelines**
  - **Content Structure:** Follow the established folder structure in `src/content/` for concepts
  
### 1.4. Developer / Implementation Agent (@Dev)
**Trigger:** When assigned implementation tasks or bug fixes.
* **Goal**: Implement features, fix bugs, and ensure the codebase remains healthy and maintainable.
* **Guidelines**
  - **Expect PBIs:** Always work from a defined Product Backlog Item (PBI) with clear acceptance criteria, if available.
  - **Type Safety:** Use TypeScript strictly. No `any` types allowed.
  - **Component Imports:** Explicitly import all components used in `.astro` files.
  - **Testing:** Ensure all changes pass `pnpm check` and `pnpm lint`.
  - **Document progress:** Update the relevant PBI in `docs/backlog/` with status and notes after completing tasks.

Model Routing and Personas

Personas define what work to do and how to scope it. Model Routing is a separate practice that defines which computational tool to use.

Current State (December 2025)

AI-assisted IDEs (Cursor, Windsurf, Claude Code) do not automatically select models based on persona definitions. Model selection is manual.

Best Practice: Keep Them Separate

Don’t add model profiles to AGENTS.md; it adds noise to the context window without providing automation value.

Instead:

  1. Keep personas focused on triggers, goals, and guidelines
  2. Use Model Routing separately - Manually select models based on the task characteristics
  3. Reference the pattern when deciding which model to use

Matching Personas to Model Profiles

When you invoke a persona, choose your model based on the work type:

| Persona Type | Typical Work | Recommended Profile |
| :--- | :--- | :--- |
| Lead / Architect | System design, specs, architectural decisions | High Reasoning |
| Developer / Implementation | Code generation, refactoring, tests | High Throughput |
| Documentation Analyst | Legacy code analysis, comprehensive docs | Massive Context |

The workflow:

  1. Identify the persona needed for your task
  2. Select the appropriate model manually in your IDE
  3. Invoke the persona with your prompt

This keeps AGENTS.md lean and focused on scoping agent work, while model selection remains a deliberate engineering decision.

AGENTS.md Specification

The definitive guide to the AGENTS.md file, including philosophy, anatomy, and implementation strategy.

Status: Live | Last Updated: 2026-01-13

DEFINITION

AGENTS.md is an open format for guiding coding agents, acting as a “README for agents.” It provides a dedicated, predictable place for context and instructions—such as build steps, tests, and conventions—that help AI coding agents work effectively on a project.

We align with the agents.md specification, treating this file as the authoritative source of truth for agentic behavior within the ASDLC.

When to Use

Use this practice when:

Skip this practice when:

CORE PHILOSOPHY

1. A README for Agents

Just as README.md is for humans, AGENTS.md is for agents. It complements existing documentation by containing the detailed context—build commands, strict style guides, and test instructions—that agents need but might clutter a human-facing README.

2. Context is Code

In the ASDLC, we treat AGENTS.md with the same rigor as production software:

3. The Context Anchor (Long-Term Memory)

AGENTS.md solves the “Context Amnesia” problem. Agents are stateless—each new session starts with blank context. Without grounding, the agent reverts to generic training weights, forgetting project-specific patterns and lessons learned.

The AGENTS.md file acts as persistent “standing orders” for the agent across different sessions. By documenting your research tools, coding styles, architectural decisions, and accumulated lessons here, you prevent session-to-session drift.

This transforms AGENTS.md from a simple configuration file into the project’s institutional memory for AI collaboration.

Format Philosophy

The structures in this specification (YAML maps, XML standards, tiered boundaries) are optimized for large teams and complex codebases. For smaller projects:

The goal is signal density, not format compliance. Overly rigid specs create adoption friction. Let teams scale complexity to their needs.

TOOL-SPECIFIC CONSIDERATIONS

Different AI coding tools look for different filenames. While AGENTS.md is the emerging standard, some tools require specific naming:

| Tool | Expected Filename | Notes |
| :--- | :--- | :--- |
| Cursor | `.cursorrules` | Also reads AGENTS.md |
| Windsurf | `.windsurfrules` | Also reads AGENTS.md |
| Claude Code | `CLAUDE.md` | Does not read AGENTS.md; case-sensitive |
| Codex | `AGENTS.md` | Native support |
| Zed | `.rules` | Priority-based; reads AGENTS.md at lower priority |
| VS Code / Copilot | `AGENTS.md` | Requires `chat.useAgentsMdFile` setting enabled |

Zed Priority Order

Zed uses the first matching file from this list:

  1. .rules
  2. .cursorrules
  3. .windsurfrules
  4. .clinerules
  5. .github/copilot-instructions.md
  6. AGENT.md
  7. AGENTS.md
  8. CLAUDE.md
  9. GEMINI.md

VS Code Configuration

VS Code requires explicit opt-in for AGENTS.md support:
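For example, a minimal workspace configuration (assumed location: `.vscode/settings.json`; the setting name is taken from the table above):

{
  "chat.useAgentsMdFile": true
}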

Recommendation

Create a symlink to support Claude Code without duplicating content:

ln -s AGENTS.md CLAUDE.md

This ensures Claude Code users get the same guidance while maintaining a single source of truth. Note that Claude Code also supports CLAUDE.local.md for personal preferences that shouldn’t be version-controlled.

ECOSYSTEM TOOLS

As AGENTS.md adoption grows, tools emerge to bridge compatibility gaps between different coding assistants and enforce standards across heterogeneous environments.

Ruler

Ruler is a meta-tool that synthesizes agent instructions from multiple sources (AGENTS.md, .cursorrules, project conventions) and injects them into coding assistants that may not natively support the AGENTS.md standard.

Key capabilities:

Use case: Teams using multiple coding assistants (e.g., some developers on Cursor, others on Claude Code) can maintain a single source of truth in AGENTS.md while Ruler handles distribution to tool-specific formats.

This demonstrates ecosystem maturity: when third-party tools emerge to solve interoperability problems, the standard has achieved meaningful adoption.

ASDLC IMPLEMENTATION STRATEGY

While the agents.md standard provides the format, the ASDLC recommends a structured implementation to ensure reliability. We present our AGENTS.md format not just as a list of tips, but as a segmented database of rules. This is one valid implementation strategy, particularly suited for rigorous engineering environments.

1. Identity Anchoring (The Persona)

Establishes the specific expertise required to prune the model’s search space. Without this, the model reverts to the “average” developer found in its training data. For detailed examples of defining multiple personas, see Agent Personas.

Bad: “You are a coding assistant.”

Good: “You are a Principal Systems Engineer specializing in Go 1.22, gRPC, and high-throughput concurrency patterns. You favor composition over inheritance.”

2. Contextual Alignment (The Mission)

A concise, high-level summary of the project’s purpose and business domain. This is often formatted as a blockquote at the top of the file to “set the stage” for the agent’s session.

Example:

> **Project:** “ZenTask” - A minimalist productivity app.
> **Core Philosophy:** Local-first data architecture; offline support is mandatory.

3. Operational Grounding (The Tech Stack)

Explicitly defines the software environment to prevent “Library Hallucination.” This section must be exhaustive regarding key dependencies and restrictive regarding alternatives.

4. Behavioral Boundaries (Context Gates)

Replaces vague “Guardrails” with a “Three-Tiered Boundary” system, forming the Agent Constitution. Because models are probabilistic, absolute prohibitions are unrealistic; instead, this system categorizes rules by severity and required action. These rules aim to reduce the likelihood of critical errors. Note that you should always complement the constitution with explicit, deterministic quality gates enforced by tests, linters, and CI/CD pipelines.

Tier 1 (Constitutive - ALWAYS): Non-negotiable standards.

Example: “Always add JSDoc to exported functions.”

Tier 2 (Procedural - ASK): High-risk operations requiring Human-in-the-Loop.

Example: “Ask before running database migrations or deleting files.”

Tier 3 (Hard Constraints - NEVER): Safety limits.

Example: “Never commit secrets, API keys, or .env files.”

5. Semantic Directory Mapping

When documenting the codebase structure in AGENTS.md, prefer Annotated YAML over ASCII trees.

Example:

directory_map:
  src:
    # Core Application Logic
    main.py: "Application entry point; initializes the Agent Orchestrator"
    
    agents:
      # Individual Agent definitions
      base_agent.py: "Abstract base class defining the 'step()' and 'memory' interfaces"
    
    utils:
      # Shared libraries
      llm_client.py: "Wrapper for OpenAI/Anthropic APIs with retry logic"

6. The Command Registry

A lookup table mapping intent to execution. Agents often default to standard commands (`npm test`) that may fail in custom environments (`make test-unit`). The Registry forces specific tool usage.

| Intent | Command | Notes |
| :--- | :--- | :--- |
| Build | `pnpm build` | Outputs to `dist/` |
| Test | `pnpm test:unit` | Flags: `--watch=false` |
| Lint | `pnpm lint --fix` | Self-correction enabled |

7. Implementation Notes

XML-Tagging for Semantic Parsing

To maximize adherence, use pseudo-XML tags to encapsulate rules. This creates a “schema” that the model can parse more strictly than bullet points.

<coding_standard name="React Hooks">
  <instruction>
    Use functional components and Hooks. Avoid Class components.
    Ensure extensive use of custom hooks for logic reuse.
  </instruction>
  <anti_pattern>
    class MyComponent extends React.Component {... }
  </anti_pattern>
  <preferred_pattern>
    const MyComponent = () => {... }
  </preferred_pattern>
</coding_standard>

REFERENCE TEMPLATE

Filename: AGENTS.md

# AGENTS.md - Context & Rules for AI Agents

> **Project Mission:** High-throughput gRPC service for processing real-time financial transactions.
> **Core Constraints:** Zero-trust security model, ACID compliance required for all writes.

## 1. Identity & Persona
- **Role:** Senior Systems Engineer
- **Specialization:** High-throughput distributed systems in Go.
- **Objective:** Write performant, thread-safe, and maintainable code.

## 2. Tech Stack (Ground Truth)
- **Language:** Go 1.22 (Use `iter` package for loops)
- **Transport:** gRPC (Protobuf v3)
- **Database:** PostgreSQL 15 with `pgx` driver (No ORM allowed)
- **Infra:** Kubernetes, Helm, Docker

## 3. Operational Boundaries (CRITICAL)
- **NEVER** commit secrets, tokens, or `.env` files.
- **NEVER** modify `api/proto` without running `buf generate`.
- **ALWAYS** handle errors; never use `_` to ignore errors.
- **ASK** before adding external dependencies.

## 4. Command Registry
| Action | Command | Note |
| :--- | :--- | :--- |
| **Build** | `make build` | Outputs to `./bin` |
| **Test** | `make test` | Runs with `-race` detector |
| **Lint** | `golangci-lint run` | Must pass before commit |
| **Gen** | `make proto` | Regenerates gRPC stubs |

## 5. Development Map
```yaml
directory_map:
  api:
    proto: "Protocol Buffers definitions (Source of Truth)"
  cmd:
    server: "Main entry point, dependency injection wire-up"
  internal:
    biz: "Business logic and domain entities (Pure Go)"
    data: "Data access layer (Postgres + pgx)"

```

## 6. Coding Standards

<rule_set name="Concurrency">
  <instruction>Use `errgroup` for managing goroutines. Avoid bare `go` routines.</instruction>
  <example>
    <bad>go func() {... }()</bad>
    <good>g.Go(func() error {... })</good>
  </example>
</rule_set>

## 7. Context References

Constitutional Review Implementation

Step-by-step guide for implementing Constitutional Review to validate code against both Spec and Constitution contracts.

Status: Experimental | Last Updated: 2026-01-08

Definition

Constitutional Review Implementation is the operational practice of configuring and executing Constitutional Review to validate code against both functional requirements (the Spec) and architectural values (the Constitution).

This practice extends Adversarial Code Review by adding constitutional constraints to the Critic Agent’s validation criteria.

When to Use

Use this practice when:

Skip this practice when:

Prerequisites

Before implementing Constitutional Review, ensure you have:

  1. Agent Constitution documented (typically AGENTS.md)
  2. The Spec for the feature being reviewed
  3. Critic Agent session separate from the Builder Agent (fresh context)
  4. Architectural constraints clearly defined in the Constitution

If architectural constraints aren’t documented, start with AGENTS.md Specification.

Process

Step 1: Document Architectural Constraints in Constitution

Ensure your Agent Constitution includes non-functional constraints that are:

Example Structure:

## Architectural Constraints

### Data Access
- All filtering operations MUST be pushed to the database layer
- Never use `findAll()` or `LoadAll()` followed by in-memory filtering
- Queries must handle 10k+ records without memory issues

### Performance
- API responses < 200ms at p99
- Database queries must use indexes for common filters
- No N+1 query patterns

### Security
- User IDs never logged (use hashed identifiers)
- All inputs validated against Zod schemas before processing
- Authentication tokens expire within 24 hours
- No hardcoded secrets (use environment variables)

### Error Handling
- Never fail silently (all errors logged with context)
- User-facing errors never expose stack traces
- Database errors map to generic "Service unavailable" messages

Step 2: Configure Critic Agent Prompt

Extend the standard Adversarial Code Review prompt to include constitutional validation.

System Prompt Template:

You are a rigorous Code Reviewer validating implementation against TWO sources of truth:

1. The Spec (/plans/{feature-name}/spec.md)
   - Functional requirements (what should it do?)
   - API contracts (what are the inputs/outputs?)
   - Data schemas (what is the structure?)

2. The Constitution (AGENTS.md)
   - Architectural patterns (e.g., "push filtering to DB")
   - Performance constraints (e.g., "queries handle 10k+ records")
   - Security rules (e.g., "never log user IDs")
   - Error handling policies (e.g., "never fail silently")

YOUR JOB:
Identify where code satisfies the Spec (functional) but violates the Constitution (architectural).

COMMON CONSTITUTIONAL VIOLATIONS TO CHECK:
- LoadAll().Filter() pattern (data access violation)
- Hardcoded secrets (security violation)
- Missing error logging (error handling violation)
- N+1 query patterns (performance violation)
- User IDs in logs (security violation)

OUTPUT FORMAT:
For each violation:
1. Type: Constitutional Violation - [Category]
2. Location: File path and line number
3. Issue: What constitutional principle is violated
4. Impact: Why this matters at scale (performance, security, maintainability)
5. Remediation Path: Ordered steps to fix (prefer standard patterns, escalate if needed)
6. Test Requirements: What tests would prevent regression

If no violations found, output: PASS - Constitutional Review

Step 3: Execute Constitutional Review Workflow

Follow this sequence to ensure proper validation:

┌─────────────┐
│   Builder   │ → Implements Spec
└──────┬──────┘
       ↓
┌─────────────────┐
│  Quality Gates  │ → Tests, types, linting (deterministic)
└──────┬──────────┘
       ↓ (pass)
┌──────────────────┐
│ Spec Compliance  │ → Does it meet functional requirements?
│     Review       │    (Adversarial Code Review)
└──────┬───────────┘
       ↓ (pass)
┌──────────────────┐
│ Constitutional   │ → Does it follow architectural principles?
│     Review       │    (This practice)
└──────┬───────────┘
       ↓ (pass)
┌─────────────────┐
│ Acceptance Gate │ → Human strategic review (is it the right thing?)
└─────────────────┘

Execution Steps:

  1. Builder completes implementation — Code written, tests pass
  2. Quality Gates pass — Compilation, linting, unit tests all green
  3. Spec Compliance Review — Critic validates functional requirements met
  4. ⭐ Constitutional Review — Critic validates architectural principles followed:
    • Open new Critic Agent session (fresh context, no Builder bias)
    • Provide Constitution (AGENTS.md)
    • Provide Spec (feature spec file)
    • Provide Code Diff (changed files only)
    • Use Constitutional Review prompt (from Step 2)
    • Critic outputs violations or PASS
  5. If violations found → Return to Builder with remediation path
  6. If PASS → Proceed to Acceptance Gate (human review)

Step 4: Process Violation Reports

When the Critic identifies constitutional violations, the output will follow this format:

VIOLATION: Constitutional - Data Access Pattern

Location: src/audit/AuditService.cs Line 23

Issue: Loads all records into memory before filtering
Constitution Violation: "All filtering operations MUST be pushed to database layer"

Impact: 
- Works fine with small datasets (< 1k records)
- Breaks at scale (10k+ records cause memory issues)
- Creates N+1 query patterns in related queries
- Violates performance SLA (API responses > 200ms)

Remediation Path:
1. Push filter to database query:
   repository.FindWhere(x => x.Date > startDate)
2. If ORM doesn't support this pattern, use raw SQL:
   SELECT * FROM audit_logs WHERE date > @startDate
3. Add performance test with 10k+ mock records to prevent regression
4. Document the constraint in repository interface comments

Test Requirements:
- Add test: "GetLogs with 10k records completes in < 200ms"
- Add test: "GetLogs does not load entire table into memory"
  (mock repository, verify FindWhere called, not LoadAll)

Processing steps:

  1. Return to Builder Agent with full violation report
  2. Builder implements remediation following the ordered path
  3. Re-run Constitutional Review after fixes
  4. Iterate until PASS (typically 1-2 cycles)

Step 5: Update Constitution Based on Violations

If the Critic struggles to validate or produces unclear violations:

Indicators Constitution needs update:

Action: Refine constitutional constraints:

## Before (vague)
### Performance
- Code should be fast

## After (specific)
### Performance
- API responses < 200ms at p99
- Database queries must use indexes for common filters
- Push filtering to database layer (never LoadAll().Filter())

Examples

Example 1: Data Access Violation

Code Submitted:

async function getActiveUsers() {
  const users = await db.users.findAll();
  return users.filter(u => u.status === 'active');
}

Constitutional Review Output:

VIOLATION: Constitutional - Data Access Pattern

Location: src/users/service.ts Line 12

Issue: Loads all users then filters in-memory
Constitution Violation: "Push filtering to database layer"

Impact: Works with 100 users, breaks at 10k+

Remediation:
1. db.users.findWhere({ status: 'active' })
2. Add test with 10k users to verify performance

Test Requirements:
- Performance test: 10k users returns in < 200ms
- Verify db.users.findWhere called (not findAll)

Example 2: Security Violation

Code Submitted:

logger.info(`User ${userId} logged in from ${ipAddress}`);

Constitutional Review Output:

VIOLATION: Constitutional - Security

Location: src/auth/logger.ts Line 45

Issue: Logs user ID directly
Constitution Violation: "Never log user IDs (use hashed identifiers)"

Impact: GDPR compliance risk, audit log exposure

Remediation:
1. Hash user ID: logger.info(`User ${hashUserId(userId)} logged in...`)
2. Implement hashUserId utility (SHA-256 with salt)
3. Update all logging to use hashed IDs

Test Requirements:
- Verify logs do not contain raw user IDs
- Verify hashed IDs are consistent (same user = same hash)

Implementation Constraints

Requires Clear Constitutional Principles — Vague constraints produce vague critiques. “Be performant” is not actionable. “API responses < 200ms at p99” is.

Not Fully Automated (Yet) — As of January 2026, requires manual orchestration. You must manually:

Model Capability Variance — Not all reasoning models perform equally at constitutional review. Recommended:

False Positives Possible — Architectural rules have exceptions. The Critic may flag valid code that violates general principles for good reasons. Human review in Acceptance Gate remains essential.

Context Window Limits — Large diffs may exceed context windows. Solutions:

Troubleshooting

Issue: Critic approves code that violates Constitution

Cause: Constitutional constraints not specific enough in AGENTS.md

Solution:

  1. Review violation that slipped through
  2. Add specific constraint to Constitution:
    ### Data Access
    - ❌ Before: "Queries should be efficient"
    - ✅ After: "Never use LoadAll().Filter() - push filtering to database"
    
  3. Re-run Constitutional Review with updated Constitution

Issue: Critic flags valid code as violation

Cause: Constitutional rule is too strict or lacks exceptions

Solution:

  1. Document exception in Constitution:
    ### Data Access
    - Push filtering to database layer
    - Exception: In-memory filtering allowed for cached reference data (< 100 records)
    
  2. Update Critic prompt to recognize exceptions
  3. Proceed to Acceptance Gate (human validates exception is legitimate)

Issue: Constitutional Review takes too long

Cause: Large code diffs or complex Constitution

Solution:

  1. Break up PRs — Smaller, focused changes
  2. Parallelize reviews — Review multiple files concurrently
  3. Use Summary Gates — Compress Spec to relevant sections only
  4. Cache Constitution — Reuse constitutional context across reviews

Future Automation Potential

This practice is currently manual but has clear automation paths:

CI/CD Integration — Automated constitutional review on PR creation:

# .github/workflows/constitutional-review.yml
on: pull_request
jobs:
  constitutional-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Constitutional Review
        run: |
          constitutional-review-agent \
            --constitution AGENTS.md \
            --spec plans/${FEATURE}/spec.md \
            --diff ${{ github.event.pull_request.diff_url }}

IDE Integration — Real-time constitutional feedback:

Living Constitution — Automatic updates:

Violation Analytics — Dashboard tracking:

See also:

External Validation

Feature Assembly

The implementation phase where PBIs are executed against Specs, validated through quality gates, and integrated into the codebase.

Status: Draft | Last Updated: 2026-01-09

Definition

Feature Assembly is the implementation phase in the Agentic SDLC where PBIs (Product Backlog Items) are executed by agents or developers using The Spec as the authoritative source of truth. Unlike traditional development where implementation details drift from requirements, Feature Assembly enforces strict contract validation through Context Gates before code enters the codebase.

This is where the “Delta” (PBI) meets the “State” (Spec), and the output is verified code that provably satisfies documented contracts.

The Assembly Pipeline

flowchart LR
  PBI[PBI] --> Spec[Spec]
  Spec --> Code[Implementation]
  Code --> Gates{Gates}
  Gates -->|PASS| Merge[Merge]
  Gates -->|FAIL| Code
  Merge --> Done([Done])

The Problem: Implementation Drift

Traditional development workflows suffer from a disconnect between specification and implementation:

Spec-less Coding — Developers implement features based on vague tickets, Slack discussions, or tribal knowledge, leading to inconsistent interpretations.

Post-Hoc Documentation — Documentation is written after implementation (if at all), capturing what was built rather than what was intended.

Silent Contract Violations — Code that “works” but violates architectural constraints, performance requirements, or edge case handling goes undetected until production.

Agent Hallucination — LLM-generated code drifts toward “average solutions” found in training data, ignoring project-specific constraints.

The Solution: Spec-Driven Assembly

Feature Assembly inverts the traditional workflow:

  1. The Spec is written first — Specs define contracts before any code is written
  2. PBIs reference the Spec — PBIs point to spec sections rather than duplicating requirements
  3. Implementation is validated against contracts — Code must pass quality gates that verify spec compliance
  4. Gates block invalid code — Failed validation prevents merge, forcing correction

This creates a closed loop where the Spec is both the input (what to build) and the acceptance criteria (how to verify).

The Assembly Workflow

Phase 1: Context Loading

The agent or developer begins by loading the necessary context:

Input:

Example:

# PBI-427: Implement notification preferences API
# Context:
#   - Spec: /plans/notifications/spec.md
#   - Scope: src/api/notifications/

Phase 1a: Plan Verification

Before implementation begins, verify the agent’s proposed execution plan.

Modern coding agents generate an internal execution plan before writing code. This plan must be reviewed—blind approval is a common failure mode.

The Vibe Check:

The Logic Check:

The Observability First Rule:

[!IMPORTANT] If the plan is vague (“I will fix the bug”), reject it. Demand a specific file-level plan before execution proceeds.

Phase 2: Implementation

Code is generated or written to satisfy the PBI requirements while adhering to spec contracts.

Key Principles:

Anti-Pattern:

// ❌ Implementing without reading the spec
async function updatePreferences(data: any) {
  await db.save(data); // No validation, ignores spec contracts
}

Correct Pattern:

// ✅ Implementing against spec contracts
// See: /plans/notifications/spec.md#data-schema
import { PreferencesSchema } from './schemas';

async function updatePreferences(data: unknown) {
  // Spec requirement: validate input
  const validated = PreferencesSchema.parse(data);
  
  // Spec requirement: latency < 200ms
  const result = await db.save(validated);
  
  return result;
}

Phase 3: Quality Gates

Before code can be merged, it must pass through a three-tier validation system defined in Context Gates:

Quality Gates (Deterministic - Required)

Automated, binary checks enforced by tooling:

Implementation:

# .github/workflows/quality-gate.yml
name: Quality Gate
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Compile
        run: npm run build
      - name: Lint
        run: npm run lint
      - name: Test
        run: npm run test
      - name: Type Check
        run: npm run type-check

LLM-assisted validation using a Critic Agent (see Adversarial Code Review):

Implementation:

# Run Critic Agent in fresh session
critic-agent review \
  --spec plans/notifications/spec.md \
  --diff src/api/notifications/preferences.ts \
  --output violations.json

Acceptance Gates (Human-in-the-Loop - Required)

Subjective validation requiring human judgment:

Workflow:

Phase 4: Integration

Once all gates pass, code is merged and the Spec is updated if contracts changed.

Merge Checklist:

Relationship to Experience Modeling

Experience Modeling defines the Design System that Feature Assembly consumes.

Key Constraint: During Feature Assembly, the Design System is read-only. Feature agents use design system components but cannot modify them.

Example Gate:

// pre-commit hook
const changedFiles = getChangedFiles();
const designSystemFiles = changedFiles.filter(f => 
  f.startsWith('src/design-system/')
);

if (designSystemFiles.length > 0 && !isExperienceModelingMode()) {
  console.error('❌ Design system is read-only during Feature Assembly');
  console.error('   Use [EM] commit tag to override (requires approval)');
  process.exit(1);
}

This prevents “Design Drift” where features gradually corrupt the design system.

Relationship to The Spec

Feature Assembly is the execution of the Spec’s contracts.

| Spec Section | Assembly Verification |
| :--- | :--- |
| Blueprint → Architecture | Code structure matches defined patterns |
| Blueprint → Anti-Patterns | Linting/review catches violations |
| Contract → Definition of Done | All checklist items satisfied |
| Contract → Regression Guardrails | Tests verify invariants hold |
| Contract → Scenarios | E2E tests implement Gherkin scenarios |

The Spec is the test oracle — if the Spec says latency must be <200ms, the quality gate verifies it.

Relationship to The PBI

PBIs are the transient execution triggers. Feature Assembly is what happens when a PBI is executed.

The Flow:

PBI → Feature Assembly → Quality Gates → Integration → Spec Update (if needed)

Example:

  1. PBI-427 says “Implement preferences API”
  2. Assembly phase builds src/api/preferences.ts
  3. Quality gates verify against /plans/notifications/spec.md
  4. Human accepts strategic fit
  5. Code merges, PBI-427 closes

Implementation Guidelines

Test-First Assembly

Write tests that verify spec contracts before implementing:

// tests/api/preferences.test.ts
// Validates: /plans/notifications/spec.md#contract

describe('Preferences API', () => {
  it('should respond within 200ms', async () => {
    const start = Date.now();
    await updatePreferences(mockData);
    const duration = Date.now() - start;
    expect(duration).toBeLessThan(200);
  });

  it('should reject invalid schema', async () => {
    await expect(
      updatePreferences({ invalid: 'data' })
    ).rejects.toThrow(ValidationError);
  });
});

Spec-Contract Mapping

Link code to spec sections explicitly:

// src/api/notifications/preferences.ts
// Spec: /plans/notifications/spec.md
// Contract: "All updates must validate against PreferencesSchema"
// Contract: "Response time must be <200ms"

export async function updatePreferences(data: unknown) {
  // Implements spec contracts...
}

Continuous Verification

Run quality gates on every commit:

#!/bin/bash
# .git/hooks/pre-commit
npm run lint || exit 1
npm run test || exit 1
npm run type-check || exit 1
echo "✅ Quality gates passed"

Best Practices

1. Never Skip Gates

All code must pass quality gates. No “we’ll fix it later” exceptions.

2. Spec First, Code Second

If the Spec is unclear, update the Spec before implementing. Don’t guess.

3. Atomic Assembly

Complete one PBI fully before starting another. Partial implementations create context pollution.

4. Document Deviations

If implementation requires deviating from the Spec, update the Spec in the same commit with a changelog entry.

5. Automate Gates

Quality gates should run automatically on CI/CD. Manual gates introduce inconsistency.

Anti-Patterns

The “Works on My Machine” Merge

Problem: Code passes local tests but fails in CI/production.

Solution: Require all quality gates to pass in CI before merge is allowed.

The Spec Drift

Problem: Code is implemented without reading the Spec, causing contract violations.

Solution: Code review checklist requires explicit spec section references.

The Post-Hoc Documentation

Problem: Spec is updated after code is written, documenting what was built rather than what was intended.

Solution: Spec reviews happen before PBI creation. No PBI without a Spec.

The Eternal WIP

Problem: PBIs remain “in progress” for weeks, accumulating scope creep.

Solution: Time-box PBIs. If not done in 1-2 days, break into smaller PBIs.

Metrics and Observability

Track assembly health with these metrics:

Gate Pass Rate:

Cycle Time:

Spec Coverage:

Rework Rate:

Future Enhancements

This practice is currently manual orchestration. Automation opportunities:

Auto-Gate Runners — CI/CD automatically runs critic agents and posts violations as PR comments

Spec-to-Test Generation — LLMs generate test cases from Spec’s Contract section

Real-Time Compliance — IDE plugin shows spec violations as code is written

Assembly Metrics Dashboard — Real-time tracking of gate pass rates and cycle time

See also:

Living Specs

Practical guide to creating and maintaining specs that evolve alongside your codebase.

Status: Experimental | Last Updated: 2025-12-22

Overview

This guide provides practical instructions for implementing the Specs pattern. While the pattern describes what specs are and why they matter, this guide focuses on how to create and maintain them.

When to Create a Spec

Create a spec when starting work that involves:

Feature Domains — New functionality that introduces architectural patterns, API contracts, or data models that other parts of the system depend on.

User-Facing Workflows — Features with defined user journeys and acceptance criteria that need preservation for future reference.

Cross-Team Dependencies — Any feature that other teams will integrate with, requiring clear contract definitions.

Don’t create specs for: Simple bug fixes, trivial UI changes, configuration updates, or dependency bumps.

Spec Granularity

A spec should be detailed enough to serve as a contract for the feature, but not so detailed that it becomes a maintenance burden.

Some spec features, like Gherkin scenarios, are not always necessary if the feature is simple or well understood.

When to Update a Spec

Update an existing spec when:

Golden Rule: If code behavior changes, the spec MUST be updated in the same commit.

File Structure

Organize specs by feature domain, not by sprint or ticket number.

/project-root
├── ARCHITECTURE.md           # Global system rules
├── plans/                    # Feature-level specs
│   ├── user-authentication/
│   │   └── spec.md
│   ├── payment-processing/
│   │   └── spec.md
│   └── notifications/
│       └── spec.md
└── src/                      # Implementation code

Conventions:

Maintenance Protocol

Same-Commit Rule

If code changes behavior, update the spec in the same commit. Add “Spec updated” to your PR checklist.

git commit -m "feat(notifications): add SMS fallback

- Implements SMS delivery when WebSocket fails
- Updates /plans/notifications/spec.md with new transport layer"

Deprecation Over Deletion

Mark outdated sections as deprecated rather than removing them. This preserves historical context.

### Architecture

**[DEPRECATED 2024-12-01]**
~~WebSocket transport via Socket.io library~~
Replaced by native WebSocket API to reduce bundle size.

**Current:**
Native WebSocket connection via `/api/ws/notifications`

Bidirectional Linking

Link code to specs and specs to code:

In code, pointing to the spec:

// Notification delivery must meet 100ms latency requirement
// See: /plans/notifications/spec.md#contract

In the spec, pointing to the code:

### Data Schema
Implemented in `src/types/Notification.ts` using Zod validation.

Template

# Feature: [Feature Name]

## Blueprint

### Context
[Why does this feature exist? What business problem does it solve?]

### Architecture
- **API Contracts:** `[METHOD] /api/v1/[endpoint]` - [Description]
- **Data Models:** See `[file path]`
- **Dependencies:** [What this depends on / what depends on this]

### Anti-Patterns
- [What agents must avoid, with rationale]

## Contract

### Definition of Done
- [ ] [Observable success criterion]

### Regression Guardrails
- [Critical invariant that must never break]

### Scenarios
**Scenario: [Name]**
- Given: [Precondition]
- When: [Action]
- Then: [Expected outcome]

Anti-Patterns

The Stale Spec

Problem: Spec created during planning, never updated as the feature evolves.

Solution: Make spec updates mandatory in Definition of Done. Add PR checklist item.

The Spec in Slack

Problem: Design decisions discussed in chat but never committed to the repo.

Solution: After consensus, immediately update spec.md with a commit linking to the discussion.

The Monolithic Spec

Problem: A single 5000-line spec tries to document the entire application.

Solution: Split into feature-domain specs. Use ARCHITECTURE.md only for global cross-cutting concerns.

The Spec-as-Tutorial

Problem: Spec reads like a beginner’s guide, full of basic programming concepts.

Solution: Assume engineering competence. Document constraints and decisions, not general knowledge.

The Copy-Paste Code

Problem: Spec duplicates large chunks of implementation code.

Solution: Reference canonical sources with file paths. Only include minimal examples to illustrate patterns.

See also:

Micro-Commits

Ultra-granular commit practice for agentic workflows, treating version control as reversible save points.

Status: Live | Last Updated: 2026-01-13

Definition

Micro-Commits is the practice of committing code changes at much higher frequency than traditional development workflows. Each discrete task—often a single function, test, or file—receives its own commit.

When working with LLM-generated code, commits become “save points in a game”: Checkpoints that enable instant rollback when probabilistic outputs introduce bugs or architectural drift.

When to Use

Use this practice when:

Skip this practice when:

The Problem: Coarse-Grained Commits in Agentic Workflows

Traditional commit practices optimize for human readability and PR review: “logical units of work” that span multiple files and implement complete features.

This fails in agentic workflows because:

LLM outputs are probabilistic — A model might generate correct code for 3 files and introduce subtle bugs in the 4th. Bundling all 4 files into one commit makes rollback destructive.

Regression to mediocrity — Without checkpoints, it’s difficult to identify where LLM output drifted from the Spec contracts.

Context loss — Large commits obscure the sequence of decisions. When debugging, you need to know “what changed, when, and why.”

No emergency exit — If an LLM generates a tangled mess across 10 files, your only option is manual surgery or discarding hours of work.

The Solution: Commit After Every Task

Make a commit immediately after:

This creates a breadcrumb trail of working states.

The Practice

4.1. Atomic Tasks → Atomic Commits

Break work into small, testable chunks. Each chunk maps to one commit.

Example PBI: “Add OAuth login flow”

Commit sequence:

1. feat: add OAuth config schema
2. feat: implement token exchange endpoint
3. feat: add session storage for OAuth tokens
4. test: add OAuth flow integration test
5. refactor: extract OAuth error handling

This aligns with atomic PBIs: small, bounded execution units.

4.2. Commit Messages as Execution Log

Commit messages document the sequence of LLM-assisted changes. They serve as:

Format:

type(scope): brief description

- Detail 1
- Detail 2

Example:

feat(auth): implement OAuth token validation

- Add JWT verification middleware
- Extract claims from token payload
- Return 401 on expired tokens

4.3. Branches and Worktrees for Isolation

Use branches or git worktrees to isolate LLM experiments:

Branches — Separate experimental work from stable code. Merge only after validation.

Worktrees — Run parallel LLM sessions on the same repository without context conflicts. Each worktree is an independent working directory.

Example workflow:

# Create worktree for LLM experiment
git worktree add ../project-experiment experiment-oauth

# Work in worktree, commit frequently
cd ../project-experiment
# ... LLM generates code ...
git commit -m "feat: add OAuth callback handler"

# If successful, return to the main checkout and merge
cd -
git checkout main
git merge experiment-oauth

# If failed, discard the worktree instead (from the main checkout)
git worktree remove ../project-experiment

This prevents contaminating the main branch with failed LLM output.

4.4. Rollback as First-Class Operation

When LLM output introduces bugs:

Identify the bad commit — Review recent history to find where issues appeared.

Rollback to last known good state:

# Soft reset (keeps changes as uncommitted)
git reset --soft HEAD~1

# Hard reset (discards changes entirely)
git reset --hard HEAD~1

Selective revert:

# Revert specific commit without losing subsequent work
git revert <commit-hash>

This is only safe because micro-commits isolate changes.

5. Tidy History for Comprehension

Granular commits create noisy history. Before merging to main, optionally squash related commits into logical units:

# Interactive rebase to squash last 5 commits
git rebase -i HEAD~5

This preserves detailed history during development while creating clean history for long-term maintenance.

Trade-off: Squashing removes granular rollback points. Only squash after validation passes Quality Gates.

Relationship to The PBI

PBIs define what to build. Micro-Commits define how to track progress.

Atomic PBIs (small, bounded tasks) naturally produce micro-commits. Each PBI generates 1-5 commits depending on complexity.

Example mapping:

This makes PBI progress traceable and reversible.

See also:

PBI Authoring

How to write Product Backlog Items that agents can read, execute, and verify—with templates and lifecycle guidance.

Status: Live | Last Updated: 2026-01-13

Definition

PBI Authoring is the practice of writing Product Backlog Items optimized for agent execution. This includes structuring the four-part anatomy, ensuring machine accessibility, and managing the PBI lifecycle from planning through closure.

Following this practice produces PBIs that agents can programmatically access, unambiguously interpret, and verifiably complete.

When to Use

Use this practice when:

Skip this practice when:

Process

Step 1: Ensure Accessibility

Invisibility is a bug. If an agent cannot read the PBI, the workflow is broken.

A PBI locked inside a web UI without API or MCP integration is useless to an AI developer. The agent must programmatically access the work item without requiring human copy-paste.

Valid access methods:

| Method | Description |
| :--- | :--- |
| MCP Integration | Agent connected to Issue Tracker (Linear, Jira, GitHub) via MCP |
| Repo-Based | PBI exists as a markdown file (e.g., `tasks/PBI-123.md`) |
| API Access | Tracker exposes REST/GraphQL API the agent can query |

If your tracker has no API access: Mirror PBIs as markdown files during sprint planning, or implement MCP integration.

Step 2: Write the Directive

State what to do with explicit scope boundaries. Be imperative, not conversational.

Good:

Implement the API Layer for user notification preferences.
Scope: Only touch the `src/api/notifications` folder.

Bad:

As a user, I want to manage my notification preferences so that I can control what emails I receive.

The second example requires interpretation. The first is executable.

[!TIP] Prompt for the Plan. Even if your tool handles planning automatically, explicitly instruct the agent to output its plan for review. This forces the Specify → Plan → Execute loop.

Example Directive: “Analyze the Spec, propose a step-by-step plan including which files you will touch, and wait for my approval before editing files.”

Step 3: Add Context Pointers

Reference the permanent spec—don’t copy design decisions into the PBI.

Reference: `plans/notifications/spec.md` Part A for the schema.
See the "Architecture" section for endpoint contracts.

Why pointers, not copies: Specs evolve. A copied schema in a PBI becomes stale the moment the spec updates. Pointers ensure the agent always reads the authoritative source.

Step 4: Define Verification Criteria

Link to success criteria in the spec, or define inline checkboxes.

Must pass "Scenario 3: Preference Update" defined in 
`plans/notifications/spec.md` Part B (Contract).

Or inline:

- [ ] POST /preferences returns 201 on valid input
- [ ] Invalid payload returns 400 with error schema
- [ ] Unit test coverage > 80%

Step 5: Declare Dependencies

Explicitly state what blocks this PBI and what it blocks.

## Dependencies
- Blocked by: PBI-101 (creates the base schema)
- Must merge before: PBI-103 (extends this endpoint)

Anti-Pattern: Implicit dependencies discovered at merge time. Identify during planning; either sequence the work or refactor into independent units.

Step 6: Set the Refinement Rule

Define what happens when reality diverges from the spec.

If implementation requires changing the Architecture, you MUST 
update `spec.md` in the same PR with a changelog entry.

Options to specify:

Template

# PBI-XXX: [Brief Imperative Title]

## Directive
[What to build/change in 1-2 sentences]

**Scope:**
- [Explicit file/folder boundaries]
- [What NOT to touch]

## Dependencies
- Blocked by: [PBI-YYY if any, or "None"]
- Must merge before: [PBI-ZZZ if any, or "None"]

## Context
Read: `[path/to/spec.md]`
- [Specific section to reference]

## Verification
- [ ] [Criterion 1: Functional requirement]
- [ ] [Criterion 2: Performance/quality requirement]
- [ ] [Criterion 3: Test coverage requirement]

## Refinement Protocol
[What to do if the spec needs updating during implementation]

PBI Lifecycle

| Phase | Actor | Action |
| :--- | :--- | :--- |
| Planning | Human | Creates PBI with 4-part structure |
| Assignment | Human/System | PBI assigned to Agent or Developer |
| Execution | Agent | Reads Spec, implements Delta |
| Review | Human | Verifies against Spec’s Contract section |
| Merge | Human/System | Code merged, Spec updated if needed |
| Closure | System | PBI archived, linked to commit/PR |

Common Mistakes

The User Story Hangover

Problem: PBI written as “As a user, I want…” with no explicit scope or verification.

Solution: Rewrite in imperative form with scope boundaries and checkable criteria.

The Spec Copy

Problem: PBI contains copied design decisions that drift from the spec.

Solution: Use pointers to spec sections, never copy content that lives elsewhere.

The Hidden Dependency

Problem: Two PBIs touch the same files; discovered at merge time.

Solution: During planning, map file ownership. If overlap exists, sequence the PBIs or refactor scope.

The Untestable Increment

Problem: PBI verification requires another PBI to complete first.

Solution: Ensure each PBI is self-testable. If not possible, merge into a single PBI or create test fixtures.

This practice implements:

See also:

Product Vision Authoring

How to create and maintain a Product Vision document that transmits taste to agents—inline in AGENTS.md or as a separate file.

Status: Draft | Last Updated: 2025-01-05

Overview

This practice guides you through creating a Product Vision that prevents vibe convergence—the tendency of agents to produce generic, forgettable outputs. The goal is a document that transmits product taste effectively, whether inline in AGENTS.md or as a separate VISION.md.

Prerequisites

Before authoring a Product Vision, you should have:

Inline vs Separate File

The first decision: does your vision belong in AGENTS.md or a separate file?

When to Inline in AGENTS.md

Choose inline when:

Inline format:

# AGENTS.md

## Product Vision

We're building a fast, keyboard-first task manager for developers 
who hate project management software. Think Linear meets Raycast.

**We value:** Speed over features. Opinions over options. 
Power users over onboarding wizards.

**We sound like:** Confident, terse, technical. No "Oops!" or "We're excited..."

## Tech Stack
...

This approach keeps vision in the same context load as behavioral rules, ensuring agents always see it.

When to Extract to VISION.md

Extract to a separate file when:

Reference format in AGENTS.md:

# AGENTS.md

## Product Vision
See [VISION.md](./VISION.md) for full product identity, voice, and taste references.

**TL;DR:** Fast, opinionated task manager for developers. Linear meets Raycast.

## Tech Stack
...

The TL;DR ensures agents get core identity even when VISION.md isn’t in context.

Writing Each Component

A complete Product Vision has five components. Not all are required for inline versions—scale to your needs.

1. The Actual Humans

Describe real people, not abstract personas.

Bad:

## Target Users
- Power users
- Enterprise customers
- Developer teams

Good:

## Who We're Building For

Overworked creative directors at 15-person agencies who juggle 
12 clients simultaneously. They've used every tool. They're 
impatient with onboarding because they're not beginners. They 
work late, prefer dark interfaces, and will mass-adopt anything 
that saves them 20 minutes a day.

They hate: Enterprise software that treats them like idiots.
They love: Tools that feel like they were built by people like them.

The difference: agents can use the second version to make judgment calls. “Would this person want a wizard?” has a clear answer.

2. Point of View

State opinions that reasonable people might disagree with.

Bad (generic values):

## Values
- User-centric design
- Quality and reliability
- Innovation

Good (actual opinions):

## Our Point of View

- Dense information over progressive disclosure (our users aren't beginners)
- Keyboard-first, mouse-optional
- Dark mode is the default, not a toggle
- We'd rather be slightly weird than completely forgettable
- Features ship incomplete but useful, not complete but late
- Settings are failure; good defaults are success

Each bullet represents a tradeoff. Agents can use these to resolve ambiguity.

3. Taste References

Name specific products and what to take from them.

## Taste References

**Study these:**
- Linear (density, keyboard navigation, visual restraint)
- Raycast (speed as personality, power-user focus)
- Things 3 (calm, opinionated defaults)
- Stripe's API docs (clarity, developer respect)

**Avoid these patterns:**
- Salesforce (cluttered, corporate, permission-drunk)
- Jira (complexity as feature)
- Any product with a "getting started" carousel
- Dashboards with 15 metrics and no hierarchy

Agents can literally reference these: “Make this feel more like Linear, less like Jira.”

4. Voice and Language

Provide actual examples, not just descriptions.

## Voice

Confident but not arrogant. Clear but not sterile.

**We say:**
- "Nope" (not "Unfortunately, that's not possible at this time")
- "This will delete everything. Sure?" (not "Are you sure you want to proceed?")
- "Saved" (not "Your changes have been successfully saved!")

**Error messages are human:**
- "Can't reach the server. Retrying..." (not "Error code 503")
- "That file's too big. Try under 10MB." (not "Upload failed: maximum file size exceeded")

**We don't say:**
- "We're excited to..." (we're software, we don't have feelings)
- "On your journey" (this is a tool, not a spiritual experience)
- "Oops!" (we're adults)

5. Decision Heuristics

Provide tie-breakers for ambiguous situations.

## When In Doubt

1. Fewer features, better defaults
2. If it needs explanation, redesign it
3. Respect power users; don't punish them with beginner safety rails
4. Fast and slightly wrong beats slow and perfect
5. When torn between "conventional" and "opinionated," choose opinionated

Diagnosing Vision Problems

Signs your vision isn’t working:

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Copy “could belong to any product” | Missing or weak Voice section | Add specific examples of tone |
| UI suggestions feel generic | Missing Taste References | Add “study these / avoid these” products |
| Agents make wrong tradeoffs | Missing Point of View | Add explicit opinion stances |
| New team members produce inconsistent work | Vision not in context | Check that AGENTS.md references VISION.md |
| You keep correcting “tone” in reviews | Voice section too abstract | Replace descriptions with examples |

Maintenance

Update Triggers

Review and update the vision when:

Review Cadence

Ownership

Product Vision should have a single owner (product lead, founder, or design lead). Committee-authored visions lose voice consistency.

Integration with Specs

When writing specs, reference the vision for design rationale:

# Feature: Notification Preferences

## Blueprint

### Context
Users need control over notification frequency without 
feeling like they're configuring a mail server.

### Vision Alignment
- Per VISION.md: "Settings are failure; good defaults are success"
- Ship with smart defaults, surface preferences only when users seek them
- No notification preferences wizard on first launch

This creates traceability: when someone asks “why don’t we have granular notification controls?” the answer is documented.

Template

Inline Template (for AGENTS.md)

## Product Vision

[One paragraph: what we're building and for whom]

**We value:** [3-5 tradeoff stances]

**We sound like:** [Tone description with 2-3 examples]

**Reference products:** [2-3 products that "feel right"]

Full Template (for VISION.md)

# Product Vision: [Product Name]

## Who We're Building For
[Describe actual humans, not personas. Context, constraints, 
what they hate, what they wish existed.]

## Our Point of View
- [Opinion about tradeoff]
- [Opinion about tradeoff]
- [What we value over what]

## Taste References

**These feel right:**
- [Product] — [what specifically]

**These feel wrong:**
- [Pattern] — [why]

## Voice

**We sound like:** [Description]

**We say:** [Examples]

**We don't say:** [Anti-examples]

## When In Doubt
1. [Heuristic]
2. [Heuristic]

See also:

Workflow as Code

Define agentic workflows in deterministic code rather than prompts to ensure reliability, type safety, and testable orchestration.

Status: Experimental | Last Updated: 2026-01-18

Definition

Workflow as Code is the practice of defining agentic workflows using deterministic programming languages (like TypeScript or Python) rather than natural language prompts.

It treats the Agent as a function call within a larger, strongly-typed system, rather than treating the System as a tool available to a chatty agent.

When to Use

Use this practice when:

Skip this practice when:

Why It Matters

When complex workflows are driven entirely by an LLM loop (“Here is a goal, figure it out”), the system suffers from Context Pollution. As the agent accumulates history—observations, tool outputs, internal monologue—its attention degrades.

Nick Tune describes this as the agent becoming “tipsy wobbling from side-to-side”: it loses focus on strict process adherence because its context window is overflowing with implementation details.

Process

Step 1: Identify Deterministic vs Probabilistic Tasks

Audit your workflow. Separate mechanical tasks (running builds, conditional logic, file operations) from intelligence tasks (code review, summarization, decision-making under ambiguity).

Deterministic (Code):

- Running builds
- Conditional logic
- File operations

Probabilistic (Agent):

- Code review
- Summarization
- Decision-making under ambiguity

Step 2: Define Typed Step Abstraction

Create a common interface for workflow steps:

export type WorkflowContext = {
  workDir: string;
  spec: string;
  history: StepResult[];
};

export type StepResult =
  | { type: 'success'; data: unknown }
  | { type: 'failure'; reason: string; recoverable: boolean };

export type Step = (ctx: WorkflowContext) => Promise<StepResult>;

This enables:

- Type-checked composition of steps
- Testing each step in isolation
- Uniform handling of failure and recovery
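
For concreteness, a single deterministic step implementing this interface might look like the following. This is a minimal sketch, assuming a Node project where `npm run build` is the build command and the types above live in `../types`:

// steps/run-build.ts
import { exec } from 'node:child_process';
import { promisify } from 'node:util';
import type { Step } from '../types';

const execAsync = promisify(exec);

export const runBuild: Step = async (ctx) => {
  try {
    await execAsync('npm run build', { cwd: ctx.workDir });
    return { type: 'success', data: 'build passed' };
  } catch (err) {
    // A failed build is recoverable: a later step (or the agent) can attempt a fix.
    return { type: 'failure', reason: String(err), recoverable: true };
  }
};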

Step 3: Implement the Orchestration Shell

Write the control flow in code. The LLM only appears where intelligence is required:

async function runDevWorkflow(ctx: WorkflowContext) {
  // Deterministic: Run build
  const buildResult = await runBuild(ctx);
  if (buildResult.type === 'failure') {
    return handleBuildError(buildResult);
  }

  // Probabilistic: Agent reviews the diff
  const reviewResult = await runAgentReview({
    diff: await git.getDiff(),
    spec: ctx.spec
  });

  // Deterministic: Act on structured result
  if (reviewResult.verdict === 'PASS') {
    await git.commit();
    await github.createPR();
  }
}
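
The agent call itself stays thin. Here is a hedged sketch of `runAgentReview`, with `callModel` standing in for whichever LLM client you use; the helper and the JSON verdict contract are assumptions, not a specific vendor API:

type ReviewResult = { verdict: 'PASS' | 'FAIL'; notes: string };

// Placeholder for your LLM client of choice.
declare function callModel(prompt: string): Promise<string>;

async function runAgentReview(input: { diff: string; spec: string }): Promise<ReviewResult> {
  const prompt = [
    'Review the diff against the spec.',
    'Respond with JSON only: {"verdict":"PASS"|"FAIL","notes":"..."}',
    `SPEC:\n${input.spec}`,
    `DIFF:\n${input.diff}`,
  ].join('\n\n');

  const raw = await callModel(prompt);
  try {
    return JSON.parse(raw) as ReviewResult;
  } catch {
    // Treat unparseable output as a failed review rather than guessing intent.
    return { verdict: 'FAIL', notes: 'Model returned non-JSON output.' };
  }
}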

Step 4: Implement Opaque Commands

From the agent’s perspective, workflow steps should be “Black Boxes.” The agent invokes a high-level command and acts on the structured result—it doesn’t need to know implementation details.

Define the interface:

type VerifyWorkResult = {
  status: 'passed' | 'failed';
  errors?: { file: string; line: number; message: string }[];
};

async function verifyWork(ctx: WorkflowContext): Promise<VerifyWorkResult> {
  // Implementation hidden from agent
  const lint = await runLint(ctx.workDir);
  const types = await runTypeCheck(ctx.workDir);
  const tests = await runTests(ctx.workDir);
  
  return aggregateResults([lint, types, tests]);
}

This reduces token usage and prevents the agent from hallucinating incorrect shell commands.

Step 5: Add Enforcement Hooks

Agents will sometimes try to bypass the workflow. Implement hard boundaries using client-side hooks:

# .claude/hooks/pre-tool-use.sh
if [[ "$TOOL" == "Bash" && "$COMMAND" =~ "git push" ]]; then
  echo "Blocked: Use 'submit-pr' tool which runs verification first."
  exit 1
fi

This shifts enforcement from “Instructions in the System Prompt” (which can be ignored) to “Code in the Environment” (which cannot).

Template

Minimal workflow orchestrator structure:

// workflows/dev-workflow.ts
import type { Step, WorkflowContext, StepResult } from './types';

const steps: Step[] = [
  runBuild,
  runLint,
  runAgentReview,  // Only probabilistic step
  commitChanges,
  createPR,
];

export async function execute(ctx: WorkflowContext): Promise<StepResult> {
  for (const step of steps) {
    const result = await step(ctx);
    if (result.type === 'failure' && !result.recoverable) {
      return result;
    }
    ctx.history.push(result);
  }
  return { type: 'success', data: ctx.history };
}

Common Mistakes

The God Prompt

Problem: Entire workflow described in a single system prompt, expecting the agent to “figure it out.”

Solution: Extract deterministic logic into code. The agent should only handle tasks requiring intelligence.

Leaky Abstractions

Problem: Agent sees raw command output (500 lines of test failures) instead of structured results.

Solution: Parse outputs into typed results before passing to the agent. Summarize, don’t dump.
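
For example, raw test-runner output can be condensed into a typed summary before the agent ever sees it. A minimal sketch; the `FAIL` line format and `TestSummary` shape are illustrative:

type TestSummary = {
  failed: number;
  failures: { name: string; message: string }[];
};

function summarizeTestOutput(raw: string): TestSummary {
  const failLines = raw.split('\n').filter((line) => line.startsWith('FAIL '));
  // Cap what the agent sees; never dump hundreds of lines into context.
  const failures = failLines.slice(0, 10).map((line) => {
    const [, name = 'unknown', ...rest] = line.split(' ');
    return { name, message: rest.join(' ') };
  });
  return { failed: failLines.length, failures };
}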

Missing Enforcement

Problem: Workflow relies on the agent “following instructions” without hard boundaries.

Solution: Implement hooks that block unauthorized actions. Trust code, not compliance.

Over-Agentification

Problem: Using an LLM to run npm install or parse JSON—tasks with zero ambiguity.

Solution: Reserve agent calls for genuinely probabilistic tasks. Everything else is code.