Coverage Metric

Status: Draft
Last Updated:
Tags: Metrics, Agent Evaluation, Long-Horizon Tasks

Definition

Coverage measures task completion reliability: the proportion of required work units an agent successfully completes in a long-horizon task.

Unlike quality metrics (correctness, style, performance), coverage answers: “Did it finish the job?”

Key Characteristics

Formula

Coverage = (Completed Units / Total Required Units) × 100

where a “unit” is whatever the task naturally counts:

  • Literature review: papers processed
  • Code migration: files converted
  • Test generation: functions covered
  • Data processing: records handled
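A minimal Python sketch of the formula, assuming each unit has a stable identifier (a file path, paper ID, or record key) so completed units can be matched against the required list. The function name `coverage` and the set-based matching are illustrative choices, not something the source prescribes.

```python
def coverage(completed: set[str], required: set[str]) -> float:
    """Coverage = (completed units / total required units) x 100."""
    if not required:
        raise ValueError("coverage is undefined when no units are required")
    # Count only completed units that were actually required, so extra or
    # unrelated outputs cannot push coverage above 100.
    done = completed & required
    return len(done) / len(required) * 100


# Example: a code migration where four files must be converted.
required_files = {"a.py", "b.py", "c.py", "d.py"}
converted = {"a.py", "b.py", "c.py"}
print(coverage(converted, required_files))  # 75.0
```

Intersecting with the required set is the key design choice: an agent that produces many outputs for the wrong units should not score well.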

Why Coverage Matters

Quality metrics assume the agent attempted the work. But long-horizon agents often fail silently:

  • Early termination without completing all items
  • Skipping items without acknowledgment
  • Producing empty or metadata-only outputs

Coverage catches these failures, which quality metrics miss because they only grade the outputs that exist.
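The silent failure modes listed above only become visible when outputs are checked against the required list. The sketch below assumes a hypothetical output format in which each unit's result is a dict with a `content` field, and anything else counts as metadata; `audit_outputs` and `Audit` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Audit:
    completed: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)  # no output at all
    empty: list[str] = field(default_factory=list)    # output exists but has no substantive content


def audit_outputs(required: list[str], outputs: dict[str, dict]) -> Audit:
    """Sort each required unit into completed, skipped, or empty.

    Assumes (hypothetically) that the substantive payload lives under a
    "content" key; an output without non-blank content is metadata-only.
    """
    audit = Audit()
    for unit in required:
        out = outputs.get(unit)
        if out is None:
            # Early termination or a silent skip: the unit has no output.
            audit.skipped.append(unit)
        elif not str(out.get("content") or "").strip():
            audit.empty.append(unit)
        else:
            audit.completed.append(unit)
    return audit


required = ["paper-1", "paper-2", "paper-3", "paper-4"]
outputs = {
    "paper-1": {"content": "Summary of methods and results."},
    "paper-2": {"title": "Some title", "content": ""},  # metadata-only
    "paper-3": {"content": "Key findings."},
    # paper-4 was never processed
}
a = audit_outputs(required, outputs)
print(a.completed, a.skipped, a.empty)
```

Counting any output that exists would report 75% here; counting only units with real content gives 50%, which is the number that reflects whether the job was finished.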

Measurement

Report three values across multiple runs:

  • Max — Best-case completion
  • Min — Worst-case completion
  • Avg — Expected completion

A large spread (a wide gap between max and min) indicates an unreliable architecture, even when the best run reaches 100%.
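A small Python sketch of the three reported values, using `statistics.mean` from the standard library. The extra `spread` field (max minus min) is an added convenience for spotting unreliable runs, and the run values are made up for illustration.

```python
from statistics import mean


def summarize_runs(coverages: list[float]) -> dict[str, float]:
    """Return max, min, and average coverage across repeated runs, plus the max-min spread."""
    if not coverages:
        raise ValueError("need at least one run")
    best, worst = max(coverages), min(coverages)
    return {"max": best, "min": worst, "avg": mean(coverages), "spread": best - worst}


# Hypothetical coverage percentages from five runs of the same task.
runs = [100.0, 100.0, 40.0, 95.0, 100.0]
print(summarize_runs(runs))
```

Here max is a perfect 100, yet one run completed only 40% of its units. The average alone (87) would hide that worst case, which is why min is reported alongside it.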

ASDLC Usage

Coverage is particularly relevant for:

  • Batch PBI execution — Did the agent complete all subtasks?
  • Migration tasks — Were all files processed?
  • Review Gates — Did the Critic review all flagged items?

Consider adding coverage assertions to Quality Gates for batch operations.
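One way a coverage assertion could look as a gate check, sketched in Python. `coverage_gate`, `CoverageGateError`, the `threshold` parameter, and the PBI identifiers are hypothetical, not an existing ASDLC API, and the 100% default is an assumption that a team may lower.

```python
class CoverageGateError(AssertionError):
    """Raised when a batch operation does not reach the required coverage."""


def coverage_gate(required: set[str], completed: set[str], threshold: float = 100.0) -> float:
    """Fail the gate if coverage is below `threshold` percent; otherwise return coverage."""
    if not required:
        raise ValueError("gate has no required units to check")
    # Units in `completed` that were not required are ignored.
    missing = sorted(required - completed)
    cov = (len(required) - len(missing)) / len(required) * 100
    if cov < threshold:
        raise CoverageGateError(
            f"coverage {cov:.1f}% below {threshold:.1f}%; missing: {missing}"
        )
    return cov


subtasks = {"PBI-1", "PBI-2", "PBI-3"}
try:
    coverage_gate(subtasks, {"PBI-1", "PBI-3"})
except CoverageGateError as err:
    print(err)
```

Listing the missing units in the failure message makes a failed gate actionable: the next step is to rerun exactly those units rather than the whole batch.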

References

  1. Chenglin Yu, Yuchen Wang, Songmiao Wang, Hongxia Yang, Ming Li (2026). InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents. Accessed January 10, 2026.

    Introduces coverage as a primary metric for long-horizon agent evaluation.