# Agent Benchmarking

A tool for head-to-head comparison of coding agents. Tasks are defined in YAML with judge criteria, each agent run is isolated in its own git worktree, and results are reported as pass rate, cost, time, and consistency. It is meant for reproducible benchmarking across Claude Code, Aider, Codex, and other agents.
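
To make the four metrics concrete, here is a minimal sketch of how per-agent results might be aggregated. It is not the tool's actual implementation: the `RunResult` record, its field names, and the `summarize` function are hypothetical, and the definition of consistency used here (the share of tasks whose repeated runs all agreed) is an assumption, since the description does not define it.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunResult:
    """One agent attempt at one task (hypothetical record shape)."""
    agent: str
    task: str
    passed: bool
    cost_usd: float
    seconds: float


def summarize(runs):
    """Aggregate pass rate, mean cost, mean time, and consistency per agent."""
    by_agent = defaultdict(list)
    for run in runs:
        by_agent[run.agent].append(run)

    summary = {}
    for agent, agent_runs in by_agent.items():
        # Collect the distinct outcomes seen for each task across repeats.
        outcomes_by_task = defaultdict(set)
        for run in agent_runs:
            outcomes_by_task[run.task].add(run.passed)

        # Assumed definition: a task is consistent when every repeat agreed.
        consistent = sum(
            1 for outcomes in outcomes_by_task.values() if len(outcomes) == 1
        )

        summary[agent] = {
            "pass_rate": mean(1.0 if r.passed else 0.0 for r in agent_runs),
            "mean_cost_usd": mean(r.cost_usd for r in agent_runs),
            "mean_seconds": mean(r.seconds for r in agent_runs),
            "consistency": consistent / len(outcomes_by_task),
        }
    return summary
```

Calling `summarize` on a list of `RunResult` records returns one dictionary per agent name. Pass rate is computed over runs rather than tasks, so repeating a task several times weights it more heavily; averaging per task first would be an equally reasonable choice.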

## Manifest

```json
{
  "name": "Agent Benchmarking",
  "description": "Head-to-head coding agent comparison tool: YAML task definitions with judge criteria, git worktree isolation per agent run, pass rate/cost/time/consistency metrics, and reproducible benchmarking across Claude Code, Aider, Codex, and other agents.",
  "source_url": "https://github.com/affaan-m/everything-claude-code/tree/main/skills/agent-eval",
  "source_pin": null,
  "manifest_hash": "4b623ffac222cb21",
  "risk_tier": "low"
}
```

## SBOM

```json
null
```

