# Agent Benchmarking

A head-to-head comparison tool for coding agents. Tasks are defined in YAML with judge criteria, each agent run is isolated in its own git worktree, and results are reported as pass rate, cost, time, and consistency, giving reproducible benchmarks across Claude Code, Aider, Codex, and other agents.
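To make the reported metrics concrete, here is a minimal Python sketch of how pass rate, cost, time, and consistency could be aggregated per agent from repeated runs. It is not the tool's actual implementation: the `RunResult` record, its field names, the `summarize` function, and the definition of consistency (the share of tasks whose repeated runs all ended with the same outcome) are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One agent run on one task (hypothetical record; field names are assumed)."""
    agent: str
    task: str
    passed: bool
    cost_usd: float
    seconds: float


def summarize(runs):
    """Aggregate per-agent metrics from a list of RunResult records."""
    by_agent = {}
    for run in runs:
        by_agent.setdefault(run.agent, []).append(run)

    summary = {}
    for agent, agent_runs in by_agent.items():
        # Group pass/fail outcomes by task so repeated runs can be compared.
        outcomes_by_task = {}
        for run in agent_runs:
            outcomes_by_task.setdefault(run.task, []).append(run.passed)

        # Assumed definition: a task is consistent when every repeated run
        # of it produced the same outcome (all passed or all failed).
        consistent_tasks = sum(
            1 for outcomes in outcomes_by_task.values() if len(set(outcomes)) == 1
        )

        summary[agent] = {
            "pass_rate": sum(run.passed for run in agent_runs) / len(agent_runs),
            "mean_cost_usd": mean(run.cost_usd for run in agent_runs),
            "mean_seconds": mean(run.seconds for run in agent_runs),
            "consistency": consistent_tasks / len(outcomes_by_task),
        }
    return summary
```

Calling `summarize` on a list of records returns one dictionary per agent name, so two agents run on the same tasks can be compared side by side. Other consistency definitions, such as the variance of the pass rate across repetitions, would fit the same structure.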

## Manifest

```json
{
  "name": "Agent Benchmarking",
  "description": "Head-to-head coding agent comparison tool: YAML task definitions with judge criteria, git worktree isolation per agent run, pass rate/cost/time/consistency metrics, and reproducible benchmarking across Claude Code, Aider, Codex, and other agents.",
  "source_url": "https://github.com/affaan-m/everything-claude-code/tree/main/skills/agent-eval",
  "source_pin": null,
  "manifest_hash": "9bb29a8b037ffd9dd19de60289d7292858c4c5cc8688632eaa09e9be2e9c7d25",
  "risk_tier": "T2"
}
```

## SBOM

```json
null
```