Agent Benchmarking

A head-to-head comparison tool for coding agents. Tasks are defined in YAML with judge criteria, each agent run is isolated in its own git worktree, and results are reported as pass rate, cost, time, and consistency, giving reproducible benchmarks across Claude Code, Aider, Codex, and other agents.
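
As a rough illustration of the metrics named above, the sketch below aggregates pass rate, cost, time, and consistency per agent from a list of run records. The `RunResult` shape, the `summarize` helper, and the definition of consistency (the share of repeated attempts on a task that agree with the majority outcome) are all assumptions made for this example and are not taken from the tool itself.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One agent attempt at one task (hypothetical record shape)."""
    agent: str
    task: str
    passed: bool
    cost_usd: float
    seconds: float


def summarize(runs):
    """Aggregate per-agent metrics from an iterable of RunResult records."""
    by_agent = {}
    for run in runs:
        by_agent.setdefault(run.agent, []).append(run)

    summary = {}
    for agent, agent_runs in by_agent.items():
        # Group outcomes per task so repeated attempts can be compared.
        by_task = {}
        for run in agent_runs:
            by_task.setdefault(run.task, []).append(run.passed)

        # Assumed definition of consistency: for each task, the share of
        # attempts that match the majority outcome, averaged over tasks.
        consistency = mean(
            max(sum(outcomes), len(outcomes) - sum(outcomes)) / len(outcomes)
            for outcomes in by_task.values()
        )

        summary[agent] = {
            "pass_rate": mean(1.0 if r.passed else 0.0 for r in agent_runs),
            "mean_cost_usd": mean(r.cost_usd for r in agent_runs),
            "mean_seconds": mean(r.seconds for r in agent_runs),
            "consistency": consistency,
        }
    return summary
```

Under this assumed definition, an agent that passes a task on every attempt and one that fails it on every attempt both score 1.0 for consistency, so the figure is only meaningful when read alongside pass rate.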

Skill Details

Gate Verdict: Unverified
Publication State: published
Risk Tier: low
Manifest Hash: 4b623ffa
