Agent Benchmarking

Elevated · Review

A head-to-head comparison tool for coding agents. Tasks are defined in YAML with judge criteria, each agent run is isolated in its own git worktree, and results are reported as pass rate, cost, time, and consistency, giving reproducible benchmarks across Claude Code, Aider, Codex, and other agents.
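
The listing names four metrics (pass rate, cost, time, consistency) but does not say how they are computed. The sketch below shows one plausible way to aggregate them from per-run records. Everything in it is an assumption made for illustration: the `RunResult` fields, the `summarize` function, and in particular the definition of consistency (the share of tasks on which every repeated run of an agent reached the same outcome). It is not the skill's actual implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunResult:
    """One agent attempt at one task (all field names are hypothetical)."""
    agent: str
    task: str
    passed: bool
    cost_usd: float
    seconds: float


def summarize(results: list[RunResult]) -> dict[str, dict[str, float]]:
    """Aggregate per-agent pass rate, mean cost, mean time, and consistency."""
    by_agent: dict[str, list[RunResult]] = {}
    for r in results:
        by_agent.setdefault(r.agent, []).append(r)

    summary: dict[str, dict[str, float]] = {}
    for agent, runs in by_agent.items():
        # Group outcomes by task so repeated runs of one task can be compared.
        outcomes: dict[str, set[bool]] = {}
        for r in runs:
            outcomes.setdefault(r.task, set()).add(r.passed)
        # Assumed definition: a task is "consistent" if every run agreed.
        consistent = sum(1 for seen in outcomes.values() if len(seen) == 1)
        summary[agent] = {
            "pass_rate": mean(1.0 if r.passed else 0.0 for r in runs),
            "mean_cost_usd": mean(r.cost_usd for r in runs),
            "mean_seconds": mean(r.seconds for r in runs),
            "consistency": consistent / len(outcomes),
        }
    return summary
```

Running each task several times per agent is what makes the consistency figure meaningful: with a single run per task, every task trivially counts as consistent under this definition.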

Governance Receipt

Signer: sovereign-claw-ed25519
Signed At: 6/4/2026
Risk Tier: T2
Receipt Hash: cee94f86
Manifest Hash: 9bb29a8b037ffd9dd19de60289d7292858c4c5cc8688632eaa09e9be2e9c7d25
Signature: s/aDylbU
Root Public Key: 349b0348

Skill Details

Gate Verdict: Elevated · Review
Publication State: published
Risk Tier: T2
Manifest Hash: 9bb29a8b
