Live Benchmarks

JJ Benchmark

Performance results for AI coding agents on Jujutsu (jj) tasks, measured by success rate and execution time.

Last run: 3/10/2026

Agent Performance

Rank	Model	Agent	Passed	Success Rate
1	gemini-3-flash	Pochi	40	83%
2	gpt-5.2-codex (new)	Codex	38	79%
3	glm-4.7	Pochi	36	75%
4	glm-4.7-flash	Pochi	10	21%
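
The Success Rate column appears to be the Passed count divided by the total number of tasks, rounded to a whole percent. The page does not state the total; the sketch below assumes 48 tasks, a value inferred only because it reproduces all four published percentages. The names `success_rate` and `ASSUMED_TOTAL_TASKS` are illustrative and not part of the benchmark.

```python
# Hypothetical helper: recompute the Success Rate column from Passed counts.
# ASSUMED_TOTAL_TASKS is an inference from the table, not a published figure.
ASSUMED_TOTAL_TASKS = 48


def success_rate(passed: int, total: int) -> int:
    """Return the pass percentage, rounded to the nearest whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return round(100 * passed / total)


# Passed counts copied from the table above.
passed_counts = {
    "gemini-3-flash": 40,
    "gpt-5.2-codex": 38,
    "glm-4.7": 36,
    "glm-4.7-flash": 10,
}

rates = {
    model: success_rate(passed, ASSUMED_TOTAL_TASKS)
    for model, passed in passed_counts.items()
}

for model, rate in rates.items():
    print(f"{model}: {rate}%")
```

Under that assumption the computed values match the table (83%, 79%, 75%, 21%). If the real task count differs, only the constant needs to change.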