JJ Benchmark

Results for AI coding models on Jujutsu (jj) tasks, reporting each model's pass count, average execution time, and success rate.

Last run: 3/12/2026

Model Performance

Rank	Model	Passed	Avg Duration	Success Rate
1	claude-4-6-sonnet (new)	58	128.8s	92%
2	claude-opus-4-6	55	102.3s	87%
3	gemini-3.1-pro	53	267.6s	84%
4	gpt-5.2-codex	52	120.6s	83%
5	gpt-5.4	51	77.6s	81%
6	kimi-k2.5	50	241.0s	79%
7	gemini-3-flash	46	207.2s	73%
8	glm-4.7	44	185.8s	70%
9	qwen3-coder-480b-A35b-instruct	43	124.9s	68%
10	glm-4.7-flash	33	114.2s	52%
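
The page does not say how many tasks the benchmark contains or how Success Rate is derived. The sketch below tests one plausible reading: that Success Rate is Passed divided by a fixed task count, rounded to a whole percent. That formula is an assumption, not something the source confirms, and the function name `consistent_totals` and the `max_total` search bound are hypothetical choices made for illustration.

```python
# (passed, reported success rate in %) pairs copied from the table above.
ROWS = [
    (58, 92), (55, 87), (53, 84), (52, 83), (51, 81),
    (50, 79), (46, 73), (44, 70), (43, 68), (33, 52),
]


def consistent_totals(rows, max_total=500):
    """Return every task count n for which round(100 * passed / n)
    reproduces the reported rate on every row.

    Assumes (not confirmed by the source) that the rate is
    passed / total, rounded to the nearest whole percent.
    """
    smallest = max(passed for passed, _ in rows)  # total cannot be below the best pass count
    return [
        n
        for n in range(smallest, max_total + 1)
        if all(round(100 * passed / n) == rate for passed, rate in rows)
    ]


if __name__ == "__main__":
    print(consistent_totals(ROWS))
```

Under that assumption, the only task count in the searched range that matches all ten rows is 63, so the table appears internally consistent with a 63-task suite. Treat this as an inference from the published numbers rather than a stated fact about the benchmark.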