Live Benchmarks

JJ Benchmark

Performance results of AI coding models on Jujutsu (jj) tasks, reporting tasks passed, average duration, and success rate for each model.

Last run: 3/20/2026

Model Performance

Rank	Model	Passed	Avg Duration	Success Rate
1	claude-opus-4-6 (new)	55	196.8s	87%
2	gemini-3-flash	51	149.7s	81%
3	gemini-3.1-pro	50	171.2s	79%
4	gpt-5.4	49	73.4s	78%
5	gpt-5.2-codex	49	112.5s	78%
6	claude-4-6-sonnet	45	122.0s	71%
7	kimi-k2.5	39	132.3s	62%
8	glm-4.7	30	268.0s	48%
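
The page does not state how many tasks the benchmark contains. The following Python sketch shows how the Passed and Success Rate columns relate, assuming the rate is simply passed divided by total tasks and rounded to a whole percent. That formula, and the names `results` and `infer_total_tasks`, are assumptions made for illustration and are not taken from the benchmark itself.

```python
# Passed counts and published success rates (%), copied from the table above.
results = {
    "claude-opus-4-6": (55, 87),
    "gemini-3-flash": (51, 81),
    "gemini-3.1-pro": (50, 79),
    "gpt-5.4": (49, 78),
    "gpt-5.2-codex": (49, 78),
    "claude-4-6-sonnet": (45, 71),
    "kimi-k2.5": (39, 62),
    "glm-4.7": (30, 48),
}


def infer_total_tasks(rows, max_total=200):
    """Return every task count for which round(100 * passed / total)
    reproduces all published success rates.

    Assumption: success rate = passed / total, rounded to a whole percent.
    """
    candidates = []
    for total in range(1, max_total + 1):
        if all(
            passed <= total and round(100 * passed / total) == rate
            for passed, rate in rows.values()
        ):
            candidates.append(total)
    return candidates


if __name__ == "__main__":
    print(infer_total_tasks(results))
```

Under that assumption, the only task count the search finds is 63, so each success rate in the table is consistent with passed / 63. This is an inference from the published numbers, not a figure reported by the benchmark.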