Live Benchmarks

JJ Benchmark

Performance results for AI coding models on Jujutsu tasks, measured by success rate and average execution time.

Last run: 3/12/2026

Model Performance

Rank  Model                           Passed  Avg Duration  Success Rate
#1    claude-4-6-sonnet (NEW)         58      128.8s        92%
#2    claude-opus-4-6                 55      102.3s        87%
#3    gemini-3.1-pro                  53      267.6s        84%
#4    gpt-5.2-codex                   52      120.6s        83%
#5    gpt-5.4                         51      77.6s         81%
#6    kimi-k2.5                       50      241.0s        79%
#7    gemini-3-flash                  46      207.2s        73%
#8    glm-4.7                         44      185.8s        70%
#9    qwen3-coder-480b-A35b-instruct  43      124.9s        68%
#10   glm-4.7-flash                   33      114.2s        52%