Live Benchmarks

JJ Benchmark

Results for AI coding models on Jujutsu (jj) version-control tasks, reporting each model's success rate and average execution time.

Total tasks: 52
Last run: 4/22/2026

Model Performance

Rank  Model                 Passed  Avg Duration  Success Rate
#1    gemini-3.1-pro (NEW)  51      223.6s        98%
#2    gpt-5.4               49      114.5s        94%
#3    claude-4-7-opus       49      118.3s        94%
#4    gemini-3-flash        49      303.9s        94%
#5    claude-4-6-sonnet     48      176.8s        92%
#6    gpt-5.2-codex         47      129.9s        90%
#7    kimi-k2.5             46      110.0s        88%
#8    glm-4.7               45      189.4s        87%
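
The success rates above look consistent with the passed count divided by the 52 total tasks, rounded to the nearest whole percent. The page does not state this formula, so the sketch below treats it as an assumption; the names `TOTAL_TASKS`, `PASSED`, `success_rate`, and `RATES` are illustrative only and do not come from the benchmark's code.

```python
# Assumption (not stated on the page): success rate = passed / total tasks,
# rounded to the nearest whole percent.
TOTAL_TASKS = 52  # from "Total tasks: 52"

# Passed counts copied from the Model Performance table.
PASSED = {
    "gemini-3.1-pro": 51,
    "gpt-5.4": 49,
    "claude-4-7-opus": 49,
    "gemini-3-flash": 49,
    "claude-4-6-sonnet": 48,
    "gpt-5.2-codex": 47,
    "kimi-k2.5": 46,
    "glm-4.7": 45,
}


def success_rate(passed: int, total: int = TOTAL_TASKS) -> int:
    """Return the pass rate as a whole-number percentage."""
    return round(100 * passed / total)


# Recompute the Success Rate column from the passed counts.
RATES = {model: success_rate(count) for model, count in PASSED.items()}

if __name__ == "__main__":
    for model, rate in RATES.items():
        print(f"{model:<18} {rate}%")
```

Under this assumption, the three models with 49 passes tie on success rate, so their relative order in the table may depend on something else, such as average duration; the page does not say.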