Live Benchmarks

JJ Benchmark

Performance results of AI coding models on Jujutsu (jj) tasks, reporting tasks passed, average execution time, and success rate for each model.

Last run: 3/20/2026

Model Performance

Rank  Model                  Passed  Avg Duration  Success Rate
#1    claude-opus-4-6 (new)  55      196.8s        87%
#2    gemini-3-flash         51      149.7s        81%
#3    gemini-3.1-pro         50      171.2s        79%
#4    gpt-5.4                49      73.4s         78%
#5    gpt-5.2-codex          49      112.5s        78%
#6    claude-4-6-sonnet      45      122.0s        71%
#7    kimi-k2.5              39      132.3s        62%
#8    glm-4.7                30      268.0s        48%
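
The page does not state how many tasks the benchmark contains. As a rough sanity check, the sketch below searches for task counts that would reproduce every Success Rate value from the Passed counts. It assumes the rate is passed / total rounded to the nearest whole percent, which the page does not confirm. The helper names and the search bound of 200 are illustrative and not part of the benchmark.

```python
import math

# (model, passed, reported success rate in percent), copied from the leaderboard
ROWS = [
    ("claude-opus-4-6", 55, 87),
    ("gemini-3-flash", 51, 81),
    ("gemini-3.1-pro", 50, 79),
    ("gpt-5.4", 49, 78),
    ("gpt-5.2-codex", 49, 78),
    ("claude-4-6-sonnet", 45, 71),
    ("kimi-k2.5", 39, 62),
    ("glm-4.7", 30, 48),
]


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with .5 going up (assumed rounding rule)."""
    return math.floor(x + 0.5)


def consistent_totals(rows, max_total=200):
    """Return every task count for which all rounded rates match the table."""
    # The total cannot be smaller than the largest passed count.
    smallest = max(passed for _, passed, _ in rows)
    return [
        total
        for total in range(smallest, max_total + 1)
        if all(
            round_half_up(100 * passed / total) == rate
            for _, passed, rate in rows
        )
    ]


if __name__ == "__main__":
    print(consistent_totals(ROWS))  # [63]
```

Under those assumptions the only matching count is 63, so the rates would correspond to 55/63, 51/63, and so on. Treat this as an inference from the published numbers, not a figure the benchmark reports.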