Performance of AI coding models on Jujutsu tasks, reported as the number of tasks passed, the average duration, and the success rate.
| Model | Passed | Avg Duration | Success Rate |
|---|---|---|---|
| #1 claude-opus-4-6 | 55 | 196.8s | 87% |
| #2 gemini-3-flash | 51 | 149.7s | 81% |
| #3 gemini-3.1-pro | 50 | 171.2s | 79% |
| #4 gpt-5.4 | 49 | 73.4s | 78% |
| #5 gpt-5.2-codex | 49 | 112.5s | 78% |
| #6 claude-4-6-sonnet | 45 | 122.0s | 71% |
| #7 kimi-k2.5 | 39 | 132.3s | 62% |
| #8 glm-4.7 | 30 | 268.0s | 48% |
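
The table does not state how many tasks each model attempted. As a rough sketch, the total can be inferred by searching for the task count that reproduces every rounded success rate from its passed count. The names `results`, `matching_totals`, and `success_rate` are hypothetical helpers introduced for illustration, and the assumption that rates are rounded to the nearest whole percent is ours, not the source's.

```python
# (passed, reported success rate in %) copied from the table above.
results = {
    "claude-opus-4-6": (55, 87),
    "gemini-3-flash": (51, 81),
    "gemini-3.1-pro": (50, 79),
    "gpt-5.4": (49, 78),
    "gpt-5.2-codex": (49, 78),
    "claude-4-6-sonnet": (45, 71),
    "kimi-k2.5": (39, 62),
    "glm-4.7": (30, 48),
}


def success_rate(passed, total):
    """Success rate as a whole-number percentage (assumed rounding rule)."""
    return round(100 * passed / total)


def matching_totals(rows, max_total=200):
    """Return every task count consistent with all reported percentages."""
    lowest = max(passed for passed, _ in rows.values())
    return [
        total
        for total in range(lowest, max_total + 1)
        if all(success_rate(p, total) == pct for p, pct in rows.values())
    ]


totals = matching_totals(results)
print(totals)
```

Under that rounding assumption, a single candidate total means the Passed and Success Rate columns are mutually consistent. An empty list would point to a transcription error in one of the rows.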