Performance of AI coding models on Jujutsu tasks, ranked by success rate. For each model the table lists the number of tasks passed, the average duration, and the success rate.

| Model | Passed | Avg Duration | Success Rate |
|---|---|---|---|
| #1 gemini-3.1-pro (new) | 51 | 223.6s | 98% |
| #2 gpt-5.4 | 49 | 114.5s | 94% |
| #3 claude-4-7-opus | 49 | 118.3s | 94% |
| #4 gemini-3-flash | 49 | 303.9s | 94% |
| #5 claude-4-6-sonnet | 48 | 176.8s | 92% |
| #6 gpt-5.2-codex | 47 | 129.9s | 90% |
| #7 kimi-k2.5 | 46 | 110.0s | 88% |
| #8 glm-4.7 | 45 | 189.4s | 87% |
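
The table does not state how many tasks the suite contains. As a rough consistency check, the sketch below searches for suite sizes at which every reported success rate equals the passed count divided by the total, rounded to the nearest percent. It assumes that every model ran the same number of tasks and that rates are rounded to whole percentages; neither is confirmed by the source, and the helper name `infer_total` is hypothetical.

```python
# Passed counts and reported success rates (percent), copied from the table.
passed = {
    "gemini-3.1-pro": 51,
    "gpt-5.4": 49,
    "claude-4-7-opus": 49,
    "gemini-3-flash": 49,
    "claude-4-6-sonnet": 48,
    "gpt-5.2-codex": 47,
    "kimi-k2.5": 46,
    "glm-4.7": 45,
}
reported = {
    "gemini-3.1-pro": 98,
    "gpt-5.4": 94,
    "claude-4-7-opus": 94,
    "gemini-3-flash": 94,
    "claude-4-6-sonnet": 92,
    "gpt-5.2-codex": 90,
    "kimi-k2.5": 88,
    "glm-4.7": 87,
}


def infer_total(passed, reported, max_total=200):
    """Return every suite size n for which round(100 * passed / n)
    matches the reported rate of every model.

    Assumption: all models ran the same n tasks and rates are rounded
    to the nearest whole percent.
    """
    candidates = []
    # The suite cannot be smaller than the highest passed count.
    for n in range(max(passed.values()), max_total + 1):
        if all(round(100 * passed[m] / n) == reported[m] for m in passed):
            candidates.append(n)
    return candidates


if __name__ == "__main__":
    print(infer_total(passed, reported))
```

Under those assumptions the search returns a single candidate, 52 tasks, so the passed counts and success rates in the table are mutually consistent.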