Performance of AI coding models on Jujutsu tasks, reported as tasks passed, average duration, and success rate. Models are ranked by the number of tasks passed.

| Rank | Model | Tasks Passed | Avg Duration (s) | Success Rate |
|---|---|---|---|---|
| 1 | claude-4-6-sonnet | 58 | 128.8 | 92% |
| 2 | claude-opus-4-6 | 55 | 102.3 | 87% |
| 3 | gemini-3.1-pro | 53 | 267.6 | 84% |
| 4 | gpt-5.2-codex | 52 | 120.6 | 83% |
| 5 | gpt-5.4 | 51 | 77.6 | 81% |
| 6 | kimi-k2.5 | 50 | 241.0 | 79% |
| 7 | gemini-3-flash | 46 | 207.2 | 73% |
| 8 | glm-4.7 | 44 | 185.8 | 70% |
| 9 | qwen3-coder-480b-A35b-instruct | 43 | 124.9 | 68% |
| 10 | glm-4.7-flash | 33 | 114.2 | 52% |
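
The success rates can be cross-checked against the pass counts with a little arithmetic. The Python sketch below assumes every model attempted the same task set and that each rate is `round(100 * passed / N)` for a single suite size `N`; it searches for the values of `N` that reproduce every reported percentage. The table does not state the suite size, so any `N` found this way is an inference, not a published figure. The names `ROWS` and `consistent_totals` are hypothetical.

```python
# Sketch: infer the task-suite size from the leaderboard table.
# Assumptions (not stated in the source): every model ran the same task set,
# and Success Rate = round(100 * Passed / N) for one suite size N.

# (model, tasks passed, avg duration in seconds, reported success rate in %)
ROWS = [
    ("claude-4-6-sonnet", 58, 128.8, 92),
    ("claude-opus-4-6", 55, 102.3, 87),
    ("gemini-3.1-pro", 53, 267.6, 84),
    ("gpt-5.2-codex", 52, 120.6, 83),
    ("gpt-5.4", 51, 77.6, 81),
    ("kimi-k2.5", 50, 241.0, 79),
    ("gemini-3-flash", 46, 207.2, 73),
    ("glm-4.7", 44, 185.8, 70),
    ("qwen3-coder-480b-A35b-instruct", 43, 124.9, 68),
    ("glm-4.7-flash", 33, 114.2, 52),
]


def consistent_totals(rows, max_total=200):
    """Return every suite size N whose rounded pass percentages match all reported rates."""
    # N cannot be smaller than the largest number of tasks any model passed.
    min_total = max(passed for _, passed, _, _ in rows)
    return [
        n
        for n in range(min_total, max_total + 1)
        if all(round(100 * passed / n) == rate for _, passed, _, rate in rows)
    ]


if __name__ == "__main__":
    print(consistent_totals(ROWS))  # prints [63]
```

Under these assumptions only `N = 63` fits every row (for example, 52 / 63 ≈ 82.5%, which rounds to the reported 83%), so the percentages appear to be computed over a 63-task suite. If the benchmark rounds differently or excludes unattempted tasks, the check would need adjusting.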