Live Benchmarks

JJ Benchmark

Performance results of AI coding agents on Jujutsu (jj) tasks, measured by success rate and execution time.

Last run: 3/10/2026

Agent Performance

Rank  Model                Agent  Passed  Success Rate
#1    gemini-3-flash       Pochi  40      83%
#2    gpt-5.2-codex (NEW)  Codex  38      79%
#3    glm-4.7              Pochi  36      75%
#4    glm-4.7-flash        Pochi  10      21%
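
The page does not say how many tasks the benchmark contains, so the relationship between the Passed and Success Rate columns is left implicit. The sketch below is one plausible reading, not the benchmark's own scoring code: it assumes the rate is the passed count divided by a fixed task total, rounded to the nearest whole percent. The total of 48 is an inference (it is the only whole number that reproduces all four listed percentages under that rounding rule), and the names `TOTAL_TASKS` and `success_rate` are hypothetical.

```python
# Passed counts copied from the leaderboard above.
passed = {
    "gemini-3-flash": 40,
    "gpt-5.2-codex": 38,
    "glm-4.7": 36,
    "glm-4.7-flash": 10,
}

# ASSUMPTION: the task count is not stated on the page. 48 is inferred
# because it reproduces every listed percentage after rounding.
TOTAL_TASKS = 48


def success_rate(passed_count: int, total: int = TOTAL_TASKS) -> int:
    """Return the pass percentage rounded to the nearest whole number."""
    return round(100 * passed_count / total)


# Recompute the Success Rate column from the Passed column.
rates = {model: success_rate(count) for model, count in passed.items()}

for model, rate in rates.items():
    print(f"{model}: {rate}%")
```

If the benchmark used a different task count or rounding rule, the recomputed percentages would not match the published column, which makes this a quick consistency check on the table.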