Run success rate

Tasks the run solved on its last attempt, divided by the tasks attempted in this run.

Formula: COUNT(distinct tasks where last attempt passed) / COUNT(distinct tasks attempted in this run)

Per-run metric for the model's "final answer" on each task. Differs from leaderboard pass_at_n: this denominator is the run's own attempted-task count, not the task set size, so partial runs are not penalised for unattempted tasks.
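The formula above can be sketched as follows. This is a minimal illustration, not the dashboard's implementation: the `(task_id, attempt, passed)` row shape, the boolean `passed` flag, and the function name `run_success_rate` are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (task_id, attempt_number, passed)
Row = Tuple[str, int, bool]


def run_success_rate(rows: Iterable[Row]) -> float:
    """Share of attempted tasks whose highest-numbered attempt passed."""
    last: Dict[str, Tuple[int, bool]] = {}  # task_id -> (attempt, passed)
    for task_id, attempt, passed in rows:
        # Keep only the latest attempt seen for each task.
        if task_id not in last or attempt > last[task_id][0]:
            last[task_id] = (attempt, passed)
    if not last:
        return 0.0  # no tasks attempted; avoid dividing by zero
    solved = sum(1 for _, passed in last.values() if passed)
    # The denominator is the number of tasks attempted in this run,
    # not the size of the full task set.
    return solved / len(last)


example_rows = [
    ("T1", 1, True),
    ("T2", 1, False),
    ("T2", 2, True),   # solved on the last attempt
    ("T3", 1, False),
    ("T3", 2, False),  # still failing on the last attempt
]
rate = run_success_rate(example_rows)  # 2 of 3 attempted tasks
```

Because the denominator counts only tasks that appear in the rows, a partial run that attempted three tasks is scored out of three, whatever the size of the full task set.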

73.0% (46 of 63 tasks attempted in this run)
Avg score

Avg attempt score

Mean per-attempt score on a 0–100 point scale (partial credit). Drill-down only.

Formula: Mean of attempt scores across all results rows: SUM(score) / COUNT(*) over the results table. Each attempt earns 0–100 points based on compile + test outcomes.

Drill-down companion to pass_at_n. Rewards partial credit but is not directly comparable to pass rate; use it for within-model analysis.
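As a rough sketch of the SUM(score) / COUNT(*) formula, assuming the attempt scores have already been read from the results table into a plain list (the function name `avg_attempt_score` and the sample values are hypothetical):

```python
from typing import Iterable


def avg_attempt_score(scores: Iterable[float]) -> float:
    """Mean of per-attempt scores on the 0-100 scale, over every attempt row."""
    values = list(scores)
    if not values:
        return 0.0  # no rows; avoid dividing by zero
    return sum(values) / len(values)


# Two tasks with two attempts each: every row counts, including first
# attempts that were later retried.
example_scores = [0.0, 100.0, 62.5, 100.0]
avg = avg_attempt_score(example_scores)  # 262.5 / 4 = 65.625
```

Since every attempt row contributes, failed first attempts pull this mean down even when the retry succeeds, which is one reason it is not directly comparable to a pass rate.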

63.6 / 100
Cost $4.05
Duration 2h 55m
Per-task results for run 16e667f8-6a0b-469c-a8bb-11022eb4ce53
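| Task | Difficulty | Attempt | Score | Tests | Compile | Duration |
|---|---|---|---|---|---|---|
| CG-AL-E001 | easy | 1 | 100.0 / 100 | 7/7 | OK | 2m 35s |
| CG-AL-E002 | easy | 1 | 100.0 / 100 | 6/6 | OK | 2m 23s |
| CG-AL-E003 | easy | 1 | 100.0 / 100 | 5/5 | OK | 2m 36s |
| CG-AL-E004 | easy | 1 | 100.0 / 100 | 6/6 | OK | 2m 36s |
| CG-AL-E005 | easy | 2 | 100.0 / 100 | 13/13 | OK | 2m 41s |
| CG-AL-E006 | easy | 1 | 100.0 / 100 | 7/7 | OK | 2m 46s |
| CG-AL-E007 | easy | 1 | 100.0 / 100 | 7/7 | OK | 2m 48s |
| CG-AL-E008 | easy | 1 | 100.0 / 100 | 6/6 | OK | 2m 53s |
| CG-AL-E009 | easy | 1 | 100.0 / 100 | 5/5 | OK | 2m 51s |
| CG-AL-E010 | easy | 1 | 100.0 / 100 | 5/5 | OK | 2m 56s |
| CG-AL-E031 | easy | 1 | 100.0 / 100 | 3/3 | OK | 3m 5s |
| CG-AL-E032 | easy | 1 | 100.0 / 100 | 1/1 | OK | 3m 2s |
| CG-AL-E045 | easy | 1 | 100.0 / 100 | 4/4 | OK | 2m 32s |
| CG-AL-E050 | easy | 2 | 100.0 / 100 | 13/13 | OK | 2m 28s |
| CG-AL-E051 | easy | 1 | 100.0 / 100 | 15/15 | OK | 2m 34s |
| CG-AL-E052 | easy | 2 | 100.0 / 100 | 16/16 | OK | 2m 28s |
| CG-AL-E053 | easy | 1 | 100.0 / 100 | 3/3 | OK | 2m 31s |
| CG-AL-E054 | easy | 2 | 100.0 / 100 | 9/9 | OK | 2m 30s |
| CG-AL-E055 | easy | 1 | 100.0 / 100 | 8/8 | OK | 2m 30s |
| CG-AL-H001 | hard | 1 | 100.0 / 100 | 25/25 | OK | 2m 31s |
| CG-AL-H002 | hard | 1 | 100.0 / 100 | 4/4 | OK | 2m 35s |
| CG-AL-H003 | hard | 2 | 62.5 / 100 | 4/5 | OK | 2m 31s |
| CG-AL-H004 | hard | 1 | 100.0 / 100 | 14/14 | OK | 2m 32s |
| CG-AL-H005 | hard | 2 | 62.5 / 100 | 2/5 | OK | 2m 34s |
| CG-AL-H006 | hard | 1 | 100.0 / 100 | 6/6 | OK | 2m 28s |
| CG-AL-H007 | hard | 1 | 100.0 / 100 | 10/10 | OK | 2m 33s |
| CG-AL-H008 | hard | 1 | 100.0 / 100 | 10/10 | OK | 2m 31s |
| CG-AL-H009 | hard | 1 | 100.0 / 100 | 11/11 | OK | 2m 35s |
| CG-AL-H010 | hard | 1 | 100.0 / 100 | 8/8 | OK | 2m 30s |
| CG-AL-H011 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 12.7s |
| CG-AL-H013 | hard | 1 | 100.0 / 100 | 9/9 | OK | 2m 31s |
| CG-AL-H014 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 14.4s |
| CG-AL-H015 | hard | 1 | 100.0 / 100 | 4/4 | OK | 2m 33s |
| CG-AL-H016 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 13.2s |
| CG-AL-H017 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 12.7s |
| CG-AL-H018 | hard | 1 | 100.0 / 100 | 6/6 | OK | 2m 31s |
| CG-AL-H019 | hard | 1 | 100.0 / 100 | 5/5 | OK | 2m 27s |
| CG-AL-H020 | hard | 2 | 100.0 / 100 | 10/10 | OK | 2m 32s |
| CG-AL-H021 | hard | 2 | 62.5 / 100 | 6/20 | OK | 2m 33s |
| CG-AL-H022 | hard | 1 | 100.0 / 100 | 21/21 | OK | 2m 30s |
| CG-AL-H023 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 2m 14s |
| CG-AL-H025 | hard | 1 | 100.0 / 100 | 7/7 | OK | 2m 29s |
| CG-AL-H026 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 15.5s |
| CG-AL-H205 | hard | 1 | 100.0 / 100 | 6/6 | OK | 2m 33s |
| CG-AL-M001 | medium | 2 | 62.5 / 100 | 4/10 | OK | 2m 34s |
| CG-AL-M002 | medium | 1 | 100.0 / 100 | 22/22 | OK | 2m 34s |
| CG-AL-M003 | medium | 2 | 62.5 / 100 | 8/9 | OK | 2m 31s |
| CG-AL-M004 | medium | 1 | 100.0 / 100 | 12/12 | OK | 2m 48s |
| CG-AL-M005 | medium | 2 | 62.5 / 100 | 12/22 | OK | 2m 44s |
| CG-AL-M006 | medium | 2 | 100.0 / 100 | 18/18 | OK | 2m 39s |
| CG-AL-M007 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 2m 26s |
| CG-AL-M008 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 2m 17s |
| CG-AL-M009 | medium | 1 | 100.0 / 100 | 11/11 | OK | 2m 47s |
| CG-AL-M010 | medium | 2 | 62.5 / 100 | 19/21 | OK | 3m 5s |
| CG-AL-M020 | medium | 2 | 100.0 / 100 | 17/17 | OK | 2m 34s |
| CG-AL-M021 | medium | 1 | 100.0 / 100 | 14/14 | OK | 2m 50s |
| CG-AL-M022 | medium | 1 | 100.0 / 100 | 9/9 | OK | 2m 34s |
| CG-AL-M023 | medium | 1 | 100.0 / 100 | 11/11 | OK | 2m 30s |
| CG-AL-M024 | medium | 1 | 100.0 / 100 | 10/10 | OK | 2m 37s |
| CG-AL-M025 | medium | 1 | 100.0 / 100 | 7/7 | OK | 2m 25s |
| CG-AL-M026 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 14.6s |
| CG-AL-M088 | medium | 1 | 100.0 / 100 | 6/6 | OK | 2m 29s |
| CG-AL-M112 | medium | 2 | 62.5 / 100 | 2/4 | OK | 2m 27s |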