Run success rate

Share of the tasks attempted in this run that the run solved on its last attempt.

Formula: COUNT(distinct tasks where last attempt passed) / COUNT(distinct tasks attempted in this run)

Per-run metric for the model's "final answer" on each task. It differs from the leaderboard pass_at_n in its denominator: this metric divides by the run's own attempted-task count, not the task set size, so partial runs are not penalised for unattempted tasks.

77.3% (85 of 110 tasks attempted in this run)
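
As a rough illustration of the formula above, here is a minimal Python sketch. The `Attempt` record and its fields (`task_id`, `attempt`, `passed`) are hypothetical names chosen for this example, not the dashboard's actual schema, and how an attempt is judged as "passed" is assumed to be decided upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    # Hypothetical record shape; field names are assumptions for illustration.
    task_id: str
    attempt: int   # 1-based attempt number within the run
    passed: bool   # assumed to be decided upstream (compile + tests)


def run_success_rate(attempts):
    """Fraction of attempted tasks whose last attempt passed."""
    last = {}
    for a in attempts:
        prev = last.get(a.task_id)
        # Keep only the highest-numbered attempt per task.
        if prev is None or a.attempt > prev.attempt:
            last[a.task_id] = a
    if not last:
        return 0.0  # no tasks attempted; avoid division by zero
    solved = sum(1 for a in last.values() if a.passed)
    # Denominator is the number of tasks attempted in this run,
    # not the size of the full task set.
    return solved / len(last)


example = [
    Attempt("T1", 1, False),
    Attempt("T1", 2, True),   # solved on the last attempt
    Attempt("T2", 1, True),
    Attempt("T3", 1, False),
    Attempt("T3", 2, False),  # never solved
]
rate = run_success_rate(example)  # 2 of 3 tasks
```

With the figures reported for this run, the same division is 85 / 110, which rounds to 77.3%.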
Avg score

Avg attempt score

Mean per-attempt score on a 0–100 point scale (partial credit). Drill-down only.

Formula: Mean of attempt scores across all results rows: SUM(score) / COUNT(*) over the results table. Each attempt earns 0–100 points based on compile + test outcomes.

Drill-down companion to pass_at_n. It rewards partial credit but is not directly comparable to pass rate; use it for within-model analysis.

60.1 / 100
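
The average attempt score can be sketched in the same spirit. This is a minimal example that assumes the results rows have already been reduced to a plain list of 0–100 scores, one per attempt; the function name is hypothetical.

```python
def avg_attempt_score(scores):
    """Mean of per-attempt scores: SUM(score) / COUNT(*)."""
    if not scores:
        return 0.0  # no rows; avoid division by zero
    return sum(scores) / len(scores)


# Every attempt counts as its own row, so a task that fails on attempt 1
# and passes on attempt 2 contributes both 0.0 and 100.0 to the mean.
rows = [0.0, 100.0, 62.5]
mean_score = avg_attempt_score(rows)
```

Because failed earlier attempts are included, this mean can sit well below the score a task ends up with on its last attempt.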
Cost $0.89
Duration 4h 51m
Per-task results for run c3bb2ccf-7036-4cca-a135-aa211a08c9f4
Task        Difficulty  Attempt  Score        Tests  Compile  Duration
CG-AL-E001  easy        1        100.0 / 100  7/7    OK       18.1s
CG-AL-E002  easy        1        100.0 / 100  6/6    OK       2m 47s
CG-AL-E003  easy        1        100.0 / 100  5/5    OK       33.7s
CG-AL-E004  easy        1        100.0 / 100  6/6    OK       39.8s
CG-AL-E005  easy        1        100.0 / 100  13/13  OK       50.0s
CG-AL-E006  easy        2        100.0 / 100  7/7    OK       3m 34s
CG-AL-E007  easy        1        100.0 / 100  7/7    OK       1m 17s
CG-AL-E008  easy        1        100.0 / 100  6/6    OK       1m 45s
CG-AL-E009  easy        1        100.0 / 100  5/5    OK       1m 49s
CG-AL-E010  easy        1        100.0 / 100  5/5    OK       1m 54s
CG-AL-E031  easy        1        100.0 / 100  3/3    OK       1m 57s
CG-AL-E032  easy        1        100.0 / 100  1/1    OK       1m 59s
CG-AL-E045  easy        1        100.0 / 100  4/4    OK       21.4s
CG-AL-E050  easy        2        0.0 / 100    0/0    FAIL     17.4s
CG-AL-E051  easy        1        100.0 / 100  15/15  OK       31.1s
CG-AL-E052  easy        2        0.0 / 100    0/0    FAIL     7m 45s
CG-AL-E053  easy        1        100.0 / 100  3/3    OK       2m 41s
CG-AL-E054  easy        1        100.0 / 100  9/9    OK       30.7s
CG-AL-E055  easy        1        100.0 / 100  8/8    OK       29.8s
CG-AL-E056  easy        2        100.0 / 100  15/15  OK       5m 26s
CG-AL-E057  easy        2        0.0 / 100    0/0    FAIL     11m 42s
CG-AL-E058  easy        2        0.0 / 100    0/0    FAIL     5m 5s
CG-AL-H001  hard        1        100.0 / 100  25/25  OK       9m 47s
CG-AL-H002  hard        1        100.0 / 100  4/4    OK       48.5s
CG-AL-H003  hard        1        100.0 / 100  5/5    OK       9m 29s
CG-AL-H004  hard        2        0.0 / 100    0/0    FAIL     5m 16s
CG-AL-H005  hard        1        100.0 / 100  6/6    OK       8m 22s
CG-AL-H006  hard        1        100.0 / 100  6/6    OK       1m 9s
CG-AL-H007  hard        2        100.0 / 100  10/10  OK       5m 25s
CG-AL-H008  hard        1        100.0 / 100  10/10  OK       10m 57s
CG-AL-H009  hard        2        0.0 / 100    0/0    FAIL     8m 24s
CG-AL-H010  hard        2        100.0 / 100  8/8    OK       34.5s
CG-AL-H011  hard        2        0.0 / 100    0/0    FAIL     21.4s
CG-AL-H013  hard        2        0.0 / 100    0/0    FAIL     11.9s
CG-AL-H014  hard        2        0.0 / 100    0/0    FAIL     399ms
CG-AL-H015  hard        1        100.0 / 100  4/4    OK       18.9s
CG-AL-H016  hard        1        100.0 / 100  4/4    OK       24.8s
CG-AL-H017  hard        2        100.0 / 100  3/3    OK       23.1s
CG-AL-H018  hard        2        62.5 / 100   4/6    OK       1m 40s
CG-AL-H019  hard        2        100.0 / 100  5/5    OK       16.2s
CG-AL-H020  hard        2        100.0 / 100  10/10  OK       33.6s
CG-AL-H021  hard        2        100.0 / 100  20/20  OK       42.0s
CG-AL-H022  hard        1        100.0 / 100  21/21  OK       25.8s
CG-AL-H023  hard        2        100.0 / 100  25/25  OK       1m 20s
CG-AL-H024  hard        1        0.0 / 100    0/0    FAIL     0ms
CG-AL-H025  hard        1        100.0 / 100  7/7    OK       1m 0s
CG-AL-H026  hard        1        100.0 / 100  8/8    OK       19.1s
CG-AL-H027  hard        2        100.0 / 100  4/4    OK       30.1s
CG-AL-H028  hard        1        100.0 / 100  9/9    OK       22.5s
CG-AL-H029  hard        2        100.0 / 100  18/18  OK       40.5s
CG-AL-H030  hard        1        100.0 / 100  10/10  OK       33.0s
CG-AL-H031  hard        1        100.0 / 100  22/22  OK       33.2s
CG-AL-H032  hard        2        100.0 / 100  19/19  OK       28.3s
CG-AL-H033  hard        1        100.0 / 100  5/5    OK       2m 44s
CG-AL-H034  hard        2        100.0 / 100  3/3    OK       24.0s
CG-AL-H035  hard        1        100.0 / 100  3/3    OK       17.3s
CG-AL-H036  hard        2        100.0 / 100  5/5    OK       27.7s
CG-AL-H037  hard        2        0.0 / 100    0/0    FAIL     17.5s
CG-AL-H038  hard        1        100.0 / 100  3/3    OK       20.7s
CG-AL-H039  hard        1        100.0 / 100  4/4    OK       18.2s
CG-AL-H040  hard        1        100.0 / 100  2/2    OK       18.2s
CG-AL-H041  hard        1        100.0 / 100  3/3    OK       25.0s
CG-AL-H042  hard        2        100.0 / 100  3/3    OK       15.5s
CG-AL-H043  hard        1        100.0 / 100  5/5    OK       15.9s
CG-AL-H050  hard        2        62.5 / 100   2/3    OK       18.5s
CG-AL-H051  hard        1        100.0 / 100  4/4    OK       53.8s
CG-AL-H052  hard        1        100.0 / 100  5/5    OK       39.4s
CG-AL-H053  hard        1        100.0 / 100  4/4    OK       17.0s
CG-AL-H054  hard        2        0.0 / 100    0/0    FAIL     16.6s
CG-AL-H056  hard        2        0.0 / 100    0/0    FAIL     20.1s
CG-AL-H057  hard        1        100.0 / 100  4/4    OK       7m 44s
CG-AL-H058  hard        1        100.0 / 100  5/5    OK       35.4s
CG-AL-H205  hard        1        100.0 / 100  6/6    OK       19.9s
CG-AL-M001  medium      2        0.0 / 100    0/0    FAIL     12.3s
CG-AL-M002  medium      2        0.0 / 100    0/0    FAIL     51.8s
CG-AL-M003  medium      2        62.5 / 100   3/9    OK       21.3s
CG-AL-M004  medium      1        100.0 / 100  12/12  OK       2m 51s
CG-AL-M005  medium      1        100.0 / 100  22/22  OK       30.1s
CG-AL-M006  medium      2        100.0 / 100  18/18  OK       1m 53s
CG-AL-M007  medium      2        100.0 / 100  15/15  OK       58.4s
CG-AL-M008  medium      2        62.5 / 100   7/12   OK       1m 4s
CG-AL-M009  medium      2        100.0 / 100  11/11  OK       27.7s
CG-AL-M010  medium      1        100.0 / 100  21/21  OK       2m 39s
CG-AL-M020  medium      2        100.0 / 100  17/17  OK       18.9s
CG-AL-M021  medium      1        100.0 / 100  11/11  OK       21.1s
CG-AL-M022  medium      1        100.0 / 100  9/9    OK       21.4s
CG-AL-M023  medium      2        62.5 / 100   10/11  OK       5m 30s
CG-AL-M024  medium      1        100.0 / 100  10/10  OK       23.2s
CG-AL-M025  medium      1        100.0 / 100  7/7    OK       19.2s
CG-AL-M026  medium      2        100.0 / 100  8/8    OK       27.5s
CG-AL-M027  medium      2        100.0 / 100  18/18  OK       55.6s
CG-AL-M028  medium      2        0.0 / 100    0/0    FAIL     47.8s
CG-AL-M029  medium      2        100.0 / 100  3/3    OK       3m 8s
CG-AL-M031  medium      2        0.0 / 100    0/0    FAIL     8m 41s
CG-AL-M032  medium      1        100.0 / 100  3/3    OK       8m 26s
CG-AL-M033  medium      1        100.0 / 100  2/2    OK       6m 17s
CG-AL-M034  medium      2        62.5 / 100   1/2    OK       21.9s
CG-AL-M035  medium      1        100.0 / 100  2/2    OK       15.8s
CG-AL-M036  medium      2        100.0 / 100  2/2    OK       7m 37s
CG-AL-M037  medium      2        0.0 / 100    0/0    FAIL     8m 32s
CG-AL-M038  medium      1        100.0 / 100  2/2    OK       7m 12s
CG-AL-M039  medium      1        100.0 / 100  2/2    OK       9m 48s
CG-AL-M040  medium      2        0.0 / 100    0/0    FAIL     1m 2s
CG-AL-M041  medium      1        100.0 / 100  3/3    OK       1m 46s
CG-AL-M042  medium      1        100.0 / 100  8/8    OK       1m 40s
CG-AL-M043  medium      1        100.0 / 100  5/5    OK       1m 1s
CG-AL-M044  medium      1        100.0 / 100  6/6    OK       2m 46s
CG-AL-M045  medium      1        100.0 / 100  6/6    OK       42.3s
CG-AL-M088  medium      1        100.0 / 100  6/6    OK       18.2s
CG-AL-M112  medium      1        100.0 / 100  4/4    OK       23.5s