Run success rate

Tasks the run solved on its last attempt / tasks attempted in this run.

Formula: COUNT(distinct tasks where last attempt passed) / COUNT(distinct tasks attempted in this run)

Per-run metric for the model's "final answer" on each task. Differs from leaderboard pass_at_n: this denominator is the run's own attempted-task count, not the task set size, so partial runs are not penalised for unattempted tasks.

65.5% (72 of 110 tasks attempted in this run)
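
The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the dashboard's actual implementation: the row shape `(task_id, attempt_number, passed)` and the function name `run_success_rate` are assumptions made for the example, and the sample rows are invented.

```python
def run_success_rate(rows):
    """Return (solved, attempted, rate) for one run.

    rows: iterable of (task_id, attempt_number, passed) tuples.
    This row shape is hypothetical, chosen only for illustration.
    """
    last = {}  # task_id -> (highest attempt number seen, passed on that attempt)
    for task_id, attempt, passed in rows:
        if task_id not in last or attempt > last[task_id][0]:
            last[task_id] = (attempt, passed)

    attempted = len(last)  # distinct tasks attempted in this run
    if attempted == 0:
        return 0, 0, None  # rate is undefined for an empty run
    solved = sum(1 for _, passed in last.values() if passed)
    return solved, attempted, solved / attempted


# Invented sample: T1 passes first try, T2 passes on retry, T3 fails twice.
sample = [
    ("T1", 1, True),
    ("T2", 1, False),
    ("T2", 2, True),
    ("T3", 1, False),
    ("T3", 2, False),
]
solved, attempted, rate = run_success_rate(sample)
print(f"{solved}/{attempted} = {rate:.1%}")

# The figure reported for this run: 72 solved of 110 attempted.
print(f"{72 / 110:.1%}")
```

Only the last attempt per task counts, so a task that fails first and passes on retry is counted as solved, and tasks the run never attempted do not enter the denominator.
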
Avg attempt score

Mean per-attempt score on a 0–100 point scale (partial credit). Drill-down only.

Formula: Mean of attempt scores across all results rows: SUM(score) / COUNT(*) over the results table. Each attempt earns 0–100 points based on compile + test outcomes.

Drill-down companion to pass_at_n. Rewards partial credit but is not directly comparable to pass rate; use it for within-model analysis.

47.0 / 100
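
As a rough sketch of the SUM(score) / COUNT(*) formula, the snippet below averages over every attempt row, not only the last attempt per task. The function name `avg_attempt_score` and the sample scores are hypothetical; the real results table schema is not shown here.

```python
def avg_attempt_score(scores):
    """Mean of per-attempt scores (each 0-100), one entry per results row.

    Every attempt counts, including failed first attempts that were
    later retried. Returns None when there are no rows.
    """
    scores = list(scores)
    if not scores:
        return None
    return sum(scores) / len(scores)


# Invented sample: one task scores 0.0 then 100.0 on retry,
# another task earns partial credit (62.5) on its only attempt.
sample_scores = [0.0, 100.0, 62.5]
print(f"{avg_attempt_score(sample_scores):.1f} / 100")
```

Because failed first attempts stay in the average, this figure can sit well below the run success rate even when most tasks are eventually solved.
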
Cost $1.16
Duration 1h 44m
Per-task results for run c574b74f-cca1-4101-8d57-6c03fe99b514
Task        Difficulty  Attempt  Score        Tests  Compile  Duration
CG-AL-E001  easy        1        100.0 / 100  7/7    OK       28.4s
CG-AL-E002  easy        1        100.0 / 100  6/6    OK       3m 3s
CG-AL-E003  easy        1        100.0 / 100  5/5    OK       46.9s
CG-AL-E004  easy        1        100.0 / 100  6/6    OK       37.9s
CG-AL-E005  easy        1        100.0 / 100  13/13  OK       37.5s
CG-AL-E006  easy        2        100.0 / 100  7/7    OK       2m 49s
CG-AL-E007  easy        1        100.0 / 100  7/7    OK       44.4s
CG-AL-E008  easy        1        100.0 / 100  6/6    OK       48.0s
CG-AL-E009  easy        1        100.0 / 100  5/5    OK       46.2s
CG-AL-E010  easy        1        100.0 / 100  5/5    OK       52.0s
CG-AL-E031  easy        1        100.0 / 100  3/3    OK       49.6s
CG-AL-E032  easy        1        100.0 / 100  1/1    OK       45.1s
CG-AL-E045  easy        1        100.0 / 100  4/4    OK       25.1s
CG-AL-E050  easy        2        0.0 / 100    0/0    FAIL     38.7s
CG-AL-E051  easy        2        0.0 / 100    0/0    FAIL     55.0s
CG-AL-E052  easy        2        100.0 / 100  16/16  OK       28.9s
CG-AL-E053  easy        1        100.0 / 100  3/3    OK       2m 29s
CG-AL-E054  easy        2        0.0 / 100    0/0    FAIL     9.7s
CG-AL-E055  easy        1        100.0 / 100  8/8    OK       25.0s
CG-AL-E056  easy        2        100.0 / 100  15/15  OK       23.6s
CG-AL-E057  easy        2        0.0 / 100    0/0    FAIL     35.5s
CG-AL-E058  easy        2        0.0 / 100    0/0    FAIL     36.5s
CG-AL-H001  hard        1        100.0 / 100  25/25  OK       20.9s
CG-AL-H002  hard        1        100.0 / 100  4/4    OK       21.3s
CG-AL-H003  hard        1        100.0 / 100  5/5    OK       25.1s
CG-AL-H004  hard        2        100.0 / 100  14/14  OK       20.5s
CG-AL-H005  hard        2        100.0 / 100  6/6    OK       22.9s
CG-AL-H006  hard        1        100.0 / 100  6/6    OK       19.2s
CG-AL-H007  hard        1        100.0 / 100  10/10  OK       22.8s
CG-AL-H008  hard        1        100.0 / 100  10/10  OK       21.1s
CG-AL-H009  hard        2        100.0 / 100  11/11  OK       27.3s
CG-AL-H010  hard        2        0.0 / 100    0/0    FAIL     54.0s
CG-AL-H011  hard        2        100.0 / 100  5/5    OK       23.0s
CG-AL-H013  hard        1        100.0 / 100  9/9    OK       19.9s
CG-AL-H014  hard        2        0.0 / 100    0/0    FAIL     13.8s
CG-AL-H015  hard        1        100.0 / 100  4/4    OK       18.5s
CG-AL-H016  hard        1        100.0 / 100  4/4    OK       19.0s
CG-AL-H017  hard        2        100.0 / 100  3/3    OK       29.4s
CG-AL-H018  hard        1        100.0 / 100  6/6    OK       21.9s
CG-AL-H019  hard        2        100.0 / 100  5/5    OK       17.4s
CG-AL-H020  hard        2        0.0 / 100    0/0    FAIL     1m 16s
CG-AL-H021  hard        2        100.0 / 100  20/20  OK       33.6s
CG-AL-H022  hard        2        0.0 / 100    0/0    FAIL     1m 11s
CG-AL-H023  hard        2        0.0 / 100    0/0    FAIL     1m 10s
CG-AL-H024  hard        1        100.0 / 100  9/9    OK       26.9s
CG-AL-H025  hard        1        100.0 / 100  7/7    OK       32.5s
CG-AL-H026  hard        1        100.0 / 100  8/8    OK       21.8s
CG-AL-H027  hard        2        0.0 / 100    0/0    FAIL     18.2s
CG-AL-H028  hard        1        100.0 / 100  9/9    OK       24.4s
CG-AL-H029  hard        2        0.0 / 100    0/0    FAIL     1m 13s
CG-AL-H030  hard        2        0.0 / 100    0/0    FAIL     1m 14s
CG-AL-H031  hard        2        0.0 / 100    0/0    FAIL     1m 5s
CG-AL-H032  hard        1        100.0 / 100  19/19  OK       38.9s
CG-AL-H033  hard        2        0.0 / 100    0/0    FAIL     51.8s
CG-AL-H034  hard        2        100.0 / 100  3/3    OK       28.6s
CG-AL-H035  hard        1        100.0 / 100  3/3    OK       19.8s
CG-AL-H036  hard        2        100.0 / 100  5/5    OK       28.5s
CG-AL-H037  hard        1        100.0 / 100  3/3    OK       22.4s
CG-AL-H038  hard        1        100.0 / 100  3/3    OK       23.4s
CG-AL-H039  hard        1        100.0 / 100  4/4    OK       19.3s
CG-AL-H040  hard        1        100.0 / 100  2/2    OK       19.1s
CG-AL-H041  hard        2        100.0 / 100  3/3    OK       24.2s
CG-AL-H042  hard        2        100.0 / 100  3/3    OK       17.9s
CG-AL-H043  hard        1        100.0 / 100  5/5    OK       16.9s
CG-AL-H050  hard        2        62.5 / 100   2/3    OK       18.8s
CG-AL-H051  hard        2        0.0 / 100    0/0    FAIL     39.3s
CG-AL-H052  hard        1        100.0 / 100  5/5    OK       41.8s
CG-AL-H053  hard        1        100.0 / 100  4/4    OK       34.9s
CG-AL-H054  hard        2        0.0 / 100    0/0    FAIL     13.2s
CG-AL-H056  hard        2        0.0 / 100    0/0    FAIL     9.1s
CG-AL-H057  hard        1        100.0 / 100  4/4    OK       2m 23s
CG-AL-H058  hard        2        100.0 / 100  5/5    OK       22.2s
CG-AL-H205  hard        1        100.0 / 100  6/6    OK       20.4s
CG-AL-M001  medium      2        0.0 / 100    0/0    FAIL     11.6s
CG-AL-M002  medium      2        0.0 / 100    0/0    FAIL     1m 24s
CG-AL-M003  medium      2        0.0 / 100    0/0    FAIL     1m 25s
CG-AL-M004  medium      2        0.0 / 100    0/0    FAIL     39.5s
CG-AL-M005  medium      2        0.0 / 100    0/0    FAIL     1m 18s
CG-AL-M006  medium      2        0.0 / 100    0/0    FAIL     1m 23s
CG-AL-M007  medium      2        0.0 / 100    0/0    FAIL     1m 31s
CG-AL-M008  medium      2        0.0 / 100    0/0    FAIL     1m 23s
CG-AL-M009  medium      2        0.0 / 100    0/0    FAIL     39.0s
CG-AL-M010  medium      2        0.0 / 100    0/0    FAIL     1m 20s
CG-AL-M020  medium      2        100.0 / 100  17/17  OK       27.0s
CG-AL-M021  medium      1        100.0 / 100  11/11  OK       22.4s
CG-AL-M022  medium      1        100.0 / 100  9/9    OK       27.2s
CG-AL-M023  medium      2        0.0 / 100    0/0    FAIL     1m 34s
CG-AL-M024  medium      1        100.0 / 100  10/10  OK       26.6s
CG-AL-M025  medium      1        100.0 / 100  7/7    OK       21.1s
CG-AL-M026  medium      1        100.0 / 100  8/8    OK       28.0s
CG-AL-M027  medium      2        0.0 / 100    0/0    FAIL     1m 1s
CG-AL-M028  medium      2        0.0 / 100    0/0    FAIL     18.8s
CG-AL-M029  medium      2        0.0 / 100    0/0    FAIL     13.9s
CG-AL-M031  medium      2        0.0 / 100    0/0    FAIL     1m 3s
CG-AL-M032  medium      1        100.0 / 100  3/3    OK       20.9s
CG-AL-M033  medium      1        100.0 / 100  2/2    OK       23.4s
CG-AL-M034  medium      2        0.0 / 100    0/0    FAIL     1m 31s
CG-AL-M035  medium      1        100.0 / 100  2/2    OK       16.0s
CG-AL-M036  medium      2        0.0 / 100    0/0    FAIL     53.7s
CG-AL-M037  medium      2        0.0 / 100    0/0    FAIL     1m 10s
CG-AL-M038  medium      1        100.0 / 100  2/2    OK       21.9s
CG-AL-M039  medium      1        100.0 / 100  2/2    OK       2m 40s
CG-AL-M040  medium      2        0.0 / 100    0/0    FAIL     31.6s
CG-AL-M041  medium      1        100.0 / 100  3/3    OK       16.9s
CG-AL-M042  medium      1        100.0 / 100  8/8    OK       23.4s
CG-AL-M043  medium      1        100.0 / 100  5/5    OK       27.7s
CG-AL-M044  medium      1        100.0 / 100  6/6    OK       2m 28s
CG-AL-M045  medium      1        100.0 / 100  6/6    OK       22.9s
CG-AL-M088  medium      1        100.0 / 100  6/6    OK       19.8s
CG-AL-M112  medium      2        100.0 / 100  4/4    OK       23.9s