Run success rate

Tasks solved on their last attempt, divided by tasks attempted in this run.

Formula: COUNT(distinct tasks where last attempt passed) / COUNT(distinct tasks attempted in this run)

Per-run metric for the model's "final answer" on each task. Differs from leaderboard pass_at_n: this denominator is the run's own attempted-task count, not the task set size, so partial runs are not penalised for unattempted tasks.

61.8% (68 of 110 tasks attempted in this run)
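The run success rate formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Attempt` record, its field names, and the task IDs `T1` to `T3` are hypothetical, and "passed" is treated as a boolean already stored on each results row.

```python
from typing import Iterable, NamedTuple


class Attempt(NamedTuple):
    """One results row (hypothetical shape, not the dashboard's schema)."""
    task_id: str
    attempt: int   # 1-based attempt number
    passed: bool   # assumed: True when the attempt solved the task


def run_success_rate(rows: Iterable[Attempt]) -> float:
    """Distinct tasks whose last attempt passed / distinct tasks attempted."""
    last = {}
    for row in rows:
        prev = last.get(row.task_id)
        # Keep only the highest-numbered attempt for each task.
        if prev is None or row.attempt > prev.attempt:
            last[row.task_id] = row
    if not last:
        return 0.0  # no tasks attempted; avoid division by zero
    solved = sum(1 for row in last.values() if row.passed)
    return solved / len(last)


# Made-up rows: T1 passes first time, T2 passes on retry, T3 fails twice.
example_rows = [
    Attempt("T1", 1, True),
    Attempt("T2", 1, False),
    Attempt("T2", 2, True),
    Attempt("T3", 1, False),
    Attempt("T3", 2, False),
]
rate = run_success_rate(example_rows)  # 2 of 3 attempted tasks solved
```

Because the denominator is the number of distinct tasks that appear in the rows, a task that was never attempted does not lower the rate. With this run's figures, 68 / 110 rounds to the 61.8% shown above.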
Avg score

Avg attempt score

Mean per-attempt score on a 0–100 point scale (partial credit). Drill-down only.

Formula: Mean of attempt scores across all results rows: SUM(score) / COUNT(*) over the results table. Each attempt earns 0–100 points based on compile + test outcomes.

Drill-down companion to pass_at_n. Rewards partial credit but is not directly comparable to pass rate; use it for within-model analysis.

45.6 / 100
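As a rough sketch of the average attempt score, the SUM(score) / COUNT(*) formula can be run against an in-memory SQLite table. The column names `task_id`, `attempt`, and `score` are assumptions made for illustration, and the sample rows are invented; only the table name `results` and the formula come from the description above.

```python
import sqlite3


def avg_attempt_score(rows):
    """Mean per-attempt score over all results rows (0-100 scale).

    `rows` is an iterable of (task_id, attempt, score) tuples; this column
    layout is hypothetical. Returns None when there are no rows, because
    SUM over an empty table is NULL in SQLite.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE results (task_id TEXT, attempt INTEGER, score REAL)"
        )
        conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
        (mean,) = conn.execute(
            "SELECT SUM(score) / COUNT(*) FROM results"
        ).fetchone()
        return mean
    finally:
        conn.close()


# Made-up rows: every attempt counts, including failed first attempts.
sample = [("T1", 1, 100.0), ("T2", 1, 0.0), ("T2", 2, 62.5)]
mean_score = avg_attempt_score(sample)  # (100.0 + 0.0 + 62.5) / 3
```

Unlike the run success rate, every attempt row contributes here, so a task that fails once and then passes pulls the average down even though it counts as solved.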
Cost $0.72
Duration 1h 16m
Per-task results for run 2fff6752-91c5-4d88-9560-c34647d29135
Task        Difficulty  Attempt  Score        Tests  Compile  Duration
CG-AL-E001  easy        1        100.0 / 100  7/7    OK       38.8s
CG-AL-E002  easy        1        100.0 / 100  6/6    OK       3m 17s
CG-AL-E003  easy        1        100.0 / 100  5/5    OK       40.3s
CG-AL-E004  easy        1        100.0 / 100  6/6    OK       34.0s
CG-AL-E005  easy        2        100.0 / 100  13/13  OK       31.3s
CG-AL-E006  easy        1        100.0 / 100  7/7    OK       3m 7s
CG-AL-E007  easy        2        100.0 / 100  7/7    OK       27.8s
CG-AL-E008  easy        1        100.0 / 100  6/6    OK       42.8s
CG-AL-E009  easy        1        100.0 / 100  5/5    OK       46.8s
CG-AL-E010  easy        1        100.0 / 100  5/5    OK       49.4s
CG-AL-E031  easy        1        100.0 / 100  3/3    OK       51.6s
CG-AL-E032  easy        1        100.0 / 100  1/1    OK       49.9s
CG-AL-E045  easy        1        100.0 / 100  4/4    OK       15.6s
CG-AL-E050  easy        2        0.0 / 100    0/0    FAIL     7.4s
CG-AL-E051  easy        2        100.0 / 100  15/15  OK       20.2s
CG-AL-E052  easy        2        100.0 / 100  16/16  OK       14.9s
CG-AL-E053  easy        2        100.0 / 100  3/3    OK       2m 41s
CG-AL-E054  easy        2        0.0 / 100    0/0    FAIL     6.5s
CG-AL-E055  easy        1        100.0 / 100  8/8    OK       15.7s
CG-AL-E056  easy        2        62.5 / 100   10/15  OK       16.0s
CG-AL-E057  easy        2        0.0 / 100    0/0    FAIL     9.9s
CG-AL-E058  easy        2        100.0 / 100  1/1    OK       14.9s
CG-AL-H001  hard        1        100.0 / 100  25/25  OK       16.6s
CG-AL-H002  hard        2        100.0 / 100  4/4    OK       16.5s
CG-AL-H003  hard        1        100.0 / 100  5/5    OK       17.8s
CG-AL-H004  hard        1        100.0 / 100  14/14  OK       16.1s
CG-AL-H005  hard        2        0.0 / 100    0/0    FAIL     7.6s
CG-AL-H006  hard        1        100.0 / 100  6/6    OK       14.7s
CG-AL-H007  hard        2        100.0 / 100  10/10  OK       17.0s
CG-AL-H008  hard        2        0.0 / 100    0/0    FAIL     8.8s
CG-AL-H009  hard        1        100.0 / 100  11/11  OK       18.1s
CG-AL-H010  hard        1        100.0 / 100  8/8    OK       15.9s
CG-AL-H011  hard        2        0.0 / 100    0/0    FAIL     6.3s
CG-AL-H013  hard        1        100.0 / 100  9/9    OK       22.1s
CG-AL-H014  hard        2        0.0 / 100    0/0    FAIL     7.2s
CG-AL-H015  hard        1        100.0 / 100  4/4    OK       16.7s
CG-AL-H016  hard        2        0.0 / 100    0/0    FAIL     7.1s
CG-AL-H017  hard        2        0.0 / 100    0/0    FAIL     6.8s
CG-AL-H018  hard        2        100.0 / 100  6/6    OK       15.2s
CG-AL-H019  hard        2        100.0 / 100  5/5    OK       14.8s
CG-AL-H020  hard        2        0.0 / 100    0/0    FAIL     9.3s
CG-AL-H021  hard        2        0.0 / 100    0/0    FAIL     9.3s
CG-AL-H022  hard        2        62.5 / 100   17/21  OK       18.6s
CG-AL-H023  hard        2        0.0 / 100    0/0    FAIL     41.7s
CG-AL-H024  hard        1        100.0 / 100  9/9    OK       29.1s
CG-AL-H025  hard        1        100.0 / 100  7/7    OK       47.3s
CG-AL-H026  hard        1        100.0 / 100  8/8    OK       15.9s
CG-AL-H027  hard        2        100.0 / 100  4/4    OK       16.7s
CG-AL-H028  hard        1        100.0 / 100  9/9    OK       26.7s
CG-AL-H029  hard        2        62.5 / 100   16/18  OK       20.8s
CG-AL-H030  hard        2        0.0 / 100    0/0    FAIL     8.9s
CG-AL-H031  hard        2        100.0 / 100  22/22  OK       19.4s
CG-AL-H032  hard        2        100.0 / 100  19/19  OK       20.8s
CG-AL-H033  hard        1        100.0 / 100  5/5    OK       2m 28s
CG-AL-H034  hard        2        100.0 / 100  3/3    OK       14.1s
CG-AL-H035  hard        1        100.0 / 100  3/3    OK       15.8s
CG-AL-H036  hard        2        0.0 / 100    0/0    FAIL     7.5s
CG-AL-H037  hard        2        0.0 / 100    0/0    FAIL     6.9s
CG-AL-H038  hard        2        0.0 / 100    0/0    FAIL     7.0s
CG-AL-H039  hard        1        100.0 / 100  4/4    OK       15.0s
CG-AL-H040  hard        1        100.0 / 100  2/2    OK       14.5s
CG-AL-H041  hard        2        0.0 / 100    0/0    FAIL     7.4s
CG-AL-H042  hard        2        0.0 / 100    0/0    FAIL     10.4s
CG-AL-H043  hard        1        100.0 / 100  5/5    OK       13.6s
CG-AL-H050  hard        2        62.5 / 100   2/3    OK       13.5s
CG-AL-H051  hard        1        100.0 / 100  4/4    OK       15.0s
CG-AL-H052  hard        1        100.0 / 100  5/5    OK       16.0s
CG-AL-H053  hard        2        100.0 / 100  4/4    OK       14.0s
CG-AL-H054  hard        2        100.0 / 100  5/5    OK       16.8s
CG-AL-H056  hard        2        0.0 / 100    0/0    FAIL     6.4s
CG-AL-H057  hard        1        100.0 / 100  4/4    OK       2m 33s
CG-AL-H058  hard        1        100.0 / 100  5/5    OK       14.7s
CG-AL-H205  hard        2        100.0 / 100  6/6    OK       15.1s
CG-AL-M001  medium      2        0.0 / 100    0/0    FAIL     7.5s
CG-AL-M002  medium      2        62.5 / 100   21/22  OK       17.8s
CG-AL-M003  medium      2        0.0 / 100    0/0    FAIL     9.5s
CG-AL-M004  medium      1        100.0 / 100  12/12  OK       2m 29s
CG-AL-M005  medium      2        0.0 / 100    0/0    FAIL     13.0s
CG-AL-M006  medium      2        0.0 / 100    0/0    FAIL     10.1s
CG-AL-M007  medium      2        0.0 / 100    0/0    FAIL     16.1s
CG-AL-M008  medium      2        62.5 / 100   8/12   OK       1m 4s
CG-AL-M009  medium      1        100.0 / 100  11/11  OK       30.9s
CG-AL-M010  medium      2        100.0 / 100  21/21  OK       3m 4s
CG-AL-M020  medium      2        100.0 / 100  17/17  OK       34.0s
CG-AL-M021  medium      2        0.0 / 100    0/0    FAIL     26.9s
CG-AL-M022  medium      1        100.0 / 100  9/9    OK       16.8s
CG-AL-M023  medium      2        62.5 / 100   10/11  OK       15.6s
CG-AL-M024  medium      2        100.0 / 100  10/10  OK       24.7s
CG-AL-M025  medium      2        62.5 / 100   2/7    OK       14.1s
CG-AL-M026  medium      1        100.0 / 100  8/8    OK       14.9s
CG-AL-M027  medium      2        0.0 / 100    0/0    FAIL     19.8s
CG-AL-M028  medium      2        100.0 / 100  3/3    OK       2m 35s
CG-AL-M029  medium      2        100.0 / 100  3/3    OK       2m 30s
CG-AL-M031  medium      2        0.0 / 100    0/0    FAIL     8.2s
CG-AL-M032  medium      2        0.0 / 100    0/0    FAIL     6.3s
CG-AL-M033  medium      2        100.0 / 100  2/2    OK       14.6s
CG-AL-M034  medium      2        62.5 / 100   1/2    OK       14.2s
CG-AL-M035  medium      1        100.0 / 100  2/2    OK       14.1s
CG-AL-M036  medium      2        0.0 / 100    0/0    FAIL     6.4s
CG-AL-M037  medium      2        0.0 / 100    0/0    FAIL     6.6s
CG-AL-M038  medium      1        100.0 / 100  2/2    OK       13.4s
CG-AL-M039  medium      1        100.0 / 100  2/2    OK       2m 17s
CG-AL-M040  medium      2        62.5 / 100   1/2    OK       13.7s
CG-AL-M041  medium      2        0.0 / 100    0/0    FAIL     5.6s
CG-AL-M042  medium      2        100.0 / 100  8/8    OK       16.6s
CG-AL-M043  medium      2        62.5 / 100   0/5    OK       13.5s
CG-AL-M044  medium      1        100.0 / 100  6/6    OK       2m 34s
CG-AL-M045  medium      2        100.0 / 100  6/6    OK       13.4s
CG-AL-M088  medium      1        100.0 / 100  6/6    OK       14.5s
CG-AL-M112  medium      2        100.0 / 100  4/4    OK       15.5s