Run success rate

Tasks whose last attempt in this run passed, divided by tasks attempted in this run.

Formula: COUNT(distinct tasks where last attempt passed) / COUNT(distinct tasks attempted in this run)

Per-run metric for the model's "final answer" on each task. Differs from leaderboard pass_at_n: this denominator is the run's own attempted-task count, not the task set size, so partial runs are not penalised for unattempted tasks.
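The formula and the denominator difference above can be sketched as follows. This is a minimal illustration, not the dashboard's actual implementation: the `Attempt` record, its field names, and the optional `task_set_size` parameter are hypothetical, and what counts as "passed" is assumed to be decided upstream.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Attempt(NamedTuple):
    """One results row (hypothetical shape, for illustration only)."""
    task_id: str   # e.g. "CG-AL-E001"
    attempt: int   # 1-based attempt number within the run
    passed: bool   # assumed to be computed upstream


def run_success_rate(rows: Iterable[Attempt],
                     task_set_size: Optional[int] = None) -> float:
    """Share of tasks whose last attempt passed.

    By default the denominator is the number of distinct tasks attempted in
    the run. Passing task_set_size swaps in the full task-set size instead,
    which is the leaderboard-style denominator described above.
    """
    last: Dict[str, Attempt] = {}
    for row in rows:
        prev = last.get(row.task_id)
        # Keep only the highest-numbered attempt per task.
        if prev is None or row.attempt > prev.attempt:
            last[row.task_id] = row
    if not last:
        return 0.0
    solved = sum(1 for r in last.values() if r.passed)
    denominator = task_set_size if task_set_size is not None else len(last)
    return solved / denominator


rows = [
    Attempt("T1", 1, False), Attempt("T1", 2, True),   # solved on retry
    Attempt("T2", 1, True),                            # solved first try
    Attempt("T3", 1, False), Attempt("T3", 2, False),  # never solved
]
print(run_success_rate(rows))                   # 2 of 3 attempted tasks
print(run_success_rate(rows, task_set_size=4))  # same run against a 4-task set
```

With the run's own attempted-task count as the denominator, the partial run above scores 2/3; against a hypothetical 4-task set the same results drop to 2/4, which is the penalty this metric avoids.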

23.4% (15 of 64 tasks attempted in this run)

Avg attempt score

Mean per-attempt score on a 0–100 point scale (partial credit). Drill-down only.

Formula: Mean of attempt scores across all results rows: SUM(score) / COUNT(*) over the results table. Each attempt earns 0–100 points based on compile + test outcomes.

Drill-down companion to pass_at_n. It rewards partial credit but is not directly comparable to pass rate; use it for within-model analysis.
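As a rough sketch of the mean described above, assuming each results row contributes one score in the 0–100 range. The function name and the sample scores are hypothetical, and how a score is derived from compile and test outcomes is not modelled here.

```python
from typing import Iterable, Optional


def avg_attempt_score(scores: Iterable[float]) -> Optional[float]:
    """SUM(score) / COUNT(*) over every attempt row, not just the last attempt."""
    values = list(scores)
    if not values:
        return None  # no rows, so the mean is undefined
    for s in values:
        if not 0.0 <= s <= 100.0:
            raise ValueError(f"score out of range 0-100: {s}")
    return sum(values) / len(values)


# Four hypothetical attempt rows: two failures, one full pass, one partial.
example_scores = [0.0, 100.0, 62.5, 0.0]
print(avg_attempt_score(example_scores))  # 40.625
```

Because every attempt row counts, a task that fails once and then passes contributes both a 0 and a 100, so this figure can sit well below the run success rate.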

15.0 / 100

Cost: $0.66
Duration: 2h 46m
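Per-task results for run 9bd420c0-4214-4466-a77c-7c611744ed45

| Task | Difficulty | Attempt | Score | Tests | Compile | Duration |
|---|---|---|---|---|---|---|
| CG-AL-E001 | easy | 2 | 100.0 / 100 | 7/7 | OK | 3m 13s |
| CG-AL-E002 | easy | 2 | 100.0 / 100 | 6/6 | OK | 2m 37s |
| CG-AL-E003 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 10.9s |
| CG-AL-E004 | easy | 1 | 100.0 / 100 | 6/6 | OK | 3m 42s |
| CG-AL-E005 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 10.7s |
| CG-AL-E006 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 11.7s |
| CG-AL-E007 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 48.4s |
| CG-AL-E008 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 11.7s |
| CG-AL-E009 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 10.7s |
| CG-AL-E010 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 10.9s |
| CG-AL-E031 | easy | 2 | 100.0 / 100 | 3/3 | OK | 3m 46s |
| CG-AL-E032 | easy | 2 | 100.0 / 100 | 1/1 | OK | 2m 42s |
| CG-AL-E045 | easy | 1 | 100.0 / 100 | 4/4 | OK | 3m 44s |
| CG-AL-E050 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 10.8s |
| CG-AL-E051 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 10.8s |
| CG-AL-E052 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 10.8s |
| CG-AL-E053 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 20.7s |
| CG-AL-E054 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 10.8s |
| CG-AL-E055 | easy | 2 | 0.0 / 100 | 0/0 | FAIL | 10.7s |
| CG-AL-H001 | hard | 2 | 100.0 / 100 | 25/25 | OK | 3m 7s |
| CG-AL-H002 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 11.6s |
| CG-AL-H003 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 11.5s |
| CG-AL-H004 | hard | 2 | 100.0 / 100 | 14/14 | OK | 4m 5s |
| CG-AL-H005 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 10.9s |
| CG-AL-H006 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 10.8s |
| CG-AL-H007 | hard | 2 | 100.0 / 100 | 10/10 | OK | 3m 15s |
| CG-AL-H008 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 47.1s |
| CG-AL-H009 | hard | 1 | 100.0 / 100 | 11/11 | OK | 3m 6s |
| CG-AL-H010 | hard | 2 | 62.5 / 100 | 7/8 | OK | 2m 55s |
| CG-AL-H011 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 1m 3s |
| CG-AL-H013 | hard | 1 | 100.0 / 100 | 9/9 | OK | 2m 51s |
| CG-AL-H014 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 25.6s |
| CG-AL-H015 | hard | 1 | 100.0 / 100 | 4/4 | OK | 2m 43s |
| CG-AL-H016 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 24.0s |
| CG-AL-H017 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 18.4s |
| CG-AL-H018 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 1m 19s |
| CG-AL-H019 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 10.7s |
| CG-AL-H020 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 3m 44s |
| CG-AL-H021 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 2m 18s |
| CG-AL-H022 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 2m 16s |
| CG-AL-H023 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 4m 3s |
| CG-AL-H024 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 1m 35s |
| CG-AL-H025 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 58.0s |
| CG-AL-H026 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 1m 22s |
| CG-AL-H205 | hard | 2 | 0.0 / 100 | 0/0 | FAIL | 11.3s |
| CG-AL-M001 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 1m 15s |
| CG-AL-M002 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 10.8s |
| CG-AL-M003 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 10.8s |
| CG-AL-M004 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 1m 10s |
| CG-AL-M005 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 3m 28s |
| CG-AL-M006 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 1m 39s |
| CG-AL-M007 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 3m 42s |
| CG-AL-M008 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 4m 11s |
| CG-AL-M009 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 2m 43s |
| CG-AL-M010 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 11.2s |
| CG-AL-M020 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 2m 3s |
| CG-AL-M021 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 3m 21s |
| CG-AL-M022 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 3m 45s |
| CG-AL-M023 | medium | 1 | 100.0 / 100 | 11/11 | OK | 3m 37s |
| CG-AL-M024 | medium | 2 | 0.0 / 100 | 0/0 | FAIL | 1m 57s |
| CG-AL-M025 | medium | 2 | 62.5 / 100 | 2/7 | OK | 2m 55s |
| CG-AL-M026 | medium | 1 | 100.0 / 100 | 8/8 | OK | 2m 41s |
| CG-AL-M088 | medium | 2 | 100.0 / 100 | 6/6 | OK | 3m 30s |
| CG-AL-M112 | medium | 2 | 62.5 / 100 | 2/4 | OK | 3m 11s |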