Run success rate

Tasks the run solved on its last attempt, divided by the tasks attempted in this run.

Formula: COUNT(distinct tasks where last attempt passed) / COUNT(distinct tasks attempted in this run)
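The formula above can be sketched in Python as follows. This is a minimal illustration, not the dashboard's actual query: the `Attempt` record, its `passed` flag, and the sample rows are all hypothetical, and it assumes the "last attempt" is the one with the highest attempt number for each task.

```python
from typing import Dict, Iterable, NamedTuple


class Attempt(NamedTuple):
    task_id: str   # task identifier, e.g. "CG-AL-E001"
    attempt: int   # 1-based attempt number within the run
    passed: bool   # hypothetical flag: did this attempt solve the task?


def run_success_rate(rows: Iterable[Attempt]) -> float:
    """Fraction of attempted tasks whose last attempt passed."""
    last: Dict[str, Attempt] = {}  # task_id -> highest-numbered attempt seen
    for row in rows:
        current = last.get(row.task_id)
        if current is None or row.attempt > current.attempt:
            last[row.task_id] = row
    if not last:
        raise ValueError("no tasks attempted in this run")
    solved = sum(1 for row in last.values() if row.passed)
    return solved / len(last)


# Hypothetical sample data, not taken from the run shown on this page.
rows = [
    Attempt("T1", 1, True),    # solved on the first attempt
    Attempt("T2", 1, False),
    Attempt("T2", 2, True),    # solved on the second (last) attempt
    Attempt("T3", 1, False),
    Attempt("T3", 2, False),   # never solved
]
rate = run_success_rate(rows)
print(f"{rate:.1%}")  # 66.7% (2 of 3 tasks)
```

Only the last attempt per task counts toward the numerator, so a task that passed early and failed on a later attempt would count as unsolved under this reading.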

Per-run metric for the model's "final answer" on each task. Differs from leaderboard pass_at_n: this denominator is the run's own attempted-task count, not the task set size, so partial runs are not penalised for unattempted tasks.
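To make the denominator difference concrete, here is a small sketch that holds the solved count fixed and changes only the denominator. All numbers are hypothetical and chosen for round results; it illustrates the two denominators described above, not how the leaderboard actually computes pass_at_n.

```python
def rate(solved: int, denominator: int) -> float:
    """Plain ratio with a guard against an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return solved / denominator


# Hypothetical partial run: only 40 of 80 tasks in the set were attempted.
solved = 16
attempted_in_run = 40   # denominator used by the per-run metric
task_set_size = 80      # denominator based on the full task set

run_rate = rate(solved, attempted_in_run)
set_rate = rate(solved, task_set_size)
print(f"{run_rate:.1%} vs {set_rate:.1%}")  # 40.0% vs 20.0%
```

The per-run figure stays at 40% however many tasks were left unattempted, while the task-set figure halves in this example.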

32.8% (21 of 64 tasks attempted in this run)
Avg attempt score

Mean per-attempt score on a 0–100 point scale (partial credit). Drill-down only.

Formula: Mean of attempt scores across all results rows: SUM(score) / COUNT(*) over the results table. Each attempt earns 0–100 points based on compile + test outcomes.
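As a rough sketch of this formula, the mean can be computed over one score per results row. The function name and the sample scores are hypothetical; the sketch assumes every attempt, not just the last one per task, contributes a row.

```python
from typing import Sequence


def avg_attempt_score(scores: Sequence[float]) -> float:
    """Mean of per-attempt scores, i.e. SUM(score) / COUNT(*)."""
    if not scores:
        raise ValueError("no attempt rows to average")
    for score in scores:
        if not 0.0 <= score <= 100.0:
            raise ValueError(f"score out of the 0-100 range: {score}")
    return sum(scores) / len(scores)


# Hypothetical rows: two failed attempts, one full pass, one partial credit.
scores = [0.0, 100.0, 62.5, 0.0]
print(avg_attempt_score(scores))  # 40.625
```

Because failed earlier attempts are averaged in alongside later ones, this figure can sit well below the run success rate for the same run.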

Drill-down companion to pass_at_n. Rewards partial credit but is not directly comparable to pass rate; use it for within-model analysis.

23.9 / 100
Cost $0.39
Duration 4h 45m
Per-task results for run 0a7fca96-9d44-4436-bf17-6edf802ef66b
Task        Difficulty  Attempt  Score        Tests  Compile  Duration
CG-AL-E001  easy        1        100.0 / 100  7/7    OK       3m 0s
CG-AL-E002  easy        2        0.0 / 100    0/0    FAIL     1m 23s
CG-AL-E003  easy        1        100.0 / 100  5/5    OK       2m 37s
CG-AL-E004  easy        1        100.0 / 100  6/6    OK       2m 41s
CG-AL-E005  easy        2        0.0 / 100    0/0    FAIL     37.2s
CG-AL-E006  easy        2        0.0 / 100    0/0    FAIL     1m 5s
CG-AL-E007  easy        1        100.0 / 100  7/7    OK       3m 6s
CG-AL-E008  easy        2        0.0 / 100    0/0    FAIL     1m 20s
CG-AL-E009  easy        2        0.0 / 100    0/0    FAIL     46.9s
CG-AL-E010  easy        2        100.0 / 100  5/5    OK       5m 41s
CG-AL-E031  easy        1        100.0 / 100  3/3    OK       3m 44s
CG-AL-E032  easy        1        100.0 / 100  1/1    OK       3m 14s
CG-AL-E045  easy        1        100.0 / 100  4/4    OK       2m 41s
CG-AL-E050  easy        2        0.0 / 100    0/0    FAIL     31.7s
CG-AL-E051  easy        2        0.0 / 100    0/0    FAIL     1m 53s
CG-AL-E052  easy        2        62.5 / 100   14/16  OK       4m 11s
CG-AL-E053  easy        2        0.0 / 100    0/0    FAIL     1m 35s
CG-AL-E054  easy        2        0.0 / 100    0/0    FAIL     1m 14s
CG-AL-E055  easy        1        100.0 / 100  8/8    OK       2m 46s
CG-AL-H001  easy        1        100.0 / 100  25/25  OK       3m 20s
CG-AL-H002  easy        2        100.0 / 100  4/4    OK       3m 52s
CG-AL-H003  easy        2        62.5 / 100   3/5    OK       4m 3s
CG-AL-H004  easy        2        100.0 / 100  14/14  OK       2m 40s
CG-AL-H005  easy        2        0.0 / 100    0/0    FAIL     2m 6s
CG-AL-H006  easy        1        100.0 / 100  6/6    OK       3m 19s
CG-AL-H007  easy        1        100.0 / 100  10/10  OK       4m 7s
CG-AL-H008  easy        2        0.0 / 100    0/0    FAIL     42.2s
CG-AL-H009  easy        1        100.0 / 100  11/11  OK       3m 33s
CG-AL-H010  easy        1        100.0 / 100  8/8    OK       5m 17s
CG-AL-H011  easy        2        0.0 / 100    0/0    FAIL     5m 37s
CG-AL-H013  easy        1        100.0 / 100  9/9    OK       3m 2s
CG-AL-H014  easy        2        0.0 / 100    0/0    FAIL     44.3s
CG-AL-H015  easy        2        100.0 / 100  4/4    OK       2m 41s
CG-AL-H016  easy        2        0.0 / 100    0/0    FAIL     2m 6s
CG-AL-H017  easy        2        0.0 / 100    0/0    FAIL     43.9s
CG-AL-H018  easy        2        0.0 / 100    0/0    FAIL     2m 13s
CG-AL-H019  easy        2        0.0 / 100    0/0    FAIL     58.8s
CG-AL-H020  easy        2        0.0 / 100    0/0    FAIL     53.0s
CG-AL-H021  easy        2        0.0 / 100    0/0    FAIL     3m 15s
CG-AL-H022  easy        2        0.0 / 100    0/0    FAIL     5m 3s
CG-AL-H023  easy        2        0.0 / 100    0/0    FAIL     8m 25s
CG-AL-H024  easy        2        0.0 / 100    0/0    FAIL     3m 10s
CG-AL-H025  easy        2        100.0 / 100  7/7    OK       2m 58s
CG-AL-H026  easy        2        0.0 / 100    0/0    FAIL     46.1s
CG-AL-H205  easy        2        0.0 / 100    0/0    FAIL     1m 2s
CG-AL-M001  easy        2        0.0 / 100    0/0    FAIL     5m 11s
CG-AL-M002  easy        2        0.0 / 100    0/0    FAIL     2m 5s
CG-AL-M003  easy        2        62.5 / 100   8/9    OK       3m 55s
CG-AL-M004  easy        2        0.0 / 100    0/0    FAIL     1m 1s
CG-AL-M005  easy        2        0.0 / 100    0/0    FAIL     4m 2s
CG-AL-M006  easy        2        0.0 / 100    0/0    FAIL     1m 17s
CG-AL-M007  easy        2        0.0 / 100    0/0    FAIL     6m 48s
CG-AL-M008  easy        2        0.0 / 100    0/0    FAIL     3m 56s
CG-AL-M009  easy        2        100.0 / 100  11/11  OK       7m 41s
CG-AL-M010  easy        2        0.0 / 100    0/0    FAIL     1m 49s
CG-AL-M020  easy        2        0.0 / 100    0/0    FAIL     32.8s
CG-AL-M021  easy        2        0.0 / 100    0/0    FAIL     1m 22s
CG-AL-M022  easy        2        0.0 / 100    0/0    FAIL     1m 9s
CG-AL-M023  easy        2        0.0 / 100    0/0    FAIL     33.0s
CG-AL-M024  easy        2        0.0 / 100    0/0    FAIL     1m 28s
CG-AL-M025  easy        2        0.0 / 100    0/0    FAIL     7m 56s
CG-AL-M026  easy        2        62.5 / 100   6/8    OK       5m 19s
CG-AL-M088  easy        2        100.0 / 100  6/6    OK       2m 57s
CG-AL-M112  easy        2        0.0 / 100    0/0    FAIL     6m 22s