SN11 Rolling Out Major Evaluation System Overhaul
Trajectory-RL announced four significant updates: the top-3 miners are now re-evaluated each epoch to mitigate LLM variance; a submission pre-evaluation pipeline has launched, with bans issued only for anti-cheat violations; an optional delayed-reveal submission flow is available; and a new Terminal-Bench scenario set rolls out next epoch. The new eval format runs three scenarios (cancel-async-tasks, log-summary-date-ranges, break-filter-js-from-html), with scores ranging from 0 to 3. The community dashboard was also rebuilt with corrected ranking logic: it now sorts by incentive rather than by the API rank field.
- Top-3 re-evaluation every epoch to reduce LLM variance in rankings
- Pre-eval failures no longer trigger bans; only anti-cheat violations count
- Optional 24-hour delayed-reveal submission flow now live
- Three Terminal-Bench scenarios replace old scenario set next epoch
- Dashboard ranking bug fixed; UID 74 (0.555 incentive) now correctly ranked #1
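The dashboard ranking fix can be illustrated with a minimal sketch. It assumes each miner row carries a `uid`, an `incentive` value, and an API-supplied `rank` field; these field names and the `rank_by_incentive` helper are hypothetical, not the dashboard's actual code. Only UID 74's incentive of 0.555 comes from the summary; the other rows are made-up placeholders.

```python
from typing import TypedDict


class MinerRow(TypedDict):
    uid: int
    incentive: float
    rank: int  # hypothetical API-provided rank field, ignored when sorting


def rank_by_incentive(rows: list[MinerRow]) -> list[tuple[int, int, float]]:
    """Return (position, uid, incentive) tuples, highest incentive first."""
    ordered = sorted(rows, key=lambda row: row["incentive"], reverse=True)
    return [
        (position, row["uid"], row["incentive"])
        for position, row in enumerate(ordered, start=1)
    ]


# Placeholder data: only UID 74's incentive (0.555) is taken from the summary.
rows: list[MinerRow] = [
    {"uid": 12, "incentive": 0.210, "rank": 1},
    {"uid": 74, "incentive": 0.555, "rank": 3},
    {"uid": 31, "incentive": 0.105, "rank": 2},
]

leaderboard = rank_by_incentive(rows)
for position, uid, incentive in leaderboard:
    print(f"#{position} UID {uid} incentive={incentive:.3f}")
```

Sorting on the incentive value directly means the displayed order cannot drift from the figure shown next to each miner, which is the mismatch that trusting a separate rank field can produce.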
Distilled from 67 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.
View original messages
- Discord message 1500671040614039562
- Discord message 1500672398226362570
- Discord message 1500695961549148292
- Discord message 1500696596172505289
- Discord message 1500700337756045312
- Discord message 1500722427267907605
- Discord message 1500778632963031133
- Discord message 1500782947123462254
- Discord message 1500798308229058630
- Discord message 1500827015287275611
- Discord message 1500839833612980294
- Discord message 1500844768870273144
- Discord message 1500845165626134628
- Discord message 1500847288539414529