SN11 trajectory-rl · Tuesday, May 5, 2026

SN11 Rolling Out Major Evaluation System Overhaul

Trajectory-RL is rolling out four significant updates: the top-3 miners are now re-evaluated each epoch to mitigate LLM variance, a submission pre-evaluation pipeline has launched in which only anti-cheat violations lead to bans, an optional delayed-reveal submission flow has been introduced, and a new Terminal-Bench scenario set rolls out next epoch. The new eval format runs three scenarios (cancel-async-tasks, log-summary-date-ranges, break-filter-js-from-html), with scores ranging from 0 to 3. The community dashboard was rebuilt with corrected ranking logic: it now sorts by incentive rather than by the API rank field.

• Top-3 miners are re-evaluated every epoch to reduce LLM variance in rankings
• Pre-eval failures no longer trigger bans; only anti-cheat violations count
• Optional 24-hour delayed-reveal submission flow is now live
• Three Terminal-Bench scenarios replace the old scenario set next epoch
• Dashboard ranking bug fixed; UID 74 (0.555 incentive) is now correctly ranked #1
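
The dashboard fix described above amounts to ordering miners by incentive instead of trusting a rank field returned by the API. A minimal sketch of that idea follows. The record layout, the field names (uid, incentive, rank), and every value except UID 74's 0.555 incentive are hypothetical and are not taken from the actual dashboard code.

```python
from typing import Dict, List


def rank_by_incentive(miners: List[Dict]) -> List[Dict]:
    """Return miners ordered by incentive (highest first) with a 1-based position.

    The API-provided "rank" field is deliberately ignored (assumed field name).
    """
    ordered = sorted(miners, key=lambda m: m["incentive"], reverse=True)
    return [{**m, "position": i} for i, m in enumerate(ordered, start=1)]


# Hypothetical records: only UID 74's incentive of 0.555 comes from the brief.
miners = [
    {"uid": 12, "incentive": 0.210, "rank": 1},
    {"uid": 74, "incentive": 0.555, "rank": 3},
    {"uid": 31, "incentive": 0.140, "rank": 2},
]

ranked = rank_by_incentive(miners)
for m in ranked:
    print(m["position"], m["uid"], m["incentive"])
```

With these illustrative records, sorting on the rank field would put UID 12 first, while sorting on incentive puts UID 74 at position 1.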

Distilled from 67 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.
