SN97 distil · Saturday, May 16, 2026

Eval system recovers from deploy rollback; king cycling confirmed

Distil's evaluation pipeline recovered from a mid-round outage and state corruption following a deploy rollback. King UID 47 is being re-evaluated each round: its KL scores vary between 1.45 and 1.56, which confirms the evals are fresh rather than cached. Challenger selection logic was temporarily reset and loaded stale model commits from about 11 days earlier; the issue has been identified and flagged for a state rebuild. The team also published benchmark weightings, sample counts (300 prompts for KL axes, 6–18 items for v31 procedurals), and training guidance for closing gaps on weak axes.

• King re-eval confirmed working: UID 47's scores vary per round and are not cached.
• Deploy rollback corrupted queue state: 256 total models were listed instead of the filtered round.
• Sample count bumps are blocked pending an 8×B200 timing lock-in for stability.
• A 17-item flagged backlog is queued for the next deploy (no deploys today).
• Team provided axis weights, training data mixes, and temperature tuning guidance.

Distilled from 92 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.
