SN120 Affine · Monday, April 27, 2026

Affine fixes NAVWORLD scoring instability with parallel judges

NAVWORLD scoring exhibited extreme variance: identical outputs received different scores (30% to 70% variance) because each score depended on a single LLM judge. The team deployed a fix that invokes all judges in parallel and takes the median score per dimension across 3 LLMs (or 2 when one is unavailable), so no single judge's bias determines the result. The Terminal module has completed shadow-run testing and bug fixes; scoring integration is expected Tuesday or Wednesday.

•LLM judge inconsistency caused >40% score swings on the same task
•New parallel judge median approach ensures reproducible results
•Terminal module ready for scoring rollout after bug fixes
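
The parallel-judge median described above can be sketched roughly as follows. This is a minimal illustration, not Affine's actual implementation: the `Judge` type, the `score_with_judges` function, the dimension names, and the 0-to-1 score scale are all assumptions made for the example. A judge that raises an error is treated as unavailable, and scoring proceeds as long as at least two judges respond.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable, Dict, List

# Hypothetical judge type: takes a model output, returns one score per dimension.
Judge = Callable[[str], Dict[str, float]]


def score_with_judges(
    output: str, judges: List[Judge], min_judges: int = 2
) -> Dict[str, float]:
    """Run every judge in parallel and return the per-dimension median score."""
    # Leaving the `with` block waits for all submitted judge calls to finish.
    with ThreadPoolExecutor(max_workers=len(judges)) as pool:
        futures = [pool.submit(judge, output) for judge in judges]

    results = []
    for future in futures:
        try:
            results.append(future.result())
        except Exception:
            # Assumption: a judge that errors out counts as unavailable.
            continue

    if len(results) < min_judges:
        raise RuntimeError(
            f"only {len(results)} judge(s) responded; need at least {min_judges}"
        )

    # Only score dimensions that every responding judge reported.
    dimensions = set.intersection(*(set(r) for r in results))
    return {dim: median(r[dim] for r in results) for dim in sorted(dimensions)}


# Hypothetical stand-ins for three LLM judges that disagree on the same output.
def judge_a(output: str) -> Dict[str, float]:
    return {"accuracy": 0.25, "clarity": 0.5}


def judge_b(output: str) -> Dict[str, float]:
    return {"accuracy": 0.75, "clarity": 0.75}


def judge_c(output: str) -> Dict[str, float]:
    return {"accuracy": 0.5, "clarity": 1.0}


def judge_down(output: str) -> Dict[str, float]:
    raise TimeoutError("judge unavailable")


print(score_with_judges("same output", [judge_a, judge_b, judge_c]))
# → {'accuracy': 0.5, 'clarity': 0.75}
print(score_with_judges("same output", [judge_a, judge_b, judge_down]))
# → {'accuracy': 0.5, 'clarity': 0.625}
```

With three judges the median discards the most extreme score in each dimension. With only two it reduces to their average, so a single outlier has more influence in the fallback case.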

Distilled from 20 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.

