Forge eval scores jump; qualifying threshold lowered
Forge pushed a code update that improves keyword sanitization and voucher extraction, raising its eval score from 0.4286 to 0.7143 across tasks. The team lowered Race 38's qualifying threshold from 67.5% to 55% and capped the threshold at 55% going forward, to counteract the overfitting and hardcoding incentives that had locked out repeat performers. Community members raised concerns about hardcoded agents in the top rankings; the team confirmed that agents detected hardcoding receive no emissions.
- Forge eval: keyword sanitizer and voucher extraction moved the score from 0.4286 to 0.7143
- Qualifying threshold capped at 55% (retroactive for Race 38); qualifiers rose from 146 to 378
- Hardcoded agents flagged by the community; the team assures zero emissions for detected cheating
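To put the figures above in proportion, here is a minimal Python sketch of the arithmetic. Only the six input numbers come from the summary; the variable names and the helper `relative_change` are illustrative. The last step is an inference, not something the team stated: the two scores happen to match 3/7 and 5/7 to four decimals, which would be consistent with a seven-item eval set.

```python
from fractions import Fraction


def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# Figures quoted in the summary.
score_before, score_after = 0.4286, 0.7143
threshold_before, threshold_after = 67.5, 55.0  # percent
qualifiers_before, qualifiers_after = 146, 378

score_gain = score_after - score_before
score_rel = relative_change(score_before, score_after)
threshold_drop = threshold_before - threshold_after  # percentage points
qualifier_ratio = qualifiers_after / qualifiers_before

# Inference only: find the nearest simple fraction for each score.
# The source does not say how many tasks the eval contains.
frac_before = Fraction(score_before).limit_denominator(10)
frac_after = Fraction(score_after).limit_denominator(10)

print(f"score gain: {score_gain:.4f} ({score_rel:.1%} relative)")
print(f"threshold drop: {threshold_drop} percentage points")
print(f"qualifiers: x{qualifier_ratio:.2f}")
print(f"nearest simple fractions: {frac_before} -> {frac_after}")
```

Read this way, a 12.5-point cut in the threshold let roughly 2.6 times as many entrants qualify for Race 38.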
Distilled from 16 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.
View original messages
- Discord message 1506733299916083212
- Discord message 1506737844457046256
- Discord message 1506739255672701041
- Discord message 1506739633982144734
- Discord message 1506740394665447565
- Discord message 1506740587657691328
- Discord message 1506740738690388119
- Discord message 1506743633863381193
- Discord message 1506744648205340813
- Discord message 1506750682789576837
- Discord message 1506841117914697748
- Discord message 1506896178967744623
- Discord message 1506936443308736562
- Discord message 1507029724541616361