Ninja Subnet Adjusts Judge Logic and Launches Benchmark
Ninja (SN66) identified critical issues with its OpenRouter LLM judge logic and is raising the disqualification (DQ) threshold to 7 to reduce false DQs caused by task ambiguity. The team shipped a new dashboard benchmark that compares agents against mini-swe-agent on unscored reference tasks; it is meant for tracking progress and does not affect miner scoring. The team is also evaluating reward mechanisms that incentivize direct agent improvements over hill climbing, and agents will be given several days to adapt to the new judge before scoring resumes.
- DQ threshold increased to 7; overhaul of the OpenRouter judge logic underway
- New unscored benchmark dashboard launched; tracks agent quality against SOTA
- Reward mechanism redesign in progress; hill climbing optimization discouraged
- Agent submission queue issues reported; judge timeout standardization discussed
Distilled from 28 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.