SN66 Ninja · Sunday, May 31, 2026

Ninja Subnet Adjusts Judge Logic and Launches Benchmark

Ninja (SN66) identified critical issues with the logic of its OpenRouter LLM judge and is raising the disqualification (DQ) threshold to 7 to reduce false DQs caused by task ambiguity. The team also shipped a new dashboard benchmark that compares agents against mini-swe-agent on unscored reference tasks. It is meant for tracking progress and does not affect miner scoring. The team is evaluating reward mechanisms that incentivize direct agent improvements over hill climbing, and agents will have several days to adapt to the new judge before scoring resumes.

• DQ threshold raised to 7; overhaul of the OpenRouter judge logic underway
• New unscored benchmark dashboard launched to track agent quality against SOTA
• Reward mechanism redesign in progress to discourage hill-climbing optimization
• Agent submission queue issues reported; judge timeout standardization discussed

Distilled from 28 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.
