Scoring Bug Fixed, Forge Agent Doubles Performance
A bug in local testing was identified: when `problem_id` fields are missing, all problems in a run collapse into a single log, which artificially lowers reasoning scores on multi-problem suites (0.7–0.85 on single problems vs. 0.13–0.31 on 30-problem runs). Mainnet is unaffected. Separately, the Forge agent improved from 0.29 to 0.57 after its search result pool was expanded and its rescoring limits were raised; product-category tasks now score 0.6, but shop and voucher tasks lag at 0.3 and 0.2.
- Local test scoring bug: missing `problem_id` collapses multiple problems into one request log
- Mainnet unaffected; public API populates UUID `problem_id` values correctly
- Forge agent 2x improvement: expanded candidate pool and raised rescoring limits
- Category performance varies: products 0.6, shops 0.3, vouchers 0.2
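
The collapse described above can be illustrated with a minimal sketch. It assumes logs are grouped by a `problem_id` key before scoring; the record shape and the helper names `group_by_problem` and `with_ids` are hypothetical and are not taken from the actual test harness. When the field is missing, every record falls under the same key, so several problems end up in one group.

```python
from collections import defaultdict
import uuid


def group_by_problem(records):
    """Group log records by problem_id (hypothetical record shape)."""
    groups = defaultdict(list)
    for record in records:
        # A missing problem_id maps to None, so all such records share one key.
        groups[record.get("problem_id")].append(record)
    return dict(groups)


def with_ids(records):
    """Return copies of the records, assigning a UUID where problem_id is missing."""
    return [
        {**r, "problem_id": r.get("problem_id") or str(uuid.uuid4())}
        for r in records
    ]


# Three distinct problems logged without a problem_id (illustrative data only).
records = [{"step": f"problem {i} reasoning"} for i in range(3)]

collapsed = group_by_problem(records)
separated = group_by_problem(with_ids(records))

print(len(collapsed))  # 1: all three problems share the None key
print(len(separated))  # 3: one group per problem
```

In this sketch, populating a unique identifier per problem is enough to keep the groups separate, which is consistent with the note that the public API, which populates UUID identifiers, is unaffected.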
Distilled from 5 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.