Production Scoring Gap and Submission Validation Issues
Miners reported a gap between local evaluation metrics (F1 above 0.9) and production scores (about 0.84). They asked whether production scoring uses a fixed or a rotating dataset, and how closely local results should be expected to track production. Separately, one miner hit repeated Ruff formatting validation failures on submission, even though the same code passed local Ruff checks and had been accepted in an earlier submission.
- Local dry-run F1 scores consistently exceed production results by roughly 0.06 to 0.10
- Unclear whether production scoring uses a fixed hidden dataset or rotating data subsets
- Submission validator rejects code that passed local Ruff checks and a prior submission
- No team response yet to either the scoring-gap or the validation-error question
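To make the size of the reported gap concrete, the sketch below computes F1 from confusion counts and takes the difference between a local and a production score. The counts are hypothetical, chosen only to reproduce the 0.90 and 0.84 figures mentioned above; they do not come from the thread, and the helper names `f1_score` and `score_gap` are illustrative. The sketch says nothing about why the gap exists.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2*TP / (2*TP + FP + FN), or 0.0 if all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def score_gap(local_f1: float, production_f1: float) -> float:
    """Return how far the local score sits above the production score."""
    return local_f1 - production_f1


# Hypothetical confusion counts (assumptions, not data from the discussion).
local = f1_score(tp=90, fp=10, fn=10)       # 180 / 200 = 0.90
production = f1_score(tp=84, fp=16, fn=16)  # 168 / 200 = 0.84

gap = score_gap(local, production)
print(f"local={local:.2f} production={production:.2f} gap={gap:.2f}")
```

With these assumed counts, six extra false positives and six extra false negatives per hundred positive cases are enough to move F1 from 0.90 to 0.84, so a gap of this size is consistent with a modest shift in error rates between the local data and whatever data production scores against.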
Distilled from 5 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.