Cacheon Miners Compete in Back-to-Back Benchmark Wins
Two miners took the top scoring position in quick succession during an evaluation run. The current leader achieves a 3% faster time-to-first-token (TTFT) than baseline vLLM by capturing CUDA graphs for 51 batch sizes instead of the standard 11. Both miners are tuning existing vLLM configurations; the team notes that meaningful score gains will require implementing prefix caching or speculative decoding. A testnet dashboard has launched at cacheon.ai/dashboard/pulse.
- Current leader uses custom vLLM with 51 CUDA batch sizes vs standard 11
- 3% TTFT improvement over baseline; no throughput change measured
- Triton kernel JIT-compilation causes first-request latency spikes
- Prefix caching and speculative decoding remain unimplemented optimization targets
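
To illustrate why capturing more CUDA graph batch sizes might help, here is a minimal sketch. It assumes the runtime pads each batch up to the nearest captured size, since a captured graph replays with fixed shapes. The two size lists are hypothetical: the source reports only the counts (11 and 51), not the actual values, and `padded_size` and `mean_padding` are illustrative helpers, not vLLM APIs.

```python
from bisect import bisect_left

# Hypothetical capture lists; the source gives only the counts (11 and 51).
COARSE_SIZES = [1, 2, 4, 8, 16, 32, 64, 128, 192, 256, 384]  # 11 sizes
FINE_SIZES = [1, 2, 4] + [8 * i for i in range(1, 49)]       # 51 sizes


def padded_size(batch, sizes):
    """Return the smallest captured size >= batch, or None if none is large enough."""
    i = bisect_left(sizes, batch)
    return sizes[i] if i < len(sizes) else None


def mean_padding(sizes, max_batch):
    """Average number of padded (unused) slots over batch sizes 1..max_batch."""
    total = sum(padded_size(b, sizes) - b for b in range(1, max_batch + 1))
    return total / max_batch


if __name__ == "__main__":
    for name, sizes in (("coarse", COARSE_SIZES), ("fine", FINE_SIZES)):
        print(f"{name}: {len(sizes)} sizes, "
              f"batch of 20 pads to {padded_size(20, sizes)}, "
              f"mean padding {mean_padding(sizes, 384):.1f} slots")
```

Under these assumed lists, a batch of 20 requests pads to 32 with the coarse list but only to 24 with the fine one. Less padding means less wasted compute per step, which is one plausible reason for a small latency gain; the sketch does not model the extra memory and capture time that additional graphs would cost.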
Distilled from 5 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.