SN14 cacheon · Sunday, May 17, 2026

Cacheon Miners Compete in Back-to-Back Benchmark Wins

Two miners each claimed the top scoring position in rapid succession during an evaluation run. The current leader achieves a 3% faster time-to-first-token (TTFT) than baseline vLLM through maxed-out CUDA graph batching. Both miners are tuning existing vLLM configurations; the team notes that meaningful score gains will require implementing prefix caching or speculative decoding. A testnet dashboard has launched at cacheon.ai/dashboard/pulse.

• Current leader uses a custom vLLM build with 51 CUDA graph batch sizes versus the standard 11
• 3% TTFT improvement over baseline; no throughput change measured
• Triton kernel JIT compilation causes first-request latency spikes
• Prefix caching and speculative decoding remain unimplemented optimization targets
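
The points above about TTFT and first-request spikes suggest one practical consequence: a benchmark that includes the very first request will mix JIT-compilation time into the TTFT figure. The following is a minimal sketch of how a TTFT comparison could exclude that cold request, not the subnet's actual scoring code. All names here (`time_to_first_token`, `measure_ttft`, `relative_improvement`, `make_fake_stream`) are hypothetical, and the fake stream is only a stand-in for a real streaming client.

```python
import time
from statistics import median
from typing import Callable, Iterable, List


def time_to_first_token(stream_fn: Callable[[], Iterable[str]]) -> float:
    """Seconds from issuing a request until its first token arrives."""
    start = time.perf_counter()
    for _ in stream_fn():
        # Stop timing as soon as the first token is yielded.
        return time.perf_counter() - start
    raise RuntimeError("stream produced no tokens")


def measure_ttft(stream_fn: Callable[[], Iterable[str]],
                 runs: int = 10, warmup: int = 1) -> List[float]:
    """Collect TTFT samples after discarding `warmup` cold requests."""
    for _ in range(warmup):
        time_to_first_token(stream_fn)  # result intentionally thrown away
    return [time_to_first_token(stream_fn) for _ in range(runs)]


def relative_improvement(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` TTFT is lower than `baseline` TTFT."""
    return (baseline - candidate) / baseline


def make_fake_stream(cold_delay: float, warm_delay: float):
    """Hypothetical stand-in for a server: the first call is slow
    (as if kernels were being compiled), later calls are fast."""
    state = {"calls": 0}

    def stream():
        delay = cold_delay if state["calls"] == 0 else warm_delay
        state["calls"] += 1
        time.sleep(delay)
        yield "tok"

    return stream


if __name__ == "__main__":
    samples = measure_ttft(make_fake_stream(0.2, 0.001), runs=5, warmup=1)
    print(f"median warm TTFT: {median(samples):.4f}s")
```

With real measurements, `relative_improvement(baseline_median, candidate_median)` returning roughly 0.03 would correspond to the 3% figure quoted above. Reporting the median of the warm samples, rather than the mean, is a design choice made here to reduce sensitivity to occasional outliers.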

Distilled from 5 team messages in the official Bittensor Discord. Generated by Claude Haiku 4.5.

View original messages

Discord message 1505338308287402126
Discord message 1505547871192092724
Discord message 1505622104639541358
Discord message 1505622143311020212
Discord message 1505651516110016542
