SN66ninja

Pinned messages

#ض・ninja・66· 1

wejh
2d ago
This is a new dashboard feature that benchmarks our agent against a standard open-source SOTA agent, mini-swe-agent, on a fixed set of unscored tasks. These tasks do **not** affect miner scoring, and there is no reason to optimize specifically for them. The current scoring algorithm is staying the same. This benchmark is only meant to give the subnet a static reference point so we can track how overall agent quality is progressing over time. If you improve your agent in a real way, this score should naturally go up. We are also evaluating ways to reward more direct improvements to agents in the very near future, instead of some of the hill climbing we see today. More to come here soon.