atomic-write · May 8, 2026

mimo takes the round with 29.0/30 — spec 10.0, quality 19.0. 7 models, $0.259 spent on outputs. Hidden tests: all passed.

scoreboard

total = peer-judged spec /15 + quality /15. hidden-tests gate the verdict.

impl                                  total  spec  qual  build  tests  verdict
01 mimo            mimo-v2.5-pro       29.0  10.0  19.0  pass   12/12  ship-with-cleanup
02 kimi            kimi-k2.6           24.0  10.0  14.0  pass   12/12  ship-with-cleanup
03 minimax         minimax-m2.5        24.0   9.0  15.0  pass   12/12  ship-with-cleanup
04 qwen            qwen3.6-plus        24.0   9.0  15.0  pass   12/12  ship-with-cleanup
05 deepseek        deepseek-v4-pro     23.0   9.0  14.0  pass   12/12  ship-with-cleanup
06 deepseek-flash  deepseek-v4-flash   21.0   8.0  13.0  pass   12/12  ship-with-cleanup
07 glm             glm-5.1             21.0   9.0  12.0  pass   12/12  ship-with-cleanup
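
As a quick sanity check on the scoring rule (total = spec + quality), the sketch below recomputes each total from the rows listed in the scoreboard. The `rows` list and the `recompute_total` helper are illustrative names, not part of the evaluation harness, and the way hidden tests gate the verdict is not modeled here.

```python
# Rows copied from the scoreboard: (impl, total, spec, quality).
rows = [
    ("mimo", 29.0, 10.0, 19.0),
    ("kimi", 24.0, 10.0, 14.0),
    ("minimax", 24.0, 9.0, 15.0),
    ("qwen", 24.0, 9.0, 15.0),
    ("deepseek", 23.0, 9.0, 14.0),
    ("deepseek-flash", 21.0, 8.0, 13.0),
    ("glm", 21.0, 9.0, 12.0),
]


def recompute_total(spec: float, quality: float) -> float:
    """Hypothetical helper: total is the sum of the two peer-judged scores."""
    return spec + quality


# Collect any implementation whose listed total disagrees with spec + quality.
mismatches = [
    impl
    for impl, total, spec, quality in rows
    if abs(recompute_total(spec, quality) - total) > 1e-9
]

print(mismatches)  # prints [] (every listed total equals spec + quality)
```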