methodology changelog
What changed and when. Round-to-round comparability, made auditable. Lineup deltas auto-derived; spec/rubric edits pulled from each round's review markdown.
What changed and when. Round-to-round comparability, made auditable. Lineup deltas auto-derived; spec/rubric edits pulled from each round's review markdown.
Baseline. 7 models in the initial lineup: deepseek, deepseek-flash, glm, kimi, mimo, minimax, qwen.