Model Royale
The first format that ran on open-bench — a weekly elimination tournament between open-weight coding models. It is not being pursued for now. It did its job.
Model Royale was a weekly tournament: a fixed lineup of coding models, a task each round, a hidden-test gate, blinded peer judging, and, in round 2, an adversarial "Break" round where every model attacked every other model's sandbox. The arc was Build → Break → Fix, with one model eliminated each round.
Two rounds in, the tournament framing had taught us what it was going to teach us — and most of it was about the framing itself:
None of that was wasted. The tournament validated the harness, the committed-artifact pipeline, and — in round 2 — the reference oracle that mechanically catches cheese exploits. It surfaced the real problems early, which is exactly what a first format is for.
What replaces it is a set of standalone benchmarks. Each one is a task, run across the model lineup, with the full receipts committed: diffs, transcripts, costs, test results. Objective where it can be, transparent always. No tournament, no elimination, no crowned champion; the artifacts are the product and the reader is the judge.
Both rounds live on as benchmark results: