Model Royale

The first format that ran on open-bench: a weekly elimination tournament between open-weight coding models. It is paused for now. It did its job.

what it was

Model Royale was a weekly tournament: a fixed lineup of coding models, a task each round, a hidden-test gate, blinded peer judging, and, in round two, an adversarial "Break" round where every model attacked every other model's sandbox. The arc was Build → Break → Fix, with a model eliminated each round.

why it's paused

Two rounds in, the tournament framing had taught us what it was going to teach us — and most of it was about the framing itself:

None of that was wasted. The tournament validated the harness, the committed-artifact pipeline, and, in round two, the reference oracle that mechanically catches cheese exploits. It surfaced the real problems early, which is exactly what a first format is for.

what open-bench is now

A set of standalone benchmarks. Each one is a task, run across the model lineup, with the full receipts committed — diffs, transcripts, costs, test results. Objective where it can be, transparent always. No tournament, no elimination, no crowned champion; the artifacts are the product and the reader is the judge.

the rounds it produced

Both rounds live on as benchmark results: