dataset

2 rounds · 14 rows · MIT license. Rebuilt on every deploy. Cite freely.

JSON
dataset.json
Full structured shape: per-round arrays (scoreboard, samples, judges, self-bias), plus a flattened mirror of the data in flat[].
CSV
dataset.csv
One row per (round, impl). Drops into pandas / Excel / R without ceremony. Schema below.
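For readers who want to skip pandas, here is a minimal sketch of loading dataset.csv with only Python's standard library. It converts each cell to the type given for its column in the schema below. Two encoding details are assumptions rather than documented facts: null cells are taken to be empty strings, and booleans are taken to be written as the literals true/false. The names load_rows and _coerce are hypothetical.

```python
import csv
import io

# Column groups taken from the schema's type column.
FLOAT_COLS = {"spec_peer", "quality_peer", "composite", "total_cost_usd",
              "median_wall_seconds", "median_loc"}
INT_COLS = {"samples", "total_tokens"}
BOOL_COLS = {"passed_hard_fail"}


def _coerce(col, raw):
    """Convert one CSV cell to the Python type its column is documented as."""
    if raw == "":  # assumption: null values are written as empty cells
        return None
    if col in FLOAT_COLS:
        return float(raw)
    if col in INT_COLS:
        return int(raw)
    if col in BOOL_COLS:
        # assumption: booleans appear as "true"/"false" (any letter case)
        return raw.strip().lower() == "true"
    return raw  # round, impl, model_slug, tests, verdict stay strings


def load_rows(text):
    """Parse the contents of dataset.csv into a list of typed dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: _coerce(col, val) for col, val in row.items()} for row in reader]
```

Typical use is `with open("dataset.csv", newline="") as f: rows = load_rows(f.read())`. With pandas, `pd.read_csv("dataset.csv")` does comparable type inference on its own, but it represents empty cells as NaN rather than None.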

schema (csv)

column               type                 description
round                string (YYYY-MM-DD)  Round date.
impl                 string               Implementer slug (the model's name in the lineup).
model_slug           string|null          Provider/model identifier from meta.json (e.g. opencode/kimi-k2.6).
spec_peer            number|null          Median spec score from peer judges, out of 15.
quality_peer         number|null          Median quality score from peer judges, out of 15.
composite            number|null          spec_peer + quality_peer, out of 30. Null when either component is missing.
passed_hard_fail     boolean              True if the impl passed the hard-fail gate (hidden tests + build).
tests                string               Hidden test pass count, e.g. "9/9".
verdict              string               Most common recommendation across judges (the mode): ship | ship-with-cleanup | rewrite | reject.
samples              integer              Number of independent runs aggregated for this (round, impl).
total_cost_usd       number               Sum of inference cost across samples, in USD.
total_tokens         integer              Sum of input + output + cache_read tokens across samples.
median_wall_seconds  number|null          Median wall-clock time across samples, in seconds.
median_loc           number|null          Median lines of code in the submitted implementation.

license

The dataset is released under the MIT license, matching the repository. Use it for anything; attribution appreciated, not required.

citation

@misc{openbench,
  author       = {fole},
  title        = {open-bench: weekly LLM coding battle royale},
  year         = {2026},
  howpublished = {\url{https://openbenchmark.dev/dataset}},
  note         = {schema_version=1}
}

schema versioning

The JSON file carries meta.schema_version. Breaking changes bump it. Old shapes remain reachable in git history under frontend/src/lib/dataset.ts.
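Because breaking changes bump the version, a consumer can pin to the version it understands (1, per the note in the citation entry) and fail loudly on anything else. This is a minimal sketch that assumes meta and flat are top-level keys of dataset.json. The page names meta.schema_version and flat[] but does not spell out the full layout. load_flat and SUPPORTED_SCHEMA_VERSION are hypothetical names.

```python
import json

# Version this consumer was written against (see note = {schema_version=1}).
SUPPORTED_SCHEMA_VERSION = 1


def load_flat(text):
    """Parse dataset.json and return its flat[] rows, refusing unknown schemas.

    Assumption: "meta" and "flat" are top-level keys of the document.
    """
    doc = json.loads(text)
    version = doc.get("meta", {}).get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(
            f"unsupported schema_version {version!r}; "
            f"expected {SUPPORTED_SCHEMA_VERSION}"
        )
    return doc["flat"]
```

Failing on any mismatch, rather than trying to adapt, keeps a silent shape change from producing wrong numbers downstream. Older shapes can still be recovered from git history.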

raw artifacts

This dataset is the aggregated table. Per-run inputs/outputs (transcripts, diffs, hidden test outputs, judge prompts) live in the repo at builds/ and results/.