# dataset
2 rounds · 14 rows · MIT license. Rebuilt every deploy. Cite freely.
| column | type | description |
|---|---|---|
| round | string (YYYY-MM-DD) | Round date. |
| impl | string | Implementer slug (the model lineup name). |
| model_slug | string \| null | Provider/model identifier from meta.json (e.g. opencode/kimi-k2.6). |
| spec_peer | number \| null | Median spec score from peer judges, out of 15. |
| quality_peer | number \| null | Median quality score from peer judges, out of 15. |
| composite | number \| null | spec_peer + quality_peer, out of 30. Null when either component is missing. |
| passed_hard_fail | boolean | True if the impl passed the hard-fail gate (hidden tests + build). |
| tests | string | Hidden test pass count, e.g. "9/9". |
| verdict | string | Mode-of-judges recommendation: ship \| ship-with-cleanup \| rewrite \| reject. |
| samples | integer | Number of independent runs aggregated for this (round, impl) pair. |
| total_cost_usd | number | Sum of inference cost across samples, in USD. |
| total_tokens | integer | Sum of input + output + cache-read tokens across samples. |
| median_wall_seconds | number \| null | Median wall-clock seconds across samples. |
| median_loc | number \| null | Median lines of code in the submitted implementation. |
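As a minimal sketch of the composite rule above, the following recomputes `composite` from the two peer medians, returning null (None) when either component is missing. The row dict shape is an assumption for illustration; the real JSON layout may differ.

```python
# Recompute composite = spec_peer + quality_peer (/30), per the schema table.
# Returns None when either peer median is missing, matching the null rule.

def composite(spec_peer, quality_peer):
    """Sum of the two peer medians, or None if either is absent."""
    if spec_peer is None or quality_peer is None:
        return None
    return spec_peer + quality_peer

# Hypothetical row for illustration only; field names follow the table above.
row = {"round": "2026-01-05", "impl": "kimi-k2.6",
       "spec_peer": 12.0, "quality_peer": 11.5}
print(composite(row["spec_peer"], row["quality_peer"]))  # 23.5
```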
The dataset is released under the MIT license, matching the repository. Use it for anything; attribution appreciated, not required.
To cite the dataset:

```bibtex
@misc{openbench,
  author = {fole},
  title = {open-bench: weekly LLM coding battle royale},
  year = {2026},
  howpublished = {\url{https://openbenchmark.dev/dataset}},
  note = {schema_version=1}
}
```

The JSON file carries a meta.schema_version. Breaking changes bump it; old shapes stay reachable in git history under frontend/src/lib/dataset.ts.
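A hedged sketch of a version guard built on that note: refuse to parse the table if meta.schema_version is not the one this consumer understands. The top-level keys (`meta`, `schema_version`, `rows`) are assumptions about the JSON layout, not confirmed by the source.

```python
import json

# Schema version this consumer understands; bump when the upstream shape changes.
SUPPORTED_SCHEMA = 1

def load_rows(text):
    """Parse the dataset JSON, rejecting unknown schema versions up front."""
    data = json.loads(text)
    version = data.get("meta", {}).get("schema_version")
    if version != SUPPORTED_SCHEMA:
        raise ValueError(f"unsupported schema_version: {version}")
    return data.get("rows", [])

# Hypothetical payload for illustration; real files live at the dataset URL.
sample = '{"meta": {"schema_version": 1}, "rows": [{"impl": "kimi-k2.6"}]}'
print(load_rows(sample))  # [{'impl': 'kimi-k2.6'}]
```

Failing fast on an unexpected version is cheaper than debugging a silently mis-shaped table downstream.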
This dataset is the aggregated table. Per-run inputs/outputs (transcripts, diffs, hidden test outputs, judge prompts) live in the repo at builds/ and results/.