full contract

The raw documents handed to every model and every judge. Read these as the source of truth.

SPEC spec.md what done looks like

`apply_edit.py` — implementation spec

A single-file Python module providing one operation: search-replace patching of file contents. This is the primitive that agent harnesses (Cursor, aider, Claude Code, etc.) use to translate model-emitted edits into actual file changes.

The point of the function is not "replace text". str.replace already does that. The point is to apply edits safely — raising loudly when the edit is ambiguous or impossible, never silently mutating the wrong location.

Public API

def apply_edit(
    file_text: str,
    old: str,
    new: str,
    *,
    replace_all: bool = False,
) -> str: ...

Returns a new string with old replaced by new in file_text. Because strings are immutable the input cannot be mutated; the function must also have no other side effects.

Exceptions

Three exception types, all module-level:

class EditError(Exception): ...
class EditNotFound(EditError): ...
class EditAmbiguous(EditError): ...

The two specific subclasses must inherit from EditError so callers can catch either the specific failure or EditError as a base.
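The hierarchy lets a caller choose how specific its handling should be. The sketch below illustrates this with `classify`, a hypothetical helper that is not part of the spec and exists only to show which `except` clause each exception reaches.

```python
class EditError(Exception):
    """Base class for every edit failure."""


class EditNotFound(EditError):
    """The search string does not occur in the text."""


class EditAmbiguous(EditError):
    """The search string occurs more than once."""


def classify(exc: Exception) -> str:
    """Hypothetical helper: report which handler a given exception reaches."""
    try:
        raise exc
    except EditAmbiguous:
        # A caller that cares about one specific failure catches it first.
        return "ambiguous"
    except EditError:
        # Any other EditError subclass, including EditNotFound, lands here.
        return "edit error"
    except Exception:
        # ValueError (empty old) is deliberately outside the EditError family.
        return "unrelated"
```

Note that the `ValueError` raised for an empty `old` is not caught by `except EditError`, so a caller that wants to handle it must do so separately.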

Behaviour

| Case | What must happen |
| --- | --- |
| `old` is the empty string | Raise `ValueError`. An empty needle is never a valid edit. |
| `old` does not appear in `file_text` | Raise `EditNotFound`. |
| `old` appears exactly once | Return `file_text` with that single occurrence replaced by `new`. |
| `old` appears 2+ times and `replace_all=False` | Raise `EditAmbiguous`. Do not silently replace the first match. This is the whole reason this function exists rather than `str.replace`. |
| `old` appears 2+ times and `replace_all=True` | Replace every occurrence. Return the new string. |
| `old == new` | Still validated for presence/ambiguity per the rules above; if it would otherwise succeed, return `file_text` unchanged. |
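One way to read the cases above is as a dispatch on the number of matches. The following is a minimal sketch under stated assumptions, not a definitive implementation: it counts occurrences with `str.count`, which counts non-overlapping matches (the spec does not say how overlapping matches should be counted), and it keeps the error messages simpler than the "Error messages" section requires.

```python
class EditError(Exception):
    pass


class EditNotFound(EditError):
    pass


class EditAmbiguous(EditError):
    pass


def apply_edit(file_text: str, old: str, new: str, *, replace_all: bool = False) -> str:
    # An empty needle is rejected before any searching happens.
    if old == "":
        raise ValueError("old must not be empty")

    # Assumption: non-overlapping literal occurrences, as counted by str.count.
    count = file_text.count(old)

    if count == 0:
        raise EditNotFound(f"old string not found: {old[:80]!r}")
    if count > 1 and not replace_all:
        # Refuse to guess which occurrence was meant.
        raise EditAmbiguous(
            f"old string matched {count} times; pass replace_all=True to replace all"
        )

    # Either exactly one match, or replace_all was requested.
    return file_text.replace(old, new)
```

For example, `apply_edit("a b c", "b", "X")` returns `"a X c"`, while `apply_edit("a a", "a", "X")` raises `EditAmbiguous` unless `replace_all=True` is passed.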

Whitespace, line endings, encoding

Match is byte-exact at the string level — no whitespace normalization, no leading/trailing strip, no case folding. Indentation must match exactly.
file_text may contain \n, \r\n, or a mix. The function operates on the string as given; it does not normalize line endings.
The input is str, not bytes. Callers handle encoding.
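The practical consequence of exact matching can be seen with plain string operations. This small illustration uses a made-up two-line file with CRLF endings; the variable names are for illustration only.

```python
# A file that uses Windows (CRLF) line endings.
file_text = "def f():\r\n    return 1\r\n"

# The "same" text written with LF endings is a different string.
lf_needle = "def f():\n    return 1\n"
crlf_needle = "def f():\r\n    return 1\r\n"

lf_count = file_text.count(lf_needle)      # no match: line endings differ
crlf_count = file_text.count(crlf_needle)  # one match: identical characters

# Indentation and letter case are compared exactly as well.
tab_count = file_text.count("\treturn 1")   # file uses four spaces, not a tab
case_count = file_text.count("RETURN 1")    # no case folding
```

Under these rules an edit whose `old` was produced with different line endings or indentation than the file is simply not found, and it is the caller's job to supply matching text.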

Error messages

Exception messages must be informative enough for an agent to react:

EditNotFound: include the first ~80 chars of old (truncated with … if longer) so logs show what was searched for.
EditAmbiguous: include the match count, e.g. "old string matched 4 times; pass replace_all=True to replace all".
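The two message requirements could be met with small formatting helpers. This is a sketch only: the helper names (`preview`, `not_found_message`, `ambiguous_message`) and the exact wording of the not-found message are assumptions, and the hard cut at 80 characters is one reading of "~80 chars".

```python
MAX_PREVIEW = 80  # assumption: treat "~80 chars" as exactly 80


def preview(old: str, limit: int = MAX_PREVIEW) -> str:
    """Return old unchanged if short, else its first `limit` chars plus an ellipsis."""
    return old if len(old) <= limit else old[:limit] + "…"


def not_found_message(old: str) -> str:
    # repr() makes whitespace and newlines in the needle visible in logs.
    return f"old string not found: {preview(old)!r}"


def ambiguous_message(count: int) -> str:
    # Wording taken from the example in the spec.
    return f"old string matched {count} times; pass replace_all=True to replace all"
```

Using `repr` for the preview is a design choice rather than a requirement; it keeps a multi-line `old` on a single log line.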

CLI

python apply_edit.py <path> <<EOF
<<<<<<< OLD
<old text>
=======
<new text>
>>>>>>> NEW
EOF

The CLI reads a single edit block from stdin in the format above (literal <<<<<<< OLD, =======, >>>>>>> NEW markers; no leading spaces), applies it to the file at <path>, and writes the result back to that path.
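Parsing the block can be done by locating the three marker lines. The sketch below is one possible approach, with assumptions the spec leaves open: `parse_edit_block` is a hypothetical name, the newline immediately before each marker is treated as a delimiter rather than as part of the text, and the first `=======` line after the opening marker is taken as the separator.

```python
OLD_MARK = "<<<<<<< OLD"
SEP_MARK = "======="
NEW_MARK = ">>>>>>> NEW"


def parse_edit_block(raw: str) -> tuple[str, str]:
    """Split one edit block into (old, new); raise ValueError if malformed."""
    lines = raw.split("\n")
    try:
        # Markers must be whole lines with no leading spaces.
        start = lines.index(OLD_MARK)
        sep = lines.index(SEP_MARK, start + 1)
        end = lines.index(NEW_MARK, sep + 1)
    except ValueError:
        raise ValueError("malformed edit block") from None
    old = "\n".join(lines[start + 1 : sep])
    new = "\n".join(lines[sep + 1 : end])
    return old, new
```

With this reading, an `old` text that itself contains a line consisting solely of `=======` cannot be expressed; whether that matters depends on how the hidden tests interpret the format.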

Exit 0 on a successful edit: a single match, or every match when --replace-all is passed.
Exit 2 on EditNotFound. Print the exception message to stderr.
Exit 3 on EditAmbiguous. Print the exception message to stderr.
Exit 1 on any other error (missing file, malformed stdin, etc.) with a stderr message.
--replace-all flag: if passed, set replace_all=True for the call.
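The exit-code contract amounts to a mapping from exception type to status code. As a rough sketch, assuming a hypothetical wrapper `run_and_report` that takes the real work as a callable:

```python
import sys
from typing import Callable


class EditError(Exception):
    pass


class EditNotFound(EditError):
    pass


class EditAmbiguous(EditError):
    pass


def run_and_report(action: Callable[[], None]) -> int:
    """Run action and translate its outcome into the CLI exit code."""
    try:
        action()
    except EditNotFound as exc:
        print(exc, file=sys.stderr)
        return 2
    except EditAmbiguous as exc:
        print(exc, file=sys.stderr)
        return 3
    except Exception as exc:
        # Missing file, malformed stdin, and anything else unexpected.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

A `main()` built this way would end with `sys.exit(run_and_report(...))`, which keeps every exit path in one place.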

The CLI must not require any third-party libraries.

Hard constraints

Python 3.10+, stdlib only.
Pure function semantics for apply_edit: no logging, no I/O, no global state. The CLI is a separate main() that does I/O.
No regex matching — search is literal substring. (Regex is what half of these tools get wrong; this task is the simple version.)
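To see why literal search matters, consider a needle that contains a regex metacharacter. The snippet below imports `re` purely for contrast; the implementation itself must not use it.

```python
import re

text = "price = a+b\n"
needle = "a+b"

# Literal search finds the three characters "a+b" in the text.
literal_hit = text.find(needle)

# Treated as a pattern, "a+b" means "one or more 'a' followed by 'b'",
# which does not occur here, so the same needle fails to match.
regex_hit = re.search(needle, text)
```

Source code is full of characters such as `+`, `.`, `(` and `[`, so a pattern-based search can miss text that is present or match text that was not intended.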

Out of scope

Multi-block patches (one edit per call only).
Fuzzy / context-aware matching. The point is strict matching.
Unified-diff parsing.
File I/O inside apply_edit itself.

PRMT prompt.md what the model reads

Task: implement `apply_edit.py`

Read SPEC.md in this directory. Implement apply_edit.py per spec: one library function (apply_edit), three exception classes (EditError, EditNotFound, EditAmbiguous), plus a CLI entry point.

This task covers only apply_edit.py. Do not create helper modules, test files, or packaging metadata.

Reference (read carefully)

Below is a starter implementation that someone tried to ship. It has at least one bug — at least one case where it does not match the behaviour required by SPEC.md. Your job:

Decide what is wrong with it.
Write a correct apply_edit.py from scratch (do not paste this in verbatim).
Add a single short comment at the top of your file naming the bug you found, in the form: # bug in reference: <one line>.

You are not required to keep the reference's structure. Use whatever shape is cleanest. The only requirement is that the resulting module passes the spec.

class EditError(Exception):
    pass


class EditNotFound(EditError):
    pass


class EditAmbiguous(EditError):
    pass


def apply_edit(file_text, old, new, *, replace_all=False):
    if not old:
        raise ValueError("old must not be empty")
    if old not in file_text:
        raise EditNotFound(f"old string not found: {old[:80]!r}")
    if replace_all:
        return file_text.replace(old, new)
    return file_text.replace(old, new, 1)


def main():
    import sys
    if len(sys.argv) < 2:
        print("usage: apply_edit.py <path> [--replace-all]", file=sys.stderr)
        sys.exit(1)
    path = sys.argv[1]
    replace_all = "--replace-all" in sys.argv[2:]
    raw = sys.stdin.read()
    # parse <<<<<<< OLD ... ======= ... >>>>>>> NEW block
    try:
        head, rest = raw.split("<<<<<<< OLD\n", 1)
        old, rest = rest.split("\n=======\n", 1)
        new, _ = rest.split("\n>>>>>>> NEW", 1)
    except ValueError:
        print("malformed stdin", file=sys.stderr)
        sys.exit(1)
    with open(path, "r") as f:
        contents = f.read()
    try:
        result = apply_edit(contents, old, new, replace_all=replace_all)
    except EditNotFound as e:
        print(str(e), file=sys.stderr)
        sys.exit(2)
    except EditAmbiguous as e:
        print(str(e), file=sys.stderr)
        sys.exit(3)
    with open(path, "w") as f:
        f.write(result)


if __name__ == "__main__":
    main()

Hard constraints

Python 3.10+, stdlib only — no pip install, no new dependencies.
apply_edit must be a pure function — no I/O, no logging, no globals.
No regex. Literal substring matching only.
Match is byte-exact at the string level — no whitespace normalization, no case folding, no line-ending normalization.
The three exception classes must inherit as documented in SPEC.md (specific classes inherit from EditError).

Deliverable

A single file apply_edit.py at the worktree root that:

Defines EditError, EditNotFound, EditAmbiguous.
Defines apply_edit(file_text, old, new, *, replace_all=False) -> str matching the spec exactly.
Provides a CLI per SPEC.md's "CLI" section, with the exit-code contract (0 / 1 / 2 / 3) and the --replace-all flag.

What to do when finished

Run a quick smoke test in your head: single match replaces; two matches without replace_all raises; old="" raises ValueError; old not in text raises EditNotFound.
State: "Done. Implementation in apply_edit.py."
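The four mental checks can also be written down. This is an optional sketch, not a required test file: `smoke_test` is a hypothetical helper that takes the implementation and its exception classes as arguments and returns the names of any checks that fail.

```python
from typing import Callable, List


def smoke_test(apply_edit: Callable[..., str], not_found: type, ambiguous: type) -> List[str]:
    """Return the names of failed checks; an empty list means all four passed."""
    failures: List[str] = []

    def expect_raises(name: str, exc_type: type, *args: str) -> None:
        try:
            apply_edit(*args)
        except exc_type:
            return  # the expected exception was raised
        except Exception:
            pass  # a different exception counts as a failure
        failures.append(name)

    # Single match replaces.
    try:
        ok = apply_edit("a b c", "b", "X") == "a X c"
    except Exception:
        ok = False
    if not ok:
        failures.append("single match replaces")

    expect_raises("two matches raise", ambiguous, "a b a", "a", "X")
    expect_raises("empty old raises ValueError", ValueError, "abc", "", "X")
    expect_raises("missing old raises", not_found, "abc", "z", "X")
    return failures
```

An implementation that follows the behaviour table should produce an empty list.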

What NOT to do

Do not modify PROMPT.md or SPEC.md.
Do not paste the reference verbatim.
Do not add requirements.txt, pyproject.toml, or any other dependency manifest.
Do not write test files; the hidden tests are added later.
Do not import any third-party package (no `regex`, no `rich`, etc.).

RUBR judge_rubric.md how judges score

Judge rubric: apply-edit task

Fill one copy per implementation, saved as output/<label>_rubric.md. Also write output/<label>_scores.json with the structured form (see JUDGE_PROMPT.md).

Implementation reviewed: <label> (e.g. A, B, C)
File: implementations/<label>.py

Hard-fail (any miss = fail run)

Cite line numbers when something fails.

[ ] apply_edit.py provided as <label>.py
[ ] Top-level apply_edit(file_text, old, new, *, replace_all=False) -> str matches SPEC signature
[ ] Module defines EditError, EditNotFound, EditAmbiguous
[ ] EditNotFound and EditAmbiguous both inherit from EditError
[ ] No external Python dependencies (stdlib-only imports)
[ ] No regex — literal substring match only
[ ] apply_edit is pure: no I/O, no global state, no logging inside it

Hard-fail result: pass / fail
If fail, reasons (with line refs):

Spec compliance — score 0–10

Award 1 point per item present and correct. Cite line numbers.

[ ] old == "" raises ValueError (not EditError, not silent return)
[ ] old not in file_text raises EditNotFound
[ ] EditNotFound message includes (a truncated form of) old
[ ] Single match: returns file_text with that one occurrence replaced
[ ] Multi-match w/ replace_all=False: raises EditAmbiguous (NOT silently replaces first — this is the bug in the reference)
[ ] EditAmbiguous message includes the match count
[ ] Multi-match w/ replace_all=True: replaces every occurrence
[ ] Match is byte-exact: no whitespace normalization, no case folding, no line-ending normalization
[ ] CLI exit codes match spec (0 success, 2 not-found, 3 ambiguous, 1 other)
[ ] CLI --replace-all flag wired through to the call

Subtotal: __ / 10
Notes:

Code quality — score each 0–5

[ ] Clarity — naming, structure, function decomposition: __
[ ] Conciseness — no over-engineering, no unused branches: __
[ ] Error handling — distinct exception types per spec; CLI exit-code contract honoured: __
[ ] Comments — at minimum a # bug in reference: line naming what was wrong; otherwise comments only at non-obvious points: __

Subtotal: __ / 20

Bug-diagnosis bonus (informational, not scored)

Did the model correctly identify the bug in the reference? The expected diagnosis is: "silently replaces only the first occurrence on multi-match instead of raising EditAmbiguous". Note in the rubric whether the model's # bug in reference: comment matches.

One-line summary

Verdict

ship-with-cleanup / rewrite / unusable
