Template: Coding Task Submission

Replace each [BRACKET] placeholder with your own content. Delete all template comments before pasting.

Field 1: Prompt

[BUSINESS CONTEXT — who you are, what you need, why it matters. 2-3 sentences.]

[DATA — inline generator script, data tables, or sample. Must be runnable or copy-pasteable.]
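As an illustration of what a runnable DATA block might look like, here is a minimal sketch of a seeded inline generator. The function name make_orders, the field names, and the value ranges are all hypothetical. The only point is that a fixed seed lets the prompt and the golden solution regenerate identical data.

```python
import random


def make_orders(n_rows=200, seed=42):
    """Hypothetical generator: returns a reproducible list of order records."""
    rng = random.Random(seed)  # private, seeded RNG: same rows on every run
    regions = ["north", "south", "east", "west"]
    return [
        {
            "order_id": i,
            "region": rng.choice(regions),
            "units": rng.randint(1, 19),  # inclusive on both ends
            "unit_price": round(rng.uniform(5.0, 50.0), 2),
        }
        for i in range(1, n_rows + 1)
    ]


if __name__ == "__main__":
    for row in make_orders(n_rows=3):
        print(row)
```

If the task uses pandas, the same records can be loaded with pd.DataFrame(make_orders()).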

[DELIVERABLE — what the model must produce. Be specific about outputs but not about methods.]

[CONSTRAINTS — scope limits, cost structure, domain rules. This is where you embed traps.]

Field 2: Gemini conversation link

[PASTE SHAREABLE LINK — must include the prompt and any follow-up hints you sent]

Field 3: Golden solution

"""
Golden Solution: [TITLE]
Requires: [dependencies]
Run: python golden_solution.py
"""

import json
import numpy as np
import pandas as pd
# [other imports]

# ── Data (same as prompt) ─────────────────────────────────────────
# [Paste or import the same data generation from Field 1]


# ── Step 1: [Name] ───────────────────────────────────────────────
# [Why this step matters. Which trap it addresses.]

def step_1():
    pass


# ── Step 2: [Name] ───────────────────────────────────────────────
# [Why this step matters. Which trap it addresses.]

def step_2():
    pass


# ── Step N: [Final evaluation / output] ───────────────────────────

def evaluate():
    pass


def main():
    # [Run pipeline, print summary, write results.json]
    results = {}
    with open("results.json", "w") as f:
        json.dump(results, f, indent=2)
    print("Wrote results.json")

if __name__ == "__main__":
    main()

[1-2 sentences: what this solution gets right that Gemini is expected to miss.]
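One way the skeleton's steps could be chained is sketched below, assuming each step takes the previous step's output and evaluate() returns the dictionary written to results.json. The step bodies, the "amount" field, the result keys, and the run_pipeline helper are hypothetical stand-ins, not part of the template.

```python
import json
from pathlib import Path


def step_1(rows):
    """Hypothetical step: drop records with a missing amount."""
    return [r for r in rows if r.get("amount") is not None]


def step_2(rows):
    """Hypothetical step: total the remaining amounts."""
    return sum(r["amount"] for r in rows)


def evaluate(rows, total):
    """Collect the values that results.json should report."""
    return {"n_rows": len(rows), "total_amount": total}


def run_pipeline(rows, out_path="results.json"):
    """Run the steps in order and write the results as JSON."""
    cleaned = step_1(rows)
    total = step_2(cleaned)
    results = evaluate(cleaned, total)
    Path(out_path).write_text(json.dumps(results, indent=2))
    return results


if __name__ == "__main__":
    sample = [{"amount": 10.0}, {"amount": None}, {"amount": 2.5}]
    print(run_pipeline(sample))
```

Returning the results dictionary as well as writing it makes the pipeline easier to check from a test without re-reading the file.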

Field 4: Rubric / Unit-tests

Score the response across five areas:

[Area 1 name] ([N] pts): [What to check. What passes. What fails.]

[Area 2 name] ([N] pts): [...]

[Area 3 name] ([N] pts): [...]

[Area 4 name] ([N] pts): [...]

[Area 5 name] ([N] pts): [...]

Critical fail conditions: [list 3-5 instant disqualifiers that cap the score regardless of other work]
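Where a rubric area or critical fail condition can be checked mechanically, it might be written as a small check against results.json. The sketch below is an assumption about how that could look: the check_results helper, the keys n_rows and total_amount, and the negative-total rule are hypothetical and should be replaced with whatever the rubric actually scores.

```python
import json


def check_results(path="results.json"):
    """Return a list of failed checks; an empty list means all checks passed."""
    with open(path) as f:
        results = json.load(f)

    failures = []
    # Hypothetical required keys; replace with the fields your rubric scores.
    for key in ("n_rows", "total_amount"):
        if key not in results:
            failures.append(f"missing key: {key}")
    # Hypothetical critical-fail condition: a negative total is never valid.
    if results.get("total_amount", 0) < 0:
        failures.append("total_amount is negative")
    return failures


if __name__ == "__main__":
    import sys

    # Pass a path on the command line, or default to results.json.
    target = sys.argv[1] if len(sys.argv) > 1 else "results.json"
    try:
        problems = check_results(target)
    except FileNotFoundError:
        problems = [f"file not found: {target}"]
    print("PASS" if not problems else "FAIL: " + "; ".join(problems))
```

Returning a list of failures rather than raising on the first one lets a grader see every missed condition in a single run.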

Field 5: Report / Failure Analysis

[WHAT GEMINI GOT RIGHT — acknowledge genuine strengths]

[CORE FAILURE — the most important thing it got wrong, why it matters, and how the golden solution handles it correctly]

[SECONDARY ISSUES — weaker aspects that aren't deal-breakers]

[WHAT THIS REVEALS — the gap between surface competence and actual correctness]

[HINTING RESULTS — did follow-up prompts help? What does that say about the failure mode?]