Template — Coding Task Submission
Template: Coding Task Submission
Fill in each [BRACKET] section. Delete all comments before pasting.
Field 1: Prompt
[BUSINESS CONTEXT — who you are, what you need, why it matters. 2-3 sentences.]
[DATA — inline generator script, data tables, or sample. Must be runnable or copy-pasteable.]
[DELIVERABLE — what the model must produce. Be specific about outputs but not about methods.]
[CONSTRAINTS — scope limits, cost structure, domain rules. This is where you embed traps.]
Field 2: Gemini conversation link
[PASTE SHAREABLE LINK — must include the prompt and any follow-up hints you sent]
Field 3: Golden solution
"""
Golden Solution: [TITLE]
Requires: [dependencies]
Run: python golden_solution.py
"""
import json
import numpy as np
import pandas as pd
# [other imports]
# ── Data (same as prompt) ─────────────────────────────────────────
# [Paste or import the same data generation from Field 1]
# ── Step 1: [Name] ───────────────────────────────────────────────
# [Why this step matters. Which trap it addresses.]
def step_1():
pass
# ── Step 2: [Name] ───────────────────────────────────────────────
def step_2():
pass
# ── Step N: [Final evaluation / output] ───────────────────────────
def evaluate():
pass
def main():
# [Run pipeline, print summary, write results.json]
results = {}
with open("results.json", "w") as f:
json.dump(results, f, indent=2)
print("Wrote results.json")
if __name__ == "__main__":
main()
[1-2 sentences: what this solution gets right that Gemini is expected to miss.]
Field 4: Rubric / Unit-tests
Score the response across five areas:
[Area 1 name] ([N] pts): [What to check. What passes. What fails.]
[Area 2 name] ([N] pts): [...]
[Area 3 name] ([N] pts): [...]
[Area 4 name] ([N] pts): [...]
[Area 5 name] ([N] pts): [...]
Critical fail conditions: [list 3-5 instant disqualifiers that cap the score regardless of other work]
Field 5: Report / Failure Analysis
[WHAT GEMINI GOT RIGHT — acknowledge genuine strengths]
[CORE FAILURE — the most important thing it got wrong, why it matters, and how the golden solution handles it correctly]
[SECONDARY ISSUES — weaker aspects that aren't deal-breakers]
[WHAT THIS REVEALS — the gap between surface competence and actual correctness]
[HINTING RESULTS — did follow-up prompts help? What does that say about the failure mode?]