Proctor Check Task Validator Prompt
Source: proctor-check-task-validator-prompt.md (ingested 2026-03-28)
Claude Code Prompt: Build Project Proctor Task Validator
You are operating in agentic mode. Build a complete, functional CLI tool called proctor-check — a submission validator for STEM task authoring. The tool validates structured task submissions against a strict ruleset before they go to human review.
This is NOT a study tool or training system. This is a real-time validation engine that catches rejection-causing errors.
CONTEXT
The user authors PhD-level STEM problems on a platform with a 16-step submission workflow. Tasks get rejected for specific, predictable rule violations. This tool must catch those violations before submission. The full ruleset is below.
ARCHITECTURE
- Language: Node.js (TypeScript)
- CLI: Pure Node.js argument parsing (no external CLI framework — zero npm dependencies beyond `ts-node` for direct execution)
- Structure: Monorepo-ready so it can later be published as an npm package
- Config: All validation rules defined in a single `proctor-rules.json` so the engine is decoupled from this specific project's rules
proctor-check/
├── package.json
├── tsconfig.json
├── bin/
│ └── proctor-check.js # CLI entry point (JS, requires ts-node)
├── src/
│ ├── index.ts # Main orchestrator
│ ├── types.ts # All TypeScript interfaces
│ ├── validators/
│ │ ├── prompt.ts # Step 1 validation
│ │ ├── final-answer.ts # Step 5 validation
│ │ ├── pass-at-k.ts # Step 6 validation
│ │ ├── model-failures.ts # Steps 4, 7 validation
│ │ ├── failure-rationale.ts # Step 8 validation
│ │ ├── answer-format.ts # Step 9 validation
│ │ ├── references.ts # Step 10 validation
│ │ ├── hints.ts # Step 11 validation
│ │ ├── metadata.ts # Step 12 validation
│ │ ├── solution.ts # Step 13 validation
│ │ └── rubric.ts # Step 14 validation (MOST IMPORTANT)
│ ├── rules/
│ │ └── proctor-rules.json # All rules as structured data
│ ├── reporters/
│ │ ├── terminal.ts # Pretty CLI output with ANSI colors
│ │ └── json.ts # Machine-readable output
│ └── utils/
│ ├── latex.ts # LaTeX parsing helpers
│ └── text-analysis.ts # Verb detection, atomicity checks, etc.
├── templates/
│ ├── task-template.json # Empty task structure for `proctor-check init`
│ └── rubric-template.json # Rubric scaffold with approved verb list
└── test/
├── fixtures/ # Sample good/bad submissions
│ ├── good-task.json
│ └── bad-task.json
└── validators/
└── test-all.js # Unit tests for every validator
CLI COMMANDS
proctor-check init
Creates a task.json template in the current directory with all required fields and inline comments explaining each field. If task.json already exists, creates task-template.json instead.
proctor-check validate [task.json]
Runs ALL 11 validators against the task file. Outputs a section-by-section report with PASS/WARN/FAIL per rule. Ends with a GO / REVIEW / NO-GO verdict.
proctor-check validate --section <name> [task.json]
Runs only one validator. Valid section names: prompt, answer, answer-format, pass-at-k, failures, rationale, references, hints, metadata, solution, rubric
proctor-check validate --json [task.json]
Same as validate but outputs the full report as JSON.
proctor-check score-rubric [task.json]
Simulates scoring your step-by-step solution against your own rubric using keyword matching heuristics. Flags if the solution wouldn't earn 7/7.
proctor-check lint-latex [file.md]
Checks LaTeX formatting against Proctor rules: balanced $/$$ delimiters, units in \text{}, no bare * multiplication, balanced braces.
TYPES (src/types.ts)
export type Severity = 'FAIL' | 'WARN' | 'PASS' | 'INFO';
export interface ValidationResult {
rule: string;
severity: Severity;
message: string;
details?: string;
}
export interface ValidatorOutput {
section: string;
results: ValidationResult[];
}
export type Verdict = 'GO' | 'REVIEW' | 'NO-GO';
export interface ValidationReport {
timestamp: string;
taskFile: string;
sections: ValidatorOutput[];
summary: { fails: number; warns: number; passes: number; infos: number };
verdict: Verdict;
}
export interface ModelResponse {
correct: boolean | null;
failure_type: string;
}
export interface RubricItem {
criterion: string;
weight: number;
description: string;
grading_guidance: string;
}
export interface Reference {
key: string;
target_source: string;
required_to_solve: boolean | string;
}
export interface TaskMetadata {
subdomain: string;
education_level: string;
difficulty: string;
}
export interface Task {
domain: string;
prompt: string;
final_answer: string;
final_answer_format: string;
model_a: { response_1: ModelResponse; response_2: ModelResponse };
model_b: { response_1: ModelResponse; response_2: ModelResponse };
failure_rationale_a: string;
failure_rationale_b: string;
hints: string[];
metadata: TaskMetadata;
references: Reference[] | string;
solution: string;
rubric: RubricItem[];
}
TASK.JSON SCHEMA
The proctor-check init command should generate this structure (with _comment and _*_options helper fields):
{
"domain": "",
"prompt": "",
"final_answer": "",
"final_answer_format": "",
"model_a": {
"response_1": { "correct": null, "failure_type": "" },
"response_2": { "correct": null, "failure_type": "" }
},
"model_b": {
"response_1": { "correct": null, "failure_type": "" },
"response_2": { "correct": null, "failure_type": "" }
},
"failure_rationale_a": "",
"failure_rationale_b": "",
"hints": ["", "", ""],
"metadata": {
"subdomain": "",
"education_level": "",
"difficulty": ""
},
"references": [],
"solution": "",
"rubric": [
{
"criterion": "",
"weight": 0,
"description": "",
"grading_guidance": ""
}
]
}
VALIDATION RULES — IMPLEMENT ALL OF THESE
Prompt Validator (Step 1)
- FAIL if prompt contains multiple-choice patterns: `A)`, `B)`, `C)`, `D)`, `(A)`, `(B)`, `(C)`, `(D)`, "which of the following", "choose from the following", "select the correct"
- FAIL if prompt is binary ("yes or no", "true or false", "yes/no", "true/false") or ternary ("increase/decrease/no change")
- WARN if prompt does not specify answer format or precision. Look for keywords: "express your answer", "give your answer", "significant figures", "decimal places", "round to", "in units of"
- WARN if prompt uses bare `*` as multiplication (conflicts with Markdown italics). Ignore `*` inside LaTeX `$...$` blocks and `**` (bold).
- WARN if no LaTeX detected (inline `$...$` or display `$$...$$`). All numbers/variables should use `$` delimiters in STEM prompts.
- WARN if units appear inside LaTeX `$...$` but outside `\text{}`. Common units to check: kg, mol, Hz, Pa, atm, eV, nm, cm, mm, km, J, W, V, A, N, K, etc. Allow units inside `\text{}`, `\mathrm{}`, or `\textrm{}`.
- FAIL if prompt requires external resources. Pattern-match: "refer to", "as described in", "according to the paper", "see figure", "as shown in the textbook", "look up", "use the paper", "the paper by", "the article by", "consult the", "reference the"
- INFO if ranking task detected (keywords: rank, order, arrange, sort). Note: allowed if 4+ items.
- WARN if estimated reasoning steps < 3. Estimate by counting action verbs (find, determine, calculate, compute, derive, show, prove, evaluate, estimate, approximate), conditionals, and LaTeX density.
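The first two FAIL checks above could be sketched as follows; the pattern lists are transcribed from the rules, while the function names and exact regexes are illustrative assumptions:

```typescript
// Hypothetical sketch of the multiple-choice and binary/ternary checks.
const MC_PATTERNS: RegExp[] = [
  /\(?[A-D]\)/,                 // matches A), (B), etc.
  /which of the following/i,
  /choose from the following/i,
  /select the correct/i,
];
const BINARY_PATTERNS: RegExp[] = [
  /\byes or no\b/i,
  /\btrue or false\b/i,
  /\byes\/no\b/i,
  /\btrue\/false\b/i,
  /increase\/decrease\/no change/i,
];

function detectMultipleChoice(prompt: string): boolean {
  return MC_PATTERNS.some((p) => p.test(prompt));
}

function detectBinary(prompt: string): boolean {
  return BINARY_PATTERNS.some((p) => p.test(prompt));
}
```

The `\(?[A-D]\)` heuristic deliberately over-matches (any `A)`-style token triggers it); a FAIL here sends the author back to rewrite, so false positives are cheap.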
Final Answer Validator (Step 5)
- FAIL if answer contains explanatory text: "because", "therefore", "since", "this is due to", "as a result", "which means", "note that", "recall that"
- FAIL if answer contains units (use regex for common units) UNLESS the prompt explicitly requires units in the answer (check for "include.*unit", "with.*unit", "answer.*in [unit]")
- FAIL if answer starts with labels: "answer:", "final:", "result:", "solution:", "the answer is"
- WARN if answer format doesn't match declared format type. Check: Integer = `/^-?\d+$/`, Decimal = `/^-?\d+\.?\d*$/`, Fraction = must contain `/`
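The format-match check maps declared format names to predicates; the regexes are the ones given above, while the lowercase key names and the null-for-unknown convention are assumptions of this sketch:

```typescript
// Illustrative Step 5 format checks; regexes follow the rules above.
const FORMAT_CHECKS: Record<string, (ans: string) => boolean> = {
  integer: (a) => /^-?\d+$/.test(a.trim()),
  decimal: (a) => /^-?\d+\.?\d*$/.test(a.trim()),
  fraction: (a) => a.includes("/"),
};

function matchesDeclaredFormat(answer: string, format: string): boolean | null {
  const check = FORMAT_CHECKS[format.toLowerCase()];
  return check ? check(answer) : null; // null = unknown format, validator stays silent
}
```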
Pass@K Validator (Step 6)
- FAIL if all 4 model responses are marked `correct: false` (need at least 1 correct)
- FAIL if Model A has 0 failures (both `correct: true`)
- FAIL if Model B has 0 failures (both `correct: true`)
- FAIL if total failure count is not 2, 3, or 4
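All four Step 6 rules reduce to simple counting over the four responses; a minimal sketch (function and type names are illustrative, not prescribed):

```typescript
interface ModelResponse { correct: boolean | null; failure_type: string; }
interface ModelPair { response_1: ModelResponse; response_2: ModelResponse; }

// Sketch of the Step 6 counting logic; returns one message per violated rule.
function checkPassAtK(a: ModelPair, b: ModelPair): string[] {
  const all = [a.response_1, a.response_2, b.response_1, b.response_2];
  const failures = all.filter((r) => r.correct === false).length;
  const issues: string[] = [];
  if (failures === 4) issues.push("all 4 responses failed (need at least 1 correct)");
  if (a.response_1.correct && a.response_2.correct) issues.push("Model A has 0 failures");
  if (b.response_1.correct && b.response_2.correct) issues.push("Model B has 0 failures");
  if (failures < 2 || failures > 4) issues.push(`total failure count ${failures} not in 2-4`);
  return issues;
}
```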
Model Failure Validator (Steps 4, 7)
- FAIL if fewer than 1 failure per model
- WARN if `failure_type` matches invalid stump patterns: "rounding error", "arithmetic mistake", "formatting issue", "computational slip", "calculation error", "typo", "off by one". These don't count as valid stumps.
- WARN if a response is marked `correct: false` but `failure_type` is empty
- PASS with the failure type text for valid stumps
Failure Rationale Validator (Step 8)
Run separately for Model A and Model B rationales:
- FAIL if rationale is under 100 characters
- WARN if rationale doesn't reference a specific step/location. Look for patterns: "step \d", "in step", "at step", "response \d", "line \d", "equation \d", "where it", "when it", "the model's .* step"
- WARN if rationale uses vague language: "the model was confused", "it made an error", "incorrect reasoning", "wrong approach", "bad logic", "messed up", "got it wrong"
- WARN if rationale doesn't distinguish reasoning vs calculation error. Check if text mentions either reasoning-type words (reasoning, logic, conceptual, principle, theorem, assumption, deduction, inference) or calculation-type words (calculat, arithmetic, comput, numer, round, decimal).
References Validator (Step 10)
- FAIL if references is a non-empty string that isn't valid JSON. Required JSON keys per reference: `key`, `target_source`, `required_to_solve`
- WARN if references is empty/[] but prompt contains physical constant patterns: "6.674" (G), "6.022" (Avogadro), "1.38" (Boltzmann), "8.314" (gas constant), "3.0.*10^8" (c), "9.8" (g), "1.602" (e), "8.854" (ε₀), "6.626" (h)
Hints Validator (Step 11)
- FAIL if not exactly 3 hints
- FAIL if any hint references another hint: "hint 1", "hint 2", "hint 3", "previous hint", "first hint", "second hint", "third hint", "above hint", "prior hint", "as mentioned in hint", "building on the previous hint", "from hint"
- WARN if any hint is under 20 characters
- WARN if any hint contains the exact final answer value (string match, skip if answer is very short)
- WARN if any hint doesn't end with proper punctuation (., ?, !, ))
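The hint cross-reference FAIL is a substring scan over a fixed phrase list (the list below is copied from the rule above; the function name and return shape are illustrative):

```typescript
// Phrases that indicate one hint depends on another (from the Step 11 rules).
const HINT_XREF = [
  "hint 1", "hint 2", "hint 3", "previous hint", "first hint", "second hint",
  "third hint", "above hint", "prior hint", "as mentioned in hint",
  "building on the previous hint", "from hint",
];

// Returns the indices of hints that mention another hint.
function hintCrossReferences(hints: string[]): number[] {
  return hints
    .map((h, i) => (HINT_XREF.some((p) => h.toLowerCase().includes(p)) ? i : -1))
    .filter((i) => i >= 0);
}
```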
Metadata Validator (Step 12)
- FAIL if subdomain is empty
- FAIL if education level not in: `["High-school olympiad", "Undergraduate", "Graduate"]`
- FAIL if difficulty not in: `["Easy", "Medium", "Hard"]`
- FAIL if domain not in: `["Chemistry", "Biology", "Engineering", "Physics", "Astronomy"]`
- WARN if education level is "High-school olympiad" (target should be Graduate/PhD)
Solution Validator (Step 13)
- FAIL if solution doesn't contain numbered steps. Check for `step\s+#?\d+` pattern OR start-of-line numbering `/(?:^|\n)\s*\d+[.)]\s/`. Must match at least one.
- FAIL if solution's last ~500 characters don't contain a final answer label. Look for: "final answer", "therefore", "thus.*answer", "the answer is", "result:", "we get", "we obtain", "solution:"
- WARN if solution contains handwaving: "it is obvious", "it's obvious", "clearly", "trivially", "it follows trivially", "obviously"
- WARN if no LaTeX detected in solution
- WARN if intermediate calculations use fewer than 5 decimal places. Scan for decimal numbers and check digit count after the dot.
- WARN if log/exponential expressions are present AND intermediate values use fewer than 7 decimal places
- FAIL if the final answer value from Step 5 doesn't appear near the end of the solution. For numeric answers, try approximate matching (within 0.1%).
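The 0.1% approximate-match rule could be sketched like this; the 500-character tail window comes from the rules above, while the number-extraction regex and zero-handling are assumptions of this sketch:

```typescript
// Sketch: does the Step 5 answer appear (exactly or within 0.1%) near the end
// of the solution text?
function answerAppearsNearEnd(finalAnswer: string, solution: string): boolean {
  const tail = solution.slice(-500);
  if (tail.includes(finalAnswer.trim())) return true; // exact string match
  const target = parseFloat(finalAnswer);
  if (Number.isNaN(target)) return false; // non-numeric answer, exact match only
  // Pull every number from the tail and compare within 0.1% relative error.
  const nums = tail.match(/-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/g) ?? [];
  return nums.some((n) => {
    const v = parseFloat(n);
    return target === 0 ? v === 0 : Math.abs(v - target) / Math.abs(target) <= 0.001;
  });
}
```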
Rubric Validator (Step 14) — THIS IS THE MOST IMPORTANT VALIDATOR
- FAIL if fewer than 2 or more than 7 criteria
- FAIL if weights don't sum to exactly 7
- FAIL if any individual weight is outside 1–4 range
- FAIL if any weight is not a positive integer
- FAIL if last criterion doesn't reference the final answer (check combined text of criterion + description + grading_guidance for keywords: "final answer", "result", "concludes", "arrives at", "obtains", "correct answer", "final value", "final result")
- FAIL if any criterion does NOT start with an approved action verb. The complete approved verb list: Derives, States, Extracts, Identifies, Explains, Calculates, Cites, Compares, Applies, Determines, Evaluates, Recognizes, Establishes, Solves, Proves, Demonstrates, Formulates, Integrates, Analyzes, Constructs, Verifies, Justifies, Converts, Simplifies, Approximates, Classifies, Distinguishes, Predicts, Quantifies, Validates, Relates, Maps, Defines, Enumerates, Selects, Combines, Decomposes, Transforms, Substitutes, Normalizes, Differentiates, Interpolates, Extrapolates, Balances, Optimizes, Constrains, Bounds, Parameterizes, Linearizes, Estimates, Arrives, Computes, Shows, Deduces, Infers, Obtains
- WARN if any criterion contains " and " or " & " joining two distinct checks (atomicity violation). Heuristic: check if both sides of "and" contain approved action verbs.
- WARN if any criterion references another criterion by number: "criterion \d", "criteria \d", "item \d", "as in \d", "see \d", "above criterion", "previous criterion"
- WARN if grading guidance is empty for any criterion
- WARN if grading guidance doesn't include tolerance/precision language for criteria that look numerical (contain digits or words like "numer", "calcul", "value", "result")
- WARN if grading guidance uses subjective terms without definitions: "reasonable", "appropriate", "sufficient", "adequate", "good", "proper", "suitable", "acceptable". Allow if term is followed by a definition (parenthetical, "meaning", "i.e.", "defined as").
- INFO: note that `proctor-check score-rubric` can simulate scoring
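The structural FAIL rules above (count, weight sum, weight range, verb-led criteria) might look like this; the verb list is abbreviated here for brevity (the full list is in the rules above) and all names are illustrative:

```typescript
// Abbreviated verb list; the real rules file carries the complete set.
const APPROVED_VERBS = ["Derives", "States", "Calculates", "Applies", "Arrives", "Obtains"];

// Sketch of the core rubric structure checks; one message per violation.
function rubricStructureIssues(items: { criterion: string; weight: number }[]): string[] {
  const issues: string[] = [];
  if (items.length < 2 || items.length > 7) issues.push("criteria count outside 2-7");
  const sum = items.reduce((s, i) => s + i.weight, 0);
  if (sum !== 7) issues.push(`weights sum to ${sum}, not 7`);
  for (const i of items) {
    if (!Number.isInteger(i.weight) || i.weight < 1 || i.weight > 4)
      issues.push(`weight ${i.weight} is not an integer in 1-4`);
    const verb = i.criterion.trim().split(/\s+/)[0];
    if (!APPROVED_VERBS.includes(verb))
      issues.push(`criterion does not start with an approved verb: "${verb}"`);
  }
  return issues;
}
```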
UTILITY MODULES
src/utils/latex.ts
- `containsLatex(text)` — returns true if inline `$...$` or display `$$...$$` found
- `findUnitsOutsideText(text)` — extracts all LaTeX blocks, finds units not inside `\text{}`, returns array of issue strings
- `findBareAsterisks(text)` — finds `*` positions outside LaTeX and `**`, returns positions array
- `lintLatex(text)` — checks balanced `$`/`$$` delimiters and balanced braces, returns issue strings
- `countDecimalPlaces(numStr)` — counts digits after decimal point
- `findLowPrecisionNumbers(text, minDecimals)` — scans for decimal numbers with fewer than minDecimals places
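A minimal sketch of `lintLatex` under these requirements; the escaped-character stripping is an assumption (real LaTeX tokenization is subtler), and the issue strings are illustrative:

```typescript
// Heuristic LaTeX lint: delimiter parity and brace balance.
function lintLatex(text: string): string[] {
  const issues: string[] = [];
  // Strip escaped characters so \$ and \{ don't skew the counts.
  const t = text.replace(/\\[\\${}]/g, "");
  const display = (t.match(/\$\$/g) ?? []).length;
  if (display % 2 !== 0) issues.push("unbalanced $$ display delimiters");
  const single = (t.match(/\$/g) ?? []).length - display * 2;
  if (single % 2 !== 0) issues.push("unbalanced $ inline delimiters");
  let depth = 0;
  for (const ch of t) {
    if (ch === "{") depth++;
    else if (ch === "}") depth--;
    if (depth < 0) { issues.push("closing brace without opener"); break; }
  }
  if (depth > 0) issues.push("unclosed brace");
  return issues;
}
```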
src/utils/text-analysis.ts
- `startsWithApprovedVerb(text)` — returns matched verb or null
- `matchesPatterns(text, patterns)` — tests each regex pattern case-insensitively, returns matched patterns
- `containsKeywords(text, keywords)` — case-insensitive keyword search, returns found keywords
- `estimateReasoningSteps(prompt)` — heuristic step counter using action words, conditionals, LaTeX density
- `isCompleteSentence(text)` — checks for ending punctuation
- `hasAtomicityIssue(text)` — detects " and " joining two verb phrases
- `findSubjectiveTerms(text, terms)` — finds subjective terms not followed by definitions
- `hasCrossReference(text, patterns)` — detects cross-references
- `containsLogExp(text)` — detects log/ln/exp expressions
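Two of these helpers sketched together, since `hasAtomicityIssue` naturally reuses `startsWithApprovedVerb`; the abbreviated verb list and the split-on-connectives heuristic are assumptions of this sketch:

```typescript
// Abbreviated verb list; the full set lives in proctor-rules.json.
const VERBS = ["Derives", "States", "Calculates", "Applies", "Shows", "Obtains"];

// Returns the matched approved verb, or null if the text starts with anything else.
function startsWithApprovedVerb(text: string): string | null {
  const first = text.trim().split(/\s+/)[0];
  return VERBS.includes(first) ? first : null;
}

// Atomicity heuristic: flag " and " / " & " only when at least two of the
// joined fragments each lead with an approved verb (two distinct checks).
function hasAtomicityIssue(text: string): boolean {
  const parts = text.split(/\s+and\s+|\s+&\s+/);
  if (parts.length < 2) return false;
  return parts.filter((p) => startsWithApprovedVerb(p) !== null).length >= 2;
}
```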
MAIN ORCHESTRATOR (src/index.ts)
- `loadTask(filePath)` — reads and parses task.json, throws on missing file or invalid JSON
- `validateAll(task, taskFile)` — runs all 11 validators in order, returns ValidationReport
- `validateSection(task, section)` — runs single validator by name, throws on unknown section
- `buildReport(sections, taskFile)` — aggregates results, computes summary counts and verdict
- `computeVerdict(fails, warns)` — NO-GO if fails > 0, REVIEW if warns >= 3, else GO
- `scoreRubric(task)` — keyword-matching heuristic that estimates how each rubric criterion maps to the solution
- `generateTemplate()` — returns the task.json template object
- `SECTION_NAMES` — exported array of all valid section names
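The verdict rule is fully specified above, so `computeVerdict` is a direct transcription:

```typescript
type Verdict = "GO" | "REVIEW" | "NO-GO";

// NO-GO if any FAILs, REVIEW at 3+ WARNs, otherwise GO.
function computeVerdict(fails: number, warns: number): Verdict {
  if (fails > 0) return "NO-GO";
  if (warns >= 3) return "REVIEW";
  return "GO";
}
```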
REPORTER MODULES
Terminal Reporter (src/reporters/terminal.ts)
- Uses raw ANSI escape codes (no chalk dependency)
- Severity icons: 🔴 FAIL, 🟡 WARN, 🟢 PASS, 🔵 INFO
- Groups results within each section: FAILs first, then WARNs, then INFOs, then PASSes
- Shows details on the line below (dimmed)
- Ends with boxed verdict banner and summary counts
JSON Reporter (src/reporters/json.ts)
- Outputs the full ValidationReport as formatted JSON
OUTPUT FORMAT
Terminal output for every validate run ends with:
═══════════════════════════════════════════════════
VERDICT: [GO | REVIEW | NO-GO]
═══════════════════════════════════════════════════
FAILs: X | WARNs: Y | PASSes: Z | INFOs: W
NO-GO if any FAILs exist. REVIEW if 3+ WARNs. GO if all pass with < 3 warnings. Exit code: 0 for GO/REVIEW, 1 for NO-GO.
CLI ENTRY POINT (bin/proctor-check.js)
- Starts with `#!/usr/bin/env node`
- Registers ts-node at the top for direct .ts execution: `require('ts-node').register({ transpileOnly: true, compilerOptions: { module: 'commonjs' } })`
- Parses args manually: positional args for command + file, `--section`, `--json`, `--help`, `--version`
- Commands: init, validate, score-rubric, lint-latex
- Handles all errors with colored error messages and non-zero exit codes
TEST FIXTURES
good-task.json
A realistic Physics task (rolling cylinder + inelastic collision) with:
- Proper LaTeX in prompt with `\text{}` units
- Valid model failures (1 per model, conceptual errors)
- Detailed failure rationales (600+ chars each, referencing specific steps)
- Exactly 3 standalone hints
- Numbered step-by-step solution with LaTeX
- 5-criterion rubric summing to 7, all verb-led, last criterion = final answer
- Valid metadata (Physics, Graduate, Medium)
bad-task.json
A deliberately broken submission that should trigger 25+ FAILs:
- Invalid domain ("Art History")
- Multiple-choice patterns AND binary question AND external resource reference AND bare `*` in prompt
- Answer with labels, units, and explanatory text
- All 4 model responses marked correct (no failures)
- Rationales under 100 chars, using vague language
- 4 hints instead of 3, with cross-references and short entries
- Empty subdomain, invalid education level and difficulty
- Non-JSON references string
- Solution with no numbered steps and handwaving
- Rubric: non-verb-led criteria, weights sum to 8, weight > 4, missing guidance, subjective terms, criterion cross-references
UNIT TESTS (test/validators/test-all.js)
Write a self-contained test runner (no jest/mocha — just assertions with a counter) that:
- Loads both fixtures
- Runs each validator against both fixtures
- Uses `assertHasSeverity(results, severity, ruleSubstring, testName)` to verify specific rules fire
- Tests utility functions: `containsLatex`, `findBareAsterisks`, `lintLatex`, `startsWithApprovedVerb`, `estimateReasoningSteps`, `isCompleteSentence`, `hasAtomicityIssue`
- Tests verdict logic: `computeVerdict(0,0)` = GO, `computeVerdict(0,2)` = GO, `computeVerdict(0,3)` = REVIEW, `computeVerdict(1,0)` = NO-GO
- Tests full integration: bad task gets NO-GO with many fails, good task has fewer fails
- Prints summary: `✅ Passed: N / ❌ Failed: M`
- Exits with code 1 if any test fails
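A dependency-free harness of this shape might look like the following; `check` and the stubbed result set are illustrative, and only `assertHasSeverity`'s signature comes from the spec above:

```typescript
// Minimal counter-based test harness (no jest/mocha).
let passed = 0;
let failed = 0;

function check(name: string, cond: boolean): void {
  if (cond) { passed++; }
  else { failed++; console.error(`❌ ${name}`); }
}

// Verifies that a specific rule fired at a specific severity.
function assertHasSeverity(
  results: { rule: string; severity: string }[],
  severity: string,
  ruleSubstring: string,
  testName: string,
): void {
  check(testName, results.some((r) => r.severity === severity && r.rule.includes(ruleSubstring)));
}

// Example usage with a stubbed result set:
assertHasSeverity(
  [{ rule: "rubric.weights-sum", severity: "FAIL" }],
  "FAIL",
  "weights",
  "rubric weight sum fires",
);
console.log(`✅ Passed: ${passed} / ❌ Failed: ${failed}`);
// In the real runner: process.exitCode = failed > 0 ? 1 : 0;
```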
QUALITY STANDARDS
- All source files must have JSDoc/TSDoc comments on exports
- Every validator is fully implemented — no placeholders or TODOs
- Handle edge cases: empty fields, missing fields, malformed JSON, all with helpful error messages
- All validation rules are driven from `proctor-rules.json` so the engine is reusable
- The tool works with `node bin/proctor-check.js` after cloning (only requires Node.js 18+ and global ts-node)
EXECUTION ORDER
1. Create the project directory and `package.json` (no dependencies needed), `tsconfig.json`
2. Create `src/types.ts`
3. Build `src/rules/proctor-rules.json` with all rules as structured data
4. Implement `src/utils/latex.ts` and `src/utils/text-analysis.ts`
5. Implement each validator in order: rubric → prompt → solution → hints → rationale → final-answer → metadata → references → model-failures → pass-at-k → answer-format
6. Build reporters: terminal (ANSI colors) and json
7. Build `src/index.ts` orchestrator
8. Build `bin/proctor-check.js` CLI entry point
9. Create `templates/rubric-template.json` with approved verb list
10. Create test fixtures: `test/fixtures/good-task.json` and `test/fixtures/bad-task.json`
11. Write `test/validators/test-all.js` with 60+ assertions
12. Run tests, fix any failures until all pass
13. Run `node bin/proctor-check.js validate test/fixtures/bad-task.json` end-to-end to verify NO-GO
14. Run `node bin/proctor-check.js validate test/fixtures/good-task.json` end-to-end to verify GO or REVIEW
Start now. Build everything.