Proctor Check Task Validator Prompt
Source: proctor-check-task-validator-prompt.md (ingested 2026-03-28)
Claude Code Prompt: Build Project Proctor Task Validator
You are operating in agentic mode. Build a complete, functional CLI tool called proctor-check — a submission validator for STEM task authoring. The tool validates structured task submissions against a strict ruleset before they go to human review.
This is NOT a study tool or training system. This is a real-time validation engine that catches rejection-causing errors.
CONTEXT
The user authors PhD-level STEM problems on a platform with a 16-step submission workflow. Tasks get rejected for specific, predictable rule violations. This tool must catch those violations before submission. The full ruleset is below.
ARCHITECTURE
- Language: Node.js (TypeScript)
- CLI: Pure Node.js argument parsing (no external CLI framework — zero npm dependencies beyond `ts-node` for direct execution)
- Structure: Monorepo-ready so it can later be published as an npm package
- Config: All validation rules defined in a single `proctor-rules.json` so the engine is decoupled from this specific project's rules
proctor-check/
├── package.json
├── tsconfig.json
├── bin/
│ └── proctor-check.js # CLI entry point (JS, requires ts-node)
├── src/
│ ├── index.ts # Main orchestrator
│ ├── types.ts # All TypeScript interfaces
│ ├── validators/
│ │ ├── prompt.ts # Step 1 validation
│ │ ├── final-answer.ts # Step 5 validation
│ │ ├── pass-at-k.ts # Step 6 validation
│ │ ├── model-failures.ts # Steps 4, 7 validation
│ │ ├── failure-rationale.ts # Step 8 validation
│ │ ├── answer-format.ts # Step 9 validation
│ │ ├── references.ts # Step 10 validation
│ │ ├── hints.ts # Step 11 validation
│ │ ├── metadata.ts # Step 12 validation
│ │ ├── solution.ts # Step 13 validation
│ │ └── rubric.ts # Step 14 validation (MOST IMPORTANT)
│ ├── rules/
│ │ └── proctor-rules.json # All rules as structured data
│ ├── reporters/
│ │ ├── terminal.ts # Pretty CLI output with ANSI colors
│ │ └── json.ts # Machine-readable output
│ └── utils/
│ ├── latex.ts # LaTeX parsing helpers
│ └── text-analysis.ts # Verb detection, atomicity checks, etc.
├── templates/
│ ├── task-template.json # Empty task structure for `proctor-check init`
│ └── rubric-template.json # Rubric scaffold with approved verb list
└── test/
├── fixtures/ # Sample good/bad submissions
│ ├── good-task.json
│ └── bad-task.json
└── validators/
└── test-all.js # Unit tests for every validator
CLI COMMANDS
proctor-check init
Creates a task.json template in the current directory with all required fields and inline comments explaining each field. If task.json already exists, creates task-template.json instead.
proctor-check validate [task.json]
Runs ALL 11 validators against the task file. Outputs a section-by-section report with PASS/WARN/FAIL per rule. Ends with a GO / REVIEW / NO-GO verdict.
proctor-check validate --section <name> [task.json]
Runs only one validator. Valid section names: prompt, answer, answer-format, pass-at-k, failures, rationale, references, hints, metadata, solution, rubric
proctor-check validate --json [task.json]
Same as validate but outputs the full report as JSON.
proctor-check score-rubric [task.json]
Simulates scoring your step-by-step solution against your own rubric using keyword matching heuristics. Flags if the solution wouldn't earn 7/7.
proctor-check lint-latex [file.md]
Checks LaTeX formatting against Proctor rules: balanced $/$$ delimiters, units in \text{}, no bare * multiplication, balanced braces.
TYPES (src/types.ts)
export type Severity = 'FAIL' | 'WARN' | 'PASS' | 'INFO';
export interface ValidationResult {
rule: string;
severity: Severity;
message: string;
details?: string;
}
export interface ValidatorOutput {
section: string;
results: ValidationResult[];
}
export type Verdict = 'GO' | 'REVIEW' | 'NO-GO';
export interface ValidationReport {
timestamp: string;
taskFile: string;
sections: ValidatorOutput[];
summary: { fails: number; warns: number; passes: number; infos: number };
verdict: Verdict;
}
export interface ModelResponse {
correct: boolean | null;
failure_type: string;
}
export interface RubricItem {
criterion: string;
weight: number;
description: string;
grading_guidance: string;
}
export interface Reference {
key: string;
target_source: string;
required_to_solve: boolean | string;
}
export interface TaskMetadata {
subdomain: string;
education_level: string;
difficulty: string;
}
export interface Task {
domain: string;
prompt: string;
final_answer: string;
final_answer_format: string;
model_a: { response_1: ModelResponse; response_2: ModelResponse };
model_b: { response_1: ModelResponse; response_2: ModelResponse };
failure_rationale_a: string;
failure_rationale_b: string;
hints: string[];
metadata: TaskMetadata;
references: Reference[] | string;
solution: string;
rubric: RubricItem[];
}
TASK.JSON SCHEMA
The proctor-check init command should generate this structure (with _comment and _*_options helper fields):
{
"domain": "",
"prompt": "",
"final_answer": "",
"final_answer_format": "",
"model_a": {
"response_1": { "correct": null, "failure_type": "" },
"response_2": { "correct": null, "failure_type": "" }
},
"model_b": {
"response_1": { "correct": null, "failure_type": "" },
"response_2": { "correct": null, "failure_type": "" }
},
"failure_rationale_a": "",
"failure_rationale_b": "",
"hints": ["", "", ""],
"metadata": {
"subdomain": "",
"education_level": "",
"difficulty": ""
},
"references": [],
"solution": "",
"rubric": [
{
"criterion": "",
"weight": 0,
"description": "",
"grading_guidance": ""
}
]
}
VALIDATION RULES — IMPLEMENT ALL OF THESE
Prompt Validator (Step 1)
- FAIL if prompt contains multiple-choice patterns: `A)`, `B)`, `C)`, `D)`, `(A)`, `(B)`, `(C)`, `(D)`, "which of the following", "choose from the following", "select the correct"
- FAIL if prompt is binary ("yes or no", "true or false", "yes/no", "true/false") or ternary ("increase/decrease/no change")
- WARN if prompt does not specify answer format or precision. Look for keywords: "express your answer", "give your answer", "significant figures", "decimal places", "round to", "in units of"
- WARN if prompt uses bare `*` as multiplication (conflicts with Markdown italics). Ignore `*` inside LaTeX `$...$` blocks and `**` (bold).
- WARN if no LaTeX detected (inline `$...$` or display `$$...$$`). All numbers/variables should use `$` delimiters in STEM prompts.
- WARN if units appear inside LaTeX `$...$` but outside `\text{}`. Common units to check: kg, mol, Hz, Pa, atm, eV, nm, cm, mm, km, J, W, V, A, N, K, etc. Allow units inside `\text{}`, `\mathrm{}`, or `\textrm{}`.
- FAIL if prompt requires external resources. Pattern-match: "refer to", "as described in", "according to the paper", "see figure", "as shown in the textbook", "look up", "use the paper", "the paper by", "the article by", "consult the", "reference the"
- INFO if ranking task detected (keywords: rank, order, arrange, sort). Note: allowed if 4+ items.
- WARN if estimated reasoning steps < 3. Estimate by counting action verbs (find, determine, calculate, compute, derive, show, prove, evaluate, estimate, approximate), conditionals, and LaTeX density.
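The first two FAIL checks above could be sketched as follows; the pattern lists are transcribed from the rules, while the function names and exact regexes are illustrative assumptions:

```typescript
// Hypothetical sketch of the multiple-choice and binary/ternary checks.
const MC_PATTERNS: RegExp[] = [
  /\(?[A-D]\)/,                 // matches A), (B), etc.
  /which of the following/i,
  /choose from the following/i,
  /select the correct/i,
];
const BINARY_PATTERNS: RegExp[] = [
  /\byes or no\b/i,
  /\btrue or false\b/i,
  /\byes\/no\b/i,
  /\btrue\/false\b/i,
  /increase\/decrease\/no change/i,
];

function detectMultipleChoice(prompt: string): boolean {
  return MC_PATTERNS.some((p) => p.test(prompt));
}

function detectBinary(prompt: string): boolean {
  return BINARY_PATTERNS.some((p) => p.test(prompt));
}
```

The `\(?[A-D]\)` heuristic deliberately over-matches (any `A)`-style token triggers it); a FAIL here sends the author back to rewrite, so false positives are cheap.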
Final Answer Validator (Step 5)
- FAIL if answer contains explanatory text: "because", "therefore", "since", "this is due to", "as a result", "which means", "note that", "recall that"
- FAIL if answer contains units (use regex for common units) UNLESS the prompt explicitly requires units in the answer (check for "include.*unit", "with.*unit", "answer.*in [unit]")
- FAIL if answer starts with labels: "answer:", "final:", "result:", "solution:", "the answer is"
- WARN if answer format doesn't match declared format type. Check: Integer = `/^-?\d+$/`, Decimal = `/^-?\d+\.?\d*$/`, Fraction = must contain `/`
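The format-match check maps declared format names to predicates; the regexes are the ones given above, while the lowercase key names and the null-for-unknown convention are assumptions of this sketch:

```typescript
// Illustrative Step 5 format checks; regexes follow the rules above.
const FORMAT_CHECKS: Record<string, (ans: string) => boolean> = {
  integer: (a) => /^-?\d+$/.test(a.trim()),
  decimal: (a) => /^-?\d+\.?\d*$/.test(a.trim()),
  fraction: (a) => a.includes("/"),
};

function matchesDeclaredFormat(answer: string, format: string): boolean | null {
  const check = FORMAT_CHECKS[format.toLowerCase()];
  return check ? check(answer) : null; // null = unknown format, validator stays silent
}
```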
Pass@K Validator (Step 6)
- FAIL if all 4 model responses are marked `correct: false` (need at least 1 correct)
- FAIL if Model A has 0 failures (both `correct: true`)
- FAIL if Model B has 0 failures (both `correct: true`)
- FAIL if total failure count is not 2, 3, or 4
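All four Step 6 rules reduce to simple counting over the four responses; a minimal sketch (function and type names are illustrative, not prescribed):

```typescript
interface ModelResponse { correct: boolean | null; failure_type: string; }
interface ModelPair { response_1: ModelResponse; response_2: ModelResponse; }

// Sketch of the Step 6 counting logic; returns one message per violated rule.
function checkPassAtK(a: ModelPair, b: ModelPair): string[] {
  const all = [a.response_1, a.response_2, b.response_1, b.response_2];
  const failures = all.filter((r) => r.correct === false).length;
  const issues: string[] = [];
  if (failures === 4) issues.push("all 4 responses failed (need at least 1 correct)");
  if (a.response_1.correct && a.response_2.correct) issues.push("Model A has 0 failures");
  if (b.response_1.correct && b.response_2.correct) issues.push("Model B has 0 failures");
  if (failures < 2 || failures > 4) issues.push(`total failure count ${failures} not in 2-4`);
  return issues;
}
```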
Model Failure Validator (Steps 4, 7)
- FAIL if fewer than 1 failure per model
- WARN if `failure_type` matches invalid stump patterns: "rounding error", "arithmetic mistake", "formatting issue", "computational slip", "calculation error", "typo", "off by one". These don't count as valid stumps.
- WARN if a response is marked `correct: false` but `failure_type` is empty
- PASS with the failure type text for valid stumps
Failure Rationale Validator (Step 8)
Run separately for Model A and Model B rationales:
- FAIL if rationale is under 100 characters
- WARN if rationale doesn't reference a specific step/location. Look for patterns: "step \d", "in step", "at step", "response \d", "line \d", "equation \d", "where it", "when it", "the model's .* step"
- WARN if rationale uses vague language: "the model was confused", "it made an error", "incorrect reasoning", "wrong approach", "bad logic", "messed up", "got it wrong"
- WARN if rationale doesn't distinguish reasoning vs calculation error. Check if text mentions either reasoning-type words (reasoning, logic, conceptual, principle, theorem, assumption, deduction, inference) or calculation-type words (calculat, arithmetic, comput, numer, round, decimal).
References Validator (Step 10)
- FAIL if references is a non-empty string that isn't valid JSON. Required JSON keys per reference: `key`, `target_source`, `required_to_solve`
- WARN if references is empty/[] but prompt contains physical constant patterns: "6.674" (G), "6.022" (Avogadro), "1.38" (Boltzmann), "8.314" (gas constant), "3.0.*10^8" (c), "9.8" (g), "1.602" (e), "8.854" (ε₀), "6.626" (h)
Hints Validator (Step 11)
- FAIL if not exactly 3 hints
- FAIL if any hint references another hint: "hint 1", "hint 2", "hint 3", "previous hint", "first hint", "second hint", "third hint", "above hint", "prior hint", "as mentioned in hint", "building on the previous hint", "from hint"
- WARN if any hint is under 20 characters
- WARN if any hint contains the exact final answer value (string match, skip if answer is very short)
- WARN if any hint doesn't end with proper punctuation (., ?, !, ))
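The hint cross-reference FAIL is a substring scan over a fixed phrase list (the list below is copied from the rule above; the function name and return shape are illustrative):

```typescript
// Phrases that indicate one hint depends on another (from the Step 11 rules).
const HINT_XREF = [
  "hint 1", "hint 2", "hint 3", "previous hint", "first hint", "second hint",
  "third hint", "above hint", "prior hint", "as mentioned in hint",
  "building on the previous hint", "from hint",
];

// Returns the indices of hints that mention another hint.
function hintCrossReferences(hints: string[]): number[] {
  return hints
    .map((h, i) => (HINT_XREF.some((p) => h.toLowerCase().includes(p)) ? i : -1))
    .filter((i) => i >= 0);
}
```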
Metadata Validator (Step 12)
- FAIL if subdomain is empty
- FAIL if education level not in: `["High-school olympiad", "Undergraduate", "Graduate"]`
- FAIL if difficulty not in: `["Easy", "Medium", "Hard"]`
- FAIL if domain not in: `["Chemistry", "Biology", "Engineering", "Physics", "Astronomy"]`
- WARN if education level is "High-school olympiad" (target should be Graduate/PhD)
Solution Validator (Step 13)
- FAIL if solution doesn't contain numbered steps. Check for `step\s+#?\d+` pattern OR start-of-line numbering `/(?:^|\n)\s*\d+[.)]\s/`. Must match at least one.
- FAIL if solution's last ~500 characters don't contain a final answer label. Look for: "final answer", "therefore", "thus.*answer", "the answer is", "result:", "we get", "we obtain", "solution:"
- WARN if solution contains handwaving: "it is obvious", "it's obvious", "clearly", "trivially", "it follows trivially", "obviously"
- WARN if no LaTeX detected in solution
- WARN if intermediate calculations use fewer than 5 decimal places. Scan for decimal numbers and check digit count after the dot.
- WARN if log/exponential expressions are present AND intermediate values use fewer than 7 decimal places
- FAIL if the final answer value from Step 5 doesn't appear near the end of the solution. For numeric answers, try approximate matching (within 0.1%).
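The 0.1% approximate-match rule could be sketched like this; the 500-character tail window comes from the rules above, while the number-extraction regex and zero-handling are assumptions of this sketch:

```typescript
// Sketch: does the Step 5 answer appear (exactly or within 0.1%) near the end
// of the solution text?
function answerAppearsNearEnd(finalAnswer: string, solution: string): boolean {
  const tail = solution.slice(-500);
  if (tail.includes(finalAnswer.trim())) return true; // exact string match
  const target = parseFloat(finalAnswer);
  if (Number.isNaN(target)) return false; // non-numeric answer, exact match only
  // Pull every number from the tail and compare within 0.1% relative error.
  const nums = tail.match(/-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/g) ?? [];
  return nums.some((n) => {
    const v = parseFloat(n);
    return target === 0 ? v === 0 : Math.abs(v - target) / Math.abs(target) <= 0.001;
  });
}
```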
Rubric Validator (Step 14) — THIS IS THE MOST IMPORTANT VALIDATOR
- FAIL if fewer than 2 or more than 7 criteria
- FAIL if weights don't sum to exactly 7
- FAIL if any individual weight is outside 1–4 range
- FAIL if any weight is not a positive integer
- FAIL if last criterion doesn't reference the final answer (check combined text of criterion + description + grading_guidance for keywords: "final answer", "result", "concludes", "arrives at", "obtains", "correct answer", "final value", "final result")
- FAIL if any criterion does NOT start with an approved action verb. The complete approved verb list: Derives, States, Extracts, Identifies, Explains, Calculates, Cites, Compares, Applies, Determines, Evaluates, Recognizes, Establishes, Solves, Proves, Demonstrates, Formulates, Integrates, Analyzes, Constructs, Verifies, Justifies, Converts, Simplifies, Approximates, Classifies, Distinguishes, Predicts, Quantifies, Validates, Relates, Maps, Defines, Enumerates, Selects, Combines, Decomposes, Transforms, Substitutes, Normalizes, Differentiates, Interpolates, Extrapolates, Balances, Optimizes, Constrains, Bounds, Parameterizes, Linearizes, Estimates, Arrives, Computes, Shows, Deduces, Infers, Obtains
- WARN if any criterion contains " and " or " & " joining two distinct checks (atomicity violation). Heuristic: check if both sides of "and" contain approved action verbs.
- WARN if any criterion references another criterion by number: "criterion \d", "criteria \d", "item \d", "as in \d", "see \d", "above criterion", "previous criterion"
- WARN if grading guidance is empty for any criterion
- WARN if grading guidance doesn't include tolerance/precision language for criteria that look numerical (contain digits or words like "numer", "calcul", "value", "result")
- WARN if grading guidance uses subjective terms without definitions: "reasonable", "appropriate", "sufficient", "adequate", "good", "proper", "suitable", "acceptable". Allow if term is followed by a definition (parenthetical, "meaning", "i.e.", "defined as").
- INFO: note that `proctor-check score-rubric` can simulate scoring
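The structural FAIL rules above (count, weight sum, weight range, verb-led criteria) might look like this; the verb list is abbreviated here for brevity (the full list is in the rules above) and all names are illustrative:

```typescript
// Abbreviated verb list; the real rules file carries the complete set.
const APPROVED_VERBS = ["Derives", "States", "Calculates", "Applies", "Arrives", "Obtains"];

// Sketch of the core rubric structure checks; one message per violation.
function rubricStructureIssues(items: { criterion: string; weight: number }[]): string[] {
  const issues: string[] = [];
  if (items.length < 2 || items.length > 7) issues.push("criteria count outside 2-7");
  const sum = items.reduce((s, i) => s + i.weight, 0);
  if (sum !== 7) issues.push(`weights sum to ${sum}, not 7`);
  for (const i of items) {
    if (!Number.isInteger(i.weight) || i.weight < 1 || i.weight > 4)
      issues.push(`weight ${i.weight} is not an integer in 1-4`);
    const verb = i.criterion.trim().split(/\s+/)[0];
    if (!APPROVED_VERBS.includes(verb))
      issues.push(`criterion does not start with an approved verb: "${verb}"`);
  }
  return issues;
}
```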
UTILITY MODULES
src/utils/latex.ts
- `containsLatex(text)` — returns true if inline `$...$` or display `$$...$$` found
- `findUnitsOutsideText(text)` — extracts all LaTeX blocks, finds units not inside `\text{}`, returns array of issue strings
- `findBareAsterisks(text)` — finds `*` positions outside LaTeX and `**`, returns positions array
- `lintLatex(text)` — checks balanced `$`/`$$` delimiters and balanced braces, returns issue strings
- `countDecimalPlaces(numStr)` — counts digits after decimal point
- `findLowPrecisionNumbers(text, minDecimals)` — scans for decimal numbers with fewer than minDecimals places
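A minimal sketch of `lintLatex` under these requirements; the escaped-character stripping is an assumption (real LaTeX tokenization is subtler), and the issue strings are illustrative:

```typescript
// Heuristic LaTeX lint: delimiter parity and brace balance.
function lintLatex(text: string): string[] {
  const issues: string[] = [];
  // Strip escaped characters so \$ and \{ don't skew the counts.
  const t = text.replace(/\\[\\${}]/g, "");
  const display = (t.match(/\$\$/g) ?? []).length;
  if (display % 2 !== 0) issues.push("unbalanced $$ display delimiters");
  const single = (t.match(/\$/g) ?? []).length - display * 2;
  if (single % 2 !== 0) issues.push("unbalanced $ inline delimiters");
  let depth = 0;
  for (const ch of t) {
    if (ch === "{") depth++;
    else if (ch === "}") depth--;
    if (depth < 0) { issues.push("closing brace without opener"); break; }
  }
  if (depth > 0) issues.push("unclosed brace");
  return issues;
}
```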
src/utils/text-analysis.ts
- `startsWithApprovedVerb(text)` — returns matched verb or null
- `matchesPatterns(text, patterns)` — tests each regex pattern case-insensitively, returns matched patterns
- `containsKeywords(text, keywords)` — case-insensitive keyword search, returns found keywords
- `estimateReasoningSteps(prompt)` — heuristic step counter using action words, conditionals, LaTeX density
- `isCompleteSentence(text)` — checks for ending punctuation
- `hasAtomicityIssue(text)` — detects " and " joining two verb phrases
- `findSubjectiveTerms(text, terms)` — finds subjective terms not followed by definitions
- `hasCrossReference(text, patterns)` — detects cross-references
- `containsLogExp(text)` — detects log/ln/exp expressions
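Two of these helpers sketched together, since `hasAtomicityIssue` naturally reuses `startsWithApprovedVerb`; the abbreviated verb list and the split-on-connectives heuristic are assumptions of this sketch:

```typescript
// Abbreviated verb list; the full set lives in proctor-rules.json.
const VERBS = ["Derives", "States", "Calculates", "Applies", "Shows", "Obtains"];

// Returns the matched approved verb, or null if the text starts with anything else.
function startsWithApprovedVerb(text: string): string | null {
  const first = text.trim().split(/\s+/)[0];
  return VERBS.includes(first) ? first : null;
}

// Atomicity heuristic: flag " and " / " & " only when at least two of the
// joined fragments each lead with an approved verb (two distinct checks).
function hasAtomicityIssue(text: string): boolean {
  const parts = text.split(/\s+and\s+|\s+&\s+/);
  if (parts.length < 2) return false;
  return parts.filter((p) => startsWithApprovedVerb(p) !== null).length >= 2;
}
```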
MAIN ORCHESTRATOR (src/index.ts)
- `loadTask(filePath)` — reads and parses task.json, throws on missing file or invalid JSON
- `validateAll(task, taskFile)` — runs all 11 validators in order, returns ValidationReport
- `validateSection(task, section)` — runs single validator by name, throws on unknown section
- `buildReport(sections, taskFile)` — aggregates results, computes summary counts and verdict
- `computeVerdict(fails, warns)` — NO-GO if fails > 0, REVIEW if warns >= 3, else GO
- `scoreRubric(task)` — keyword-matching heuristic that estimates how each rubric criterion maps to the solution
- `generateTemplate()` — returns the task.json template object
- `SECTION_NAMES` — exported array of all valid section names
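The verdict rule is fully specified above, so `computeVerdict` is a direct transcription:

```typescript
type Verdict = "GO" | "REVIEW" | "NO-GO";

// NO-GO if any FAILs, REVIEW at 3+ WARNs, otherwise GO.
function computeVerdict(fails: number, warns: number): Verdict {
  if (fails > 0) return "NO-GO";
  if (warns >= 3) return "REVIEW";
  return "GO";
}
```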
REPORTER MODULES
Terminal Reporter (src/reporters/terminal.ts)
- Uses raw ANSI escape codes (no chalk dependency)
- Severity icons: 🔴 FAIL, 🟡 WARN, 🟢 PASS, 🔵 INFO
- Groups results within each section: FAILs first, then WARNs, then INFOs, then PASSes
- Shows details on the line below (dimmed)
- Ends with boxed verdict banner and summary counts
JSON Reporter (src/reporters/json.ts)
- Outputs the full ValidationReport as formatted JSON
OUTPUT FORMAT
Terminal output for every validate run ends with:
═══════════════════════════════════════════════════
VERDICT: [GO | REVIEW | NO-GO]
═══════════════════════════════════════════════════
FAILs: X | WARNs: Y | PASSes: Z | INFOs: W
NO-GO if any FAILs exist. REVIEW if 3+ WARNs. GO if all pass with < 3 warnings. Exit code: 0 for GO/REVIEW, 1 for NO-GO.
CLI ENTRY POINT (bin/proctor-check.js)
- Starts with `#!/usr/bin/env node`
- Registers ts-node at the top for direct .ts execution: `require('ts-node').register({ transpileOnly: true, compilerOptions: { module: 'commonjs' } })`
- Parses args manually: positional args for command + file, `--section`, `--json`, `--help`, `--version`
- Commands: init, validate, score-rubric, lint-latex
- Handles all errors with colored error messages and non-zero exit codes
TEST FIXTURES
good-task.json
A realistic Physics task (rolling cylinder + inelastic collision) with:
- Proper LaTeX in prompt with `\text{}` units
- Valid model failures (1 per model, conceptual errors)
- Detailed failure rationales (600+ chars each, referencing specific steps)
- Exactly 3 standalone hints
- Numbered step-by-step solution with LaTeX
- 5-criterion rubric summing to 7, all verb-led, last criterion = final answer
- Valid metadata (Physics, Graduate, Medium)
bad-task.json
A deliberately broken submission that should trigger 25+ FAILs:
- Invalid domain ("Art History")
- Multiple-choice patterns AND binary question AND external resource reference AND bare `*` in prompt
- Answer with labels, units, and explanatory text
- All 4 model responses marked correct (no failures)
- Rationales under 100 chars, using vague language
- 4 hints instead of 3, with cross-references and short entries
- Empty subdomain, invalid education level and difficulty
- Non-JSON references string
- Solution with no numbered steps and handwaving
- Rubric: non-verb-led criteria, weights sum to 8, weight > 4, missing guidance, subjective terms, criterion cross-references
UNIT TESTS (test/validators/test-all.js)
Write a self-contained test runner (no jest/mocha — just assertions with a counter) that:
- Loads both fixtures
- Runs each validator against both fixtures
- Uses `assertHasSeverity(results, severity, ruleSubstring, testName)` to verify specific rules fire
- Tests utility functions: `containsLatex`, `findBareAsterisks`, `lintLatex`, `startsWithApprovedVerb`, `estimateReasoningSteps`, `isCompleteSentence`, `hasAtomicityIssue`
- Tests verdict logic: `computeVerdict(0,0)` = GO, `computeVerdict(0,2)` = GO, `computeVerdict(0,3)` = REVIEW, `computeVerdict(1,0)` = NO-GO
- Tests full integration: bad task gets NO-GO with many fails, good task has fewer fails
- Prints summary: `✅ Passed: N / ❌ Failed: M`
- Exits with code 1 if any test fails
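A dependency-free harness of this shape might look like the following; `check` and the stubbed result set are illustrative, and only `assertHasSeverity`'s signature comes from the spec above:

```typescript
// Minimal counter-based test harness (no jest/mocha).
let passed = 0;
let failed = 0;

function check(name: string, cond: boolean): void {
  if (cond) { passed++; }
  else { failed++; console.error(`❌ ${name}`); }
}

// Verifies that a specific rule fired at a specific severity.
function assertHasSeverity(
  results: { rule: string; severity: string }[],
  severity: string,
  ruleSubstring: string,
  testName: string,
): void {
  check(testName, results.some((r) => r.severity === severity && r.rule.includes(ruleSubstring)));
}

// Example usage with a stubbed result set:
assertHasSeverity(
  [{ rule: "rubric.weights-sum", severity: "FAIL" }],
  "FAIL",
  "weights",
  "rubric weight sum fires",
);
console.log(`✅ Passed: ${passed} / ❌ Failed: ${failed}`);
// In the real runner: process.exitCode = failed > 0 ? 1 : 0;
```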
QUALITY STANDARDS
- All source files must have JSDoc/TSDoc comments on exports
- Every validator is fully implemented — no placeholders or TODOs
- Handle edge cases: empty fields, missing fields, malformed JSON, all with helpful error messages
- All validation rules are driven from `proctor-rules.json` so the engine is reusable
- The tool works with `node bin/proctor-check.js` after cloning (only requires Node.js 18+ and global ts-node)
EXECUTION ORDER
1. Create the project directory and `package.json` (no dependencies needed), `tsconfig.json`
2. Create `src/types.ts`
3. Build `src/rules/proctor-rules.json` with all rules as structured data
4. Implement `src/utils/latex.ts` and `src/utils/text-analysis.ts`
5. Implement each validator in order: rubric → prompt → solution → hints → rationale → final-answer → metadata → references → model-failures → pass-at-k → answer-format
6. Build reporters: terminal (ANSI colors) and json
7. Build `src/index.ts` orchestrator
8. Build `bin/proctor-check.js` CLI entry point
9. Create `templates/rubric-template.json` with approved verb list
10. Create test fixtures: `test/fixtures/good-task.json` and `test/fixtures/bad-task.json`
11. Write `test/validators/test-all.js` with 60+ assertions
12. Run tests, fix any failures until all pass
13. Run `node bin/proctor-check.js validate test/fixtures/bad-task.json` end-to-end to verify NO-GO
14. Run `node bin/proctor-check.js validate test/fixtures/good-task.json` end-to-end to verify GO or REVIEW
Start now. Build everything.