Morphism Benchmark Methodology
title: Benchmark Methodology
type: reference
authority: non-normative
audience: [contributors, researchers]
last-verified: 2026-03-17
# Benchmark Methodology
NON-NORMATIVE.
## Comparative Benchmarks (`benchmarks/comparative.py`)
What is measured: wall-clock time and the number of invariants checked when validating N governance policies.
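A minimal sketch of what "measuring" means here: time one validation approach over a batch of policies and tally the invariants it checks. The `run_benchmark` harness and the shape of the `validate` callable are assumptions for illustration, not the actual `benchmarks/comparative.py` API.

```python
# Hypothetical measurement loop: the validate callable and its return value
# (invariant count per policy) are illustrative assumptions.
import time

def run_benchmark(validate, policies):
    """Time one validation approach and count the invariants it checks."""
    start = time.perf_counter()
    invariants_checked = 0
    for policy in policies:
        invariants_checked += validate(policy)  # returns invariants checked
    elapsed = time.perf_counter() - start
    return {"wall_clock_s": elapsed, "invariants": invariants_checked}
```

Both approaches would be run through the same harness so that the time and invariant numbers are directly comparable.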
### Approaches
| Approach | What it does |
|----------|--------------|
| Manual pipeline | Sequential script-based validation: a syntax check, compliance check, and reference check per policy, using simple dict validation. |
| Categorical | Functor verification (composition and identity laws), morphism mapping, Čech cohomology drift detection, and proof strategies. |
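The manual pipeline's per-policy checks could be sketched as below, assuming policies are plain dicts; the check names mirror the table, and the specific fields (`id`, `body`, `complies_with`, `references`) are illustrative assumptions, not the project's schema.

```python
# Hypothetical manual pipeline: three sequential checks over simple dicts.
def check_syntax(policy: dict) -> bool:
    # Illustrative rule: a policy must at least carry an id and a body.
    return isinstance(policy, dict) and "id" in policy and "body" in policy

def check_compliance(policy: dict) -> bool:
    # Illustrative rule: every policy declares what it complies with.
    return bool(policy.get("complies_with"))

def check_references(policy: dict, known_ids: set) -> bool:
    # Every referenced policy id must exist in the batch.
    return all(ref in known_ids for ref in policy.get("references", []))

def validate_manual(policies: list) -> int:
    """Run the three checks per policy; return the number of checks passed."""
    known = {p["id"] for p in policies if "id" in p}
    passed = 0
    for p in policies:
        passed += sum((check_syntax(p),
                       check_compliance(p),
                       check_references(p, known)))
    return passed
```

Each check inspects one policy in isolation (plus a flat id set), which is exactly why this style cannot see the compositional properties the categorical approach verifies.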
### What the results show
The categorical approach checks roughly 4-9× more invariants per run, depending on policy count, because functor laws, sheaf cohomology, and proof strategies verify compositional properties that sequential scripts miss entirely.
The manual pipeline is faster in wall-clock time for simple validation because it does less work per policy. The categorical approach trades time for coverage depth.
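A sketch of where the extra invariants come from: verifying a functor means checking F(g ∘ f) = F(g) ∘ F(f) for every composable pair of morphisms and F(id_X) = id_{F(X)} for every object, so the invariant count grows with the structure of the category rather than with the policy count alone. The function below is illustrative, assuming a caller-supplied category representation, and is not the project's API.

```python
# Hypothetical functor-law checker: counts one invariant per identity law
# and one per composable pair. Morphisms are assumed to expose .source/.target.
from itertools import product

def check_functor_laws(objects, morphisms, F_obj, F_mor, compose, identity):
    """Verify functor laws; return the number of invariants checked."""
    checked = 0
    # Identity law: one invariant per object.
    for X in objects:
        assert F_mor(identity(X)) == identity(F_obj(X))
        checked += 1
    # Composition law: one invariant per composable pair (f then g).
    for f, g in product(morphisms, repeat=2):
        if f.target == g.source:
            assert F_mor(compose(g, f)) == compose(F_mor(g), F_mor(f))
            checked += 1
    return checked
```

A sequential script that validates each policy in isolation performs none of these pairwise checks, which is the source of the invariant-count gap.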
### What was removed (and why)
| Removed | Reason |
|---------|--------|
| time.sleep() calls in manual pipeline | Artificial delays inflated speedup ratios dishonestly |
| coverage_percent field (80% manual, 100% categorical) | Not empirically measured; was hardcoded |
| memory_estimate_kb field | Not measured with tracemalloc; was fabricated |
| coverage_improvement and memory_reduction summary stats | Derived from fabricated inputs |
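If memory were to be reported honestly rather than hardcoded, the standard library's `tracemalloc` could measure peak allocation directly. This is a sketch of that approach, not something the benchmark suite currently does; the `workload` parameter is a stand-in.

```python
# Sketch: measure peak Python allocation for a workload with tracemalloc,
# instead of fabricating a memory_estimate_kb field.
import tracemalloc

def measure_peak_kb(workload, *args):
    """Run workload(*args) and return its peak traced allocation in KiB."""
    tracemalloc.start()
    try:
        workload(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 1024
```

Note that `tracemalloc` only traces Python-level allocations, so any future memory comparison should state that scope explicitly.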
### Caveats
- At small N, categorical overhead (functor setup, sheaf construction) dominates
- The manual pipeline does equivalent logical work (same policies, same check types) but at a lower level of mathematical abstraction
- Speedup ratios will vary by hardware; the invariant ratio is the more stable metric
- These benchmarks run on synthetic policies, not production governance artifacts
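The last two caveats suggest reporting two separate summary numbers rather than a single "speedup". A sketch, assuming each run reports the wall-clock and invariant fields used above (field names are illustrative):

```python
# Hypothetical summary: keep the hardware-dependent and hardware-stable
# ratios separate instead of collapsing them into one headline number.
def summarize(manual: dict, categorical: dict) -> dict:
    return {
        # Varies by hardware: how much slower the categorical run was.
        "slowdown": categorical["wall_clock_s"] / manual["wall_clock_s"],
        # More stable across hardware: how many more invariants it checked.
        "invariant_ratio": categorical["invariants"] / manual["invariants"],
    }
```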