A ledger that checks itself
Every load-bearing claim in the program, in one place: what it asserts, where it comes from, and a link to the interactive that lets you test it. The computed claims are re-derived from the models in this repository right now, in your browser. The same checks run on every commit, so if a model changes or a cited figure drifts, this page goes red and the build fails.
Classification as Infrastructure
The entity taxonomy is load-bearing: collapse SAGEN's six entity types into one and the engine can no longer represent a topic change at all.
SAGEN
The released engine's four-turn goal evolution is 3 / 4 / 5 / 6 — not the 3 / 5 / 6 / 7 printed in the paper's Table 5. The runnable artifact is authoritative.
Checkpoint after two turns, restore into a fresh engine, run two more: the injection is byte-for-byte identical to running all four continuously.
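The round-trip property can be illustrated with a toy. The sketch below is a hypothetical stand-in, not SAGEN's API: `ToyEngine`, `step`, `checkpoint`, `restore`, and `injection` are names invented here, and the state is just a list of turns. It shows why the comparison can be byte-for-byte: the checkpoint uses a canonical serialization, so equal state always produces equal bytes.

```python
import json


class ToyEngine:
    """Hypothetical stand-in for a stateful engine; not SAGEN's actual API."""

    def __init__(self) -> None:
        self.turns: list[str] = []

    def step(self, utterance: str) -> None:
        self.turns.append(utterance)

    def checkpoint(self) -> bytes:
        # Canonical serialization: sorted keys and fixed separators, so equal
        # state always yields equal bytes.
        payload = {"turns": self.turns}
        text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return text.encode("utf-8")

    @classmethod
    def restore(cls, blob: bytes) -> "ToyEngine":
        engine = cls()
        engine.turns = json.loads(blob.decode("utf-8"))["turns"]
        return engine

    def injection(self) -> bytes:
        # Stand-in for the context the engine would inject; here it is
        # derived directly from the full state.
        return self.checkpoint()


turns = ["plan a trip", "make it Lisbon", "add a budget", "switch to trains"]

# Path 1: all four turns in one engine.
continuous = ToyEngine()
for t in turns:
    continuous.step(t)

# Path 2: two turns, checkpoint, restore into a fresh engine, two more turns.
first_half = ToyEngine()
for t in turns[:2]:
    first_half.step(t)
resumed = ToyEngine.restore(first_half.checkpoint())
for t in turns[2:]:
    resumed.step(t)

print(resumed.injection() == continuous.injection())  # True
```

The check only means something if the checkpoint captures all state that influences the injection; any field left out of the serialized payload would show up as a byte mismatch.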
SAGEN captures 98.5% of the evaluated information dimensions, versus 20.5% for a rolling summary and 10.5% for a raw buffer.
LLM-QP
Sparse scoring (2dK) is cheaper than the dense head (2d|V|) for every valid-token count K below the vocabulary size; the two cross exactly at K = |V|.
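The crossover is plain arithmetic and can be checked in a few lines. The sketch below assumes only the cost model stated in the claim (2dK for sparse scoring, 2d|V| for the dense head); the function names and the sample values of d and |V| are illustrative and not taken from the repository.

```python
def sparse_flops(d: int, k: int) -> int:
    """Cost of scoring K valid tokens against a d-dimensional state: 2dK."""
    return 2 * d * k


def dense_flops(d: int, vocab_size: int) -> int:
    """Cost of the full output head over the whole vocabulary: 2d|V|."""
    return 2 * d * vocab_size


def sparse_is_cheaper(d: int, k: int, vocab_size: int) -> bool:
    return sparse_flops(d, k) < dense_flops(d, vocab_size)


# Illustrative sizes only (assumed, not from the repository).
d, vocab_size = 4096, 32000
for k in (1, 1000, vocab_size - 1, vocab_size):
    ratio = sparse_flops(d, k) / dense_flops(d, vocab_size)
    print(k, sparse_is_cheaper(d, k, vocab_size), round(ratio, 4))
```

The ratio reduces to K / |V| regardless of d, which is why the two costs meet exactly at K = |V| and nowhere else.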
When the decode margin is large, amortized scoring skips the recompute and accrues cost roughly an order of magnitude more slowly than full recomputation.
A LinUCB contextual-bandit router learns to route within a few percent of the hindsight oracle, beating both the static-dense and static-sparse policies.
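For readers unfamiliar with LinUCB, here is a minimal sketch of the disjoint variant on a two-arm routing toy. Everything about the environment is an assumption made for illustration: the context is `[1, K/|V|]`, the dense arm has a flat cost of 1.0, and the sparse arm carries an invented fixed overhead so that neither static policy dominates (the 2dK cost model alone would always favor sparse). It does not reproduce the repository's router or its numbers.

```python
import math
import random


class LinUCBArm:
    """One arm of disjoint LinUCB over 2-d contexts (ridge estimate + bonus)."""

    def __init__(self) -> None:
        self.a = [[1.0, 0.0], [0.0, 1.0]]  # A = I + sum of x x^T
        self.b = [0.0, 0.0]                # b = sum of reward * x

    def ucb(self, x, alpha):
        (p, q), (r, s) = self.a
        det = p * s - q * r
        inv = [[s / det, -q / det], [-r / det, p / det]]  # A^-1 for 2x2
        theta = [inv[i][0] * self.b[0] + inv[i][1] * self.b[1] for i in range(2)]
        inv_x = [inv[i][0] * x[0] + inv[i][1] * x[1] for i in range(2)]
        mean = theta[0] * x[0] + theta[1] * x[1]
        width = math.sqrt(max(0.0, x[0] * inv_x[0] + x[1] * inv_x[1]))
        return mean + alpha * width

    def update(self, x, reward):
        for i in range(2):
            self.b[i] += reward * x[i]
            for j in range(2):
                self.a[i][j] += x[i] * x[j]


def cost(arm: int, k_frac: float) -> float:
    # Hypothetical cost model: arm 0 (dense) is flat; arm 1 (sparse) pays an
    # invented fixed overhead plus a term proportional to K / |V|.
    return 1.0 if arm == 0 else 0.2 + 1.6 * k_frac


def simulate(rounds: int = 2000, alpha: float = 1.0, seed: int = 0) -> dict:
    rng = random.Random(seed)
    arms = [LinUCBArm(), LinUCBArm()]
    totals = {"linucb": 0.0, "oracle": 0.0, "dense": 0.0, "sparse": 0.0}
    for _ in range(rounds):
        k_frac = rng.random()
        x = [1.0, k_frac]
        choice = max(range(2), key=lambda i: arms[i].ucb(x, alpha))
        paid = cost(choice, k_frac)
        arms[choice].update(x, -paid)  # reward is negative cost
        totals["linucb"] += paid
        totals["oracle"] += min(cost(0, k_frac), cost(1, k_frac))
        totals["dense"] += cost(0, k_frac)
        totals["sparse"] += cost(1, k_frac)
    return totals


totals = simulate()
print({name: round(value, 1) for name, value in totals.items()})
```

The oracle here is the per-round cheaper arm chosen with hindsight; the two static totals are what always-dense and always-sparse would have paid on the same contexts.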
The Invisible Architecture
A ~40,000-word interdisciplinary essay built as infrastructure you walk through: six interactive games and thirteen narrated audio sections, not a wall of text.
The DSM hardened into infrastructure at DSM-III (1980), when operationalized criteria displaced the psychodynamic paradigm in response to the reliability crisis.
The Beautiful Unfinished
The planning-execution gap is structural, not a discipline failure: planning is an inside-view, System-1 narrative act, so a plan reliably feels better than its base rate warrants.
A 38,000-word synthesis across ten disciplines, from neuroscience to information theory, with every load-bearing claim carried to a linked bibliography.
The New Sorting Hat
AI detectors flagged 61.3% of TOEFL essays by non-native English writers as AI-generated, versus 3.2% of essays by US-born writers.
A ~4% per-sentence false-positive rate compounds: a 20-sentence paper has better-than-even odds of containing at least one falsely flagged sentence.
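The compounding works out as follows. The sketch assumes each sentence is flagged independently at the same rate, which is a simplification; the function name is illustrative.

```python
def p_at_least_one_flag(per_sentence_fpr: float, n_sentences: int) -> float:
    """Probability that at least one of n human-written sentences is falsely
    flagged, assuming independent flags at a constant per-sentence rate."""
    return 1.0 - (1.0 - per_sentence_fpr) ** n_sentences


p = p_at_least_one_flag(0.04, 20)
print(f"{p:.3f}")  # 0.558

# Smallest paper length at which a false flag becomes more likely than not.
n = 1
while p_at_least_one_flag(0.04, n) <= 0.5:
    n += 1
print(n)  # 17
```

At 4% per sentence, the chance that all 20 sentences pass is 0.96^20, about 0.442, so the chance of at least one false flag is about 55.8%.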
The Sorting Machine
A one-point difference at a diagnostic cutoff (a 71 against a threshold of 70) flips a child from eligible to ineligible, and the classification follows them through the record.
Accountability Tracker
The audit covers 25 U.S. universities across an 11-dimension framework, roughly 1.33 million students in scope, and every dimension score is backed by a quoted policy excerpt with a source link.
The Warehouse
The federal state can be read as a portfolio of 49 unexercised options, most of them illegible to any evaluator that scores only observable output.
The Sorting Machine, Wartime Edition
Ukraine's State Register of Property Rights is about 40% complete and mostly post-2013, so an owner holding a pre-2013 paper deed cannot prove ownership. That is the gate assisted filing cannot open.
Computed claims are recomputed from the shipped models (src/lib/research) and the audit dataset; sourced figures are cross-checked against the essays in CI (src/lib/research/__tests__/claims.test.js). This page is the human-readable face of that guard.
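The shape of such a guard can be sketched briefly. The repository's actual check lives in the JavaScript test file named above; the Python below is only an illustration of the idea, and `Claim`, `check`, and the two sample entries are hypothetical. Each claim pairs the figure printed on the page with a function that re-derives it, and any drift beyond a tolerance is reported.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Claim:
    label: str
    published: float               # the figure printed on the page
    compute: Callable[[], float]   # re-derives the figure from a model
    tolerance: float = 1e-9


def check(claims: list[Claim]) -> list[str]:
    """Return the labels of claims whose recomputed value has drifted."""
    return [
        c.label
        for c in claims
        if abs(c.compute() - c.published) > c.tolerance
    ]


# Hypothetical entries; the sizes in the first are arbitrary.
claims = [
    Claim(
        "sparse/dense cost ratio at K = |V|",
        1.0,
        lambda: (2 * 64 * 500) / (2 * 64 * 500),
    ),
    Claim(
        "P(at least one false flag in 20 sentences at 4%)",
        0.558,
        lambda: 1 - 0.96 ** 20,
        tolerance=5e-4,
    ),
]

failed = check(claims)
print("red" if failed else "green", failed)
```

In a CI job the same list would fail the build when `failed` is non-empty, for example by exiting with a non-zero status.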