Research · The Claims

A ledger that checks itself

Every load-bearing claim in the program, in one place: what it asserts, where it comes from, and a link to the interactive that lets you test it. The computed claims are re-derived from the models in this repository right now, in your browser. The same checks run on every commit, so if a model changes or a cited figure drifts, this page goes red and the build fails.

17 claims · 7 re-verified live · all guarded in CI

Classification as Infrastructure

2 pivots → 0 · computed live

The entity taxonomy is load-bearing: collapse SAGEN's six entity types into one and the engine can no longer represent a topic change at all.

Source: Computed live (SAGEN engine) + Bowker & Star, 1999
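The sketch below shows why collapsing the taxonomy erases topic changes. It assumes, hypothetically, that a pivot is detected whenever two consecutive turns carry different entity types. The function names, the five-turn example, and the type labels are all invented for illustration. They are not SAGEN's engine or its six actual types.

```javascript
// Toy pivot counter (hypothetical, not SAGEN's actual rule): a "pivot" is
// any turn whose entity type differs from the previous turn's type.
function countPivots(turnTypes) {
  let pivots = 0;
  for (let i = 1; i < turnTypes.length; i++) {
    if (turnTypes[i] !== turnTypes[i - 1]) pivots += 1;
  }
  return pivots;
}

// Collapse the taxonomy: every turn gets the same single type label.
function collapseTaxonomy(turnTypes) {
  return turnTypes.map(() => "entity");
}

// Hypothetical type labels for a five-turn conversation with two topic changes.
const turns = ["task", "task", "person", "person", "location"];

console.log(countPivots(turns));                   // 2
console.log(countPivots(collapseTaxonomy(turns))); // 0
```

Any detector that keys on type labels has the same blind spot. Once every label is identical, there is no difference left for it to observe.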

SAGEN

goals 3 / 4 / 5 / 6 · computed live

The released engine's four-turn goal evolution is 3 / 4 / 5 / 6 — not the 3 / 5 / 6 / 7 printed in the paper's Table 5. The runnable artifact is authoritative.

Source: Computed live from the ported reference engine

byte-identical · computed live

Checkpoint after two turns, restore into a fresh engine, run two more: the injection is byte-for-byte identical to running all four continuously.

Source: Computed live (serialize / restore round-trip)
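Here is a minimal sketch of the round-trip check. It assumes an engine whose entire state is one JSON-serializable object. `ToyEngine`, `step`, `serialize`, and `restore` are hypothetical stand-ins, not the SAGEN engine's API. What carries over is the shape of the test: a continuous run and a checkpoint-and-restore run must yield the same bytes.

```javascript
// Hypothetical toy engine: all state lives in one plain object, so a
// checkpoint is just its JSON serialization.
class ToyEngine {
  constructor(state = { turns: [], goalCount: 0 }) {
    this.state = state;
  }

  // Deterministic update: record the message and grow the goal count.
  step(message) {
    this.state.turns.push(message);
    this.state.goalCount += 1;
    return this.injection();
  }

  // The text the engine would inject into the model's context.
  injection() {
    return `goals=${this.state.goalCount}; turns=${this.state.turns.join(" | ")}`;
  }

  serialize() {
    return JSON.stringify(this.state);
  }

  static restore(snapshot) {
    return new ToyEngine(JSON.parse(snapshot));
  }
}

// Compare UTF-8 encodings element by element, mirroring a byte-for-byte check.
function sameBytes(a, b) {
  const x = new TextEncoder().encode(a);
  const y = new TextEncoder().encode(b);
  return x.length === y.length && x.every((byte, i) => byte === y[i]);
}

const messages = ["plan a trip", "add a budget", "switch to Rome", "book hotels"];

// Continuous run: all four turns in one engine.
const continuous = new ToyEngine();
let continuousOut = "";
for (const m of messages) continuousOut = continuous.step(m);

// Split run: two turns, checkpoint, restore into a fresh engine, two more.
const first = new ToyEngine();
messages.slice(0, 2).forEach((m) => first.step(m));
const resumed = ToyEngine.restore(first.serialize());
let splitOut = "";
for (const m of messages.slice(2)) splitOut = resumed.step(m);

console.log(sameBytes(continuousOut, splitOut)); // true
```

The property holds here only because all state passes through `serialize`. A test like this exists to catch hidden state outside that path, such as a cache, a random seed, or a clock.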

98.5% vs 20.5% vs 10.5% · cited

SAGEN captures 98.5% of the evaluated information dimensions, versus 20.5% for a rolling summary and 10.5% for a raw buffer.

Source: SAGEN paper, §5 (proof-of-concept evaluation)

LLM-QP

crossover at K = |V| · computed live

Sparse scoring (2dK) is cheaper than the dense head (2d|V|) for every valid-token count K below the vocabulary size; the two cross exactly at K = |V|.

Source: Computed live + LLM-QP paper §2 (roofline lemma)
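The crossover follows directly from the two cost expressions on this page. The sketch compares them, assuming that d is the hidden width and that both expressions count FLOPs per decode step. The sizes used are illustrative placeholders, not values from the LLM-QP paper.

```javascript
// Cost expressions as stated on this page: the dense head scores all |V|
// tokens, sparse scoring scores only the K valid ones.
const denseFlops = (d, vocabSize) => 2 * d * vocabSize;
const sparseFlops = (d, k) => 2 * d * k;

function cheaperPath(d, k, vocabSize) {
  const sparse = sparseFlops(d, k);
  const dense = denseFlops(d, vocabSize);
  if (sparse < dense) return "sparse";
  if (sparse > dense) return "dense";
  return "tie";
}

// Illustrative sizes only (not taken from the LLM-QP paper).
const d = 4096;
const V = 32000;
console.log(cheaperPath(d, 50, V));    // sparse
console.log(cheaperPath(d, V - 1, V)); // sparse
console.log(cheaperPath(d, V, V));     // tie
```

The valid-token set is a subset of the vocabulary, so K never exceeds |V|. Under this count, the sparse path is therefore never the more expensive one. The sketch counts arithmetic only and omits the memory-traffic side of a roofline analysis.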

~10x cheaper · computed live

When the decode margin is large, amortized scoring skips the recompute and accrues cost roughly an order of magnitude slower than full recomputation.

Source: Computed live + LLM-QP paper §3
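The sketch below shows how skipping recomputation changes the slope of accrued cost. It uses a made-up cost model. Full recomputation pays `fullCost` on every step. The amortized path pays a cheap `checkCost` and falls back to a full recompute only when the decode margin drops below a threshold; here the margin is simply given per step. Every parameter and the margin trace are hypothetical, so the 10x in the output comes from these choices, not from the paper's model.

```javascript
// Hypothetical cost model (not the paper's): full recomputation pays fullCost
// every step; the amortized path pays checkCost every step and adds fullCost
// only when the margin is below the threshold.
function accruedCost(margins, { fullCost, checkCost, threshold }) {
  let full = 0;
  let amortized = 0;
  for (const margin of margins) {
    full += fullCost;
    amortized += checkCost;
    if (margin < threshold) amortized += fullCost; // margin too thin: recompute
  }
  return { full, amortized };
}

// Made-up trace: 20 decode steps, only one with a thin margin.
const margins = Array(20).fill(5.0);
margins[7] = 0.1;

const cost = accruedCost(margins, { fullCost: 20, checkCost: 1, threshold: 1.0 });
console.log(cost.full, cost.amortized); // 400 40
console.log(cost.full / cost.amortized); // 10
```

As the fraction of thin-margin steps grows, the amortized curve approaches the full-recompute curve. The saving depends on the margin staying large.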

near-oracle (≈1.04x) · computed live

A LinUCB contextual-bandit router learns to route within a few percent of the hindsight oracle, beating both the static-dense and static-sparse policies.

Source: Computed live (LinUCB, Appendix A) + Li et al., 2010
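Below is a minimal sketch of disjoint LinUCB as described by Li et al. (2010): one ridge-regression model per arm, scored by estimated reward plus an upper-confidence bonus. The two-arm "dense / sparse" framing, the two-feature context, and the reward rule are hypothetical. This toy is not the router from Appendix A.

```javascript
// Small dense linear-algebra helpers.
function identity(n) {
  return Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => (i === j ? 1 : 0)));
}
const dot = (a, b) => a.reduce((s, ai, i) => s + ai * b[i], 0);
const matVec = (M, v) => M.map((row) => dot(row, v));

// Disjoint LinUCB: each arm keeps A^{-1} (A starts as the identity) and b = sum of r * x.
class LinUCB {
  constructor(numArms, dim, alpha) {
    this.alpha = alpha;
    this.arms = Array.from({ length: numArms }, () => ({
      Ainv: identity(dim),
      b: new Array(dim).fill(0),
    }));
  }

  // Pick the arm with the highest upper confidence bound for context x.
  select(x) {
    let best = 0;
    let bestScore = -Infinity;
    this.arms.forEach((arm, i) => {
      const theta = matVec(arm.Ainv, arm.b); // ridge estimate of the arm's weights
      const Ainvx = matVec(arm.Ainv, x);
      const score = dot(theta, x) + this.alpha * Math.sqrt(dot(x, Ainvx));
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    });
    return best;
  }

  update(armIndex, x, reward) {
    const arm = this.arms[armIndex];
    const Ainvx = matVec(arm.Ainv, x);
    const denom = 1 + dot(x, Ainvx);
    // Sherman-Morrison rank-one update of A^{-1} after A += x x^T.
    for (let i = 0; i < x.length; i++) {
      for (let j = 0; j < x.length; j++) {
        arm.Ainv[i][j] -= (Ainvx[i] * Ainvx[j]) / denom;
      }
    }
    for (let i = 0; i < x.length; i++) arm.b[i] += reward * x[i];
  }
}

// Toy routing problem (made up): arm 0 = "dense", arm 1 = "sparse".
// Feature f = 1 means dense pays off; f = 0 means sparse does.
const router = new LinUCB(2, 2, 0.5);
for (let t = 0; t < 400; t++) {
  const f = t % 2;
  const x = [1, f];
  const arm = router.select(x);
  const reward = arm === 0 ? f : 1 - f;
  router.update(arm, x, reward);
}
console.log(router.select([1, 1])); // 0 (dense)
console.log(router.select([1, 0])); // 1 (sparse)
```

The sketch stores A^{-1} directly and updates it with Sherman-Morrison, which avoids a matrix inversion on every step. The ≈1.04x oracle ratio on this page comes from the project's own simulation, which this toy does not reproduce.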

The Invisible Architecture

6 games · 13 audio · cited

A ~40,000-word interdisciplinary essay built as infrastructure you walk through: six interactive games and thirteen narrated audio sections, not a wall of text.

Source: The Invisible Architecture

DSM-III, 1980 · cited

The DSM hardened into infrastructure at DSM-III (1980), when operationalized criteria displaced the psychodynamic paradigm in response to the reliability crisis.

Source: The Invisible Architecture (history of the manual)

The Beautiful Unfinished

inside vs outside view · cited

The planning-execution gap is structural, not a discipline failure: planning is an inside-view, System-1 narrative act, so a plan reliably feels more convincing than the base rate.

Source: Kahneman & Tversky; The Beautiful Unfinished

38,000 words · 10 fields · cited

A 38,000-word synthesis across ten disciplines, from neuroscience to information theory, with every load-bearing claim carried to a linked bibliography.

Source: The Beautiful Unfinished (bibliography)

The New Sorting Hat

61.3% vs 3.2% · cited

AI detectors flagged 61.3% of TOEFL essays by non-native English writers as AI-generated, versus 3.2% of essays by US-born writers.

Source: Liang et al., 2023 (Patterns)

≈56% chance · computed live

A ~4% per-sentence false-positive rate compounds. If each sentence is flagged independently, a 20-sentence paper has better-than-even odds (about 56%) of containing at least one falsely flagged sentence.

Source: Computed: 1 − (1 − 0.04)²⁰
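The same formula can be turned around to find the shortest document at which a false flag becomes more likely than not. This sketch assumes, as the formula does, that sentences are flagged independently. The function names are illustrative.

```javascript
// Probability that at least one of n sentences is falsely flagged, assuming
// each sentence is flagged independently with probability p.
function atLeastOneFalseFlag(p, n) {
  return 1 - Math.pow(1 - p, n);
}

// Shortest document (in sentences) for which a false flag is more likely than not.
function firstLengthAboveHalf(p) {
  if (p <= 0) return Infinity; // a zero rate never crosses 50%
  let n = 1;
  while (atLeastOneFalseFlag(p, n) <= 0.5) n += 1;
  return n;
}

console.log(atLeastOneFalseFlag(0.04, 20).toFixed(3)); // 0.558
console.log(firstLengthAboveHalf(0.04));                // 17
```

At a 4% rate, the odds pass 50% at 17 sentences, so the 20-sentence paper on this page is already past the line.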

The Sorting Machine

71 vs 70 cutoff · cited

A one-point difference at a diagnostic cutoff (a 71 against a threshold of 70) flips a child from eligible to ineligible, and the classification follows them through the record.

Source: Research calibration scenario (special-education cutoff)

Accountability Tracker

25 × 11, 1.33M students · recomputed in CI

The audit scores 25 U.S. universities on an 11-dimension framework, covering roughly 1.33 million students. Every dimension score is backed by a quoted policy excerpt with a source link.

Source: Computed from the audit dataset (src/data/audits.json)
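Here is a minimal sketch of how the audit figures could be recomputed and guarded. The record shape below is an assumption; the real src/data/audits.json schema may differ. The two sample records, their dimension names, and their URLs are made up for illustration.

```javascript
// Hypothetical record shape; the real src/data/audits.json schema may differ:
// { university, students, dimensions: { [name]: { score, excerpt, sourceUrl } } }
function summarizeAudit(records) {
  const dimensionNames = new Set();
  let students = 0;
  const unsupported = []; // scores with no quoted excerpt or no source link
  for (const record of records) {
    students += record.students;
    for (const [name, dim] of Object.entries(record.dimensions)) {
      dimensionNames.add(name);
      if (!dim.excerpt || !dim.sourceUrl) {
        unsupported.push(`${record.university} / ${name}`);
      }
    }
  }
  return { universities: records.length, dimensions: dimensionNames.size, students, unsupported };
}

// CI-style guard: compare the recomputed summary with the figures on the page.
function auditProblems(summary, expected) {
  const problems = [];
  if (summary.universities !== expected.universities) problems.push("university count");
  if (summary.dimensions !== expected.dimensions) problems.push("dimension count");
  if (summary.unsupported.length > 0) problems.push(`unsupported: ${summary.unsupported.join(", ")}`);
  return problems;
}

// Two made-up records standing in for the real dataset.
const sample = [
  {
    university: "Example University",
    students: 30000,
    dimensions: {
      dim1: { score: 2, excerpt: "quoted policy text", sourceUrl: "https://example.org/policy" },
      dim2: { score: 1, excerpt: "quoted policy text", sourceUrl: "https://example.org/handbook" },
    },
  },
  {
    university: "Sample State",
    students: 45000,
    dimensions: {
      dim1: { score: 3, excerpt: "quoted policy text", sourceUrl: "https://example.net/policy" },
      dim2: { score: 0, excerpt: "", sourceUrl: "https://example.net/handbook" },
    },
  },
];

const summary = summarizeAudit(sample);
const problems = auditProblems(summary, { universities: 2, dimensions: 2 });
console.log(problems.length); // 1: the Sample State dim2 score has no excerpt
```

Against the real dataset, the expected values would be the page's 25 universities and 11 dimensions. Any score without an excerpt or source link would turn the check red.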

The Warehouse

49 held options · cited

The federal state can be read as a portfolio of 49 unexercised options, most of them illegible to any evaluator that scores only observable output.

Source: What the State Keeps (real-options analysis)

The Sorting Machine, Wartime Edition

~40% complete · cited

Ukraine's State Register of Property Rights is about 40% complete, and its records are mostly post-2013. An owner holding only a pre-2013 paper deed therefore cannot prove ownership through it, and that is a gate assisted filing cannot open.

Source: New America (DIGI), 2024

Computed claims are recomputed from the shipped models (src/lib/research) and the audit dataset; sourced figures are cross-checked against the essays in CI (src/lib/research/__tests__/claims.test.js). This page is the human-readable face of that guard.
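Here is a minimal sketch of what one ledger check could look like. It assumes each computed claim is stored as the value printed on the page plus a function that re-derives it. `ledger`, `driftedClaims`, and `assertLedger` are hypothetical names; they are not the contents of claims.test.js.

```javascript
// Hypothetical ledger shape: each computed claim pairs the figure printed on
// the page with a function that re-derives it.
const ledger = [
  {
    id: "false-flag-odds",
    printed: 56, // the page's "≈56% chance", in whole percent
    compute: () => Math.round(100 * (1 - (1 - 0.04) ** 20)),
  },
];

// Return the ids of claims whose recomputed value no longer matches the page.
// JSON comparison lets a claim be a number, a string, or an array such as [3, 4, 5, 6].
function driftedClaims(entries) {
  return entries
    .filter((entry) => JSON.stringify(entry.compute()) !== JSON.stringify(entry.printed))
    .map((entry) => entry.id);
}

// What a CI step could do: throw, and so fail the build, on any drift.
function assertLedger(entries) {
  const drifted = driftedClaims(entries);
  if (drifted.length > 0) {
    throw new Error(`Claims drifted from the page: ${drifted.join(", ")}`);
  }
}

assertLedger(ledger); // passes: 1 - 0.96^20 rounds to 56%

// A deliberately wrong entry shows the failure path.
const drifted = { id: "demo-drift", printed: 60, compute: ledger[0].compute };
console.log(driftedClaims([...ledger, drifted]).length); // 1
```

Keeping the printed value next to its derivation means a change on either side, in a model or in the page copy, shows up as drift.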