A ledger that checks itself

Every load-bearing claim in the program, in one place: what it asserts, where it comes from, and a link to the interactive that lets you test it. The computed claims are re-derived from the models in this repository right now, in your browser. The same checks run on every commit, so if a model changes or a cited figure drifts, this page goes red and the build fails.

17 claims · 0 / 7 re-verified live · all guarded in CI
01

Classification as Infrastructure

2 pivots → 0 · rechecking…

The entity taxonomy is load-bearing: collapse SAGEN's six entity types into one and the engine can no longer represent a topic change at all.

02

SAGEN

goals 3 / 4 / 5 / 6 · rechecking…

The released engine's four-turn goal evolution is 3 / 4 / 5 / 6 — not the 3 / 5 / 6 / 7 printed in the paper's Table 5. The runnable artifact is authoritative.

byte-identical · rechecking…

Checkpoint after two turns, restore into a fresh engine, run two more: the injection is byte-for-byte identical to running all four continuously.
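The checkpoint claim is a property that can be tested on any deterministic engine. Below is a minimal sketch using a hypothetical `ToyEngine`. It is not SAGEN's actual API, which this page does not show, and the `checkpoint`, `restore`, and `injection` names are assumptions made for illustration. State is serialized canonically, restored into a fresh instance, and the final output is compared byte for byte with an uninterrupted run.

```python
import hashlib
import json


class ToyEngine:
    """Hypothetical stand-in for a stateful engine; not SAGEN's real API."""

    def __init__(self):
        self.turns = []

    def step(self, text):
        # Deterministic state update: record each turn in order.
        self.turns.append(text)

    def checkpoint(self):
        # Canonical serialization so that equal state yields equal bytes.
        return json.dumps({"turns": self.turns}, sort_keys=True).encode("utf-8")

    @classmethod
    def restore(cls, blob):
        # Rebuild a fresh engine from checkpoint bytes.
        engine = cls()
        engine.turns = list(json.loads(blob.decode("utf-8"))["turns"])
        return engine

    def injection(self):
        # Toy version of the bytes that would be injected into the next prompt.
        digest = hashlib.sha256("\n".join(self.turns).encode("utf-8")).hexdigest()
        return f"{len(self.turns)} turns:{digest}".encode("utf-8")


def run_continuous(turns):
    engine = ToyEngine()
    for t in turns:
        engine.step(t)
    return engine.injection()


def run_with_checkpoint(turns, split):
    first = ToyEngine()
    for t in turns[:split]:
        first.step(t)
    # Restore into a fresh engine and finish the remaining turns there.
    resumed = ToyEngine.restore(first.checkpoint())
    for t in turns[split:]:
        resumed.step(t)
    return resumed.injection()


TURNS = ["turn one", "turn two", "turn three", "turn four"]
same = run_continuous(TURNS) == run_with_checkpoint(TURNS, split=2)
print(same)
```

Comparing raw bytes rather than parsed structures is the stricter check: it also catches differences in key order or formatting that a structural comparison would forgive.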

98.5% vs 20.5% vs 10.5% · cited

SAGEN captures 98.5% of the evaluated information dimensions, versus 20.5% for a rolling summary and 10.5% for a raw buffer.

03

LLM-QP

crossover at K = |V| · rechecking…

Sparse scoring (2dK) is cheaper than the dense head (2d|V|) for every valid-token count K below the vocabulary size; the two cross exactly at K = |V|.
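The crossover follows directly from the two cost expressions: their ratio is 2dK / 2d|V| = K / |V|, which is below 1 exactly when K < |V|. A small sketch of that arithmetic follows. The values of `D` and `V` are illustrative assumptions, not figures from the project.

```python
def sparse_cost(d, k):
    """Cost of scoring k valid tokens at hidden size d (2dK, per the claim)."""
    return 2 * d * k


def dense_cost(d, vocab_size):
    """Cost of the full dense head over the vocabulary (2d|V|, per the claim)."""
    return 2 * d * vocab_size


def sparse_is_cheaper(d, k, vocab_size):
    # Strictly cheaper only when k is below the vocabulary size.
    return sparse_cost(d, k) < dense_cost(d, vocab_size)


# Illustrative sizes only (assumed, not taken from the project).
D, V = 4096, 32000
for k in (1, 1000, V - 1, V):
    print(k, sparse_cost(D, k), dense_cost(D, V), sparse_is_cheaper(D, k, V))
```

Because d cancels in the ratio, the crossover point does not depend on the hidden size.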

~10x cheaper · rechecking…

When the decode margin is large, amortized scoring skips the recompute and accrues cost roughly an order of magnitude slower than full recomputation.

near-oracle (≈1.04x) · rechecking…

A LinUCB contextual-bandit router learns to route within a few percent of the hindsight oracle, beating both the static-dense and static-sparse policies.
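For readers unfamiliar with LinUCB, here is a minimal sketch of the disjoint-arm variant routing between a dense and a sparse path. The two context types, the reward function, and every class and function name are assumptions made for illustration. The project's real features, rewards, and the ≈1.04x figure come from its own model, not from this toy.

```python
import math


class LinUCBArm:
    def __init__(self, dim):
        # A_inv starts as the identity (ridge regularization with lambda = 1).
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_x(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def ucb(self, x, alpha):
        v = self._a_inv_x(x)
        # theta = A_inv b; A_inv is symmetric, so theta . x equals b . (A_inv x).
        mean = sum(bi * vi for bi, vi in zip(self.b, v))
        width = math.sqrt(sum(xi * vi for xi, vi in zip(x, v)))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv for A += x x^T.
        v = self._a_inv_x(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= v[i] * v[j] / denom
            self.b[i] += reward * x[i]


class LinUCBRouter:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.arms = [LinUCBArm(dim) for _ in range(n_arms)]
        self.alpha = alpha

    def choose(self, x):
        scores = [arm.ucb(x, self.alpha) for arm in self.arms]
        return scores.index(max(scores))  # ties go to the lowest index

    def learn(self, arm_index, x, reward):
        self.arms[arm_index].update(x, reward)


ROUTES = ["dense", "sparse"]
CTX_A, CTX_B = [1.0, 0.0], [0.0, 1.0]  # two hypothetical one-hot context types
ROUNDS = 200


def context_at(t):
    return CTX_A if t % 2 == 0 else CTX_B


def toy_reward(route, x):
    # Assumed payoff: sparse is the better route in context A, dense in context B.
    best = "sparse" if x == CTX_A else "dense"
    return 1.0 if route == best else 0.0


router = LinUCBRouter(n_arms=len(ROUTES), dim=2, alpha=1.0)
learned_total = 0.0
for t in range(ROUNDS):
    x = context_at(t)
    i = router.choose(x)
    r = toy_reward(ROUTES[i], x)
    router.learn(i, x, r)
    learned_total += r

oracle_total = float(ROUNDS)  # the hindsight oracle always picks the best route
static_dense_total = sum(toy_reward("dense", context_at(t)) for t in range(ROUNDS))
static_sparse_total = sum(toy_reward("sparse", context_at(t)) for t in range(ROUNDS))
print(learned_total, oracle_total, static_dense_total, static_sparse_total)
```

The toy is deliberately easy: with one-hot contexts the router only needs a few pulls per context to separate the routes. A realistic router would face noisy rewards and continuous features, where the exploration weight `alpha` matters far more.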

05

The Invisible Architecture

6 games · 13 audio · cited

A ~40,000-word interdisciplinary essay built as infrastructure you walk through: six interactive games and thirteen narrated audio sections, not a wall of text.

DSM-III, 1980 · cited

The DSM hardened into infrastructure at DSM-III (1980), when operationalized criteria displaced the psychodynamic paradigm in response to the reliability crisis.

06

The Beautiful Unfinished

inside vs outside view · cited

The planning-execution gap is structural, not a discipline failure: planning is an inside-view, System-1 narrative act, so the plan reliably feels better than the base rate says it should.

38,000 words · 10 fields · cited

A 38,000-word synthesis across ten disciplines, from neuroscience to information theory, with every load-bearing claim carried to a linked bibliography.

07

The New Sorting Hat

61.3% vs 3.2% · cited

AI detectors flagged 61.3% of TOEFL essays by non-native English writers as AI-generated, versus 3.2% of essays by US-born writers.

≈56% chance · rechecking…

A ~4% per-sentence false-positive rate compounds: a 20-sentence paper has better-than-even odds of containing at least one falsely flagged sentence.
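The ≈56% figure is the complement of every sentence passing cleanly: 1 − (1 − 0.04)^20. The sketch below reproduces it, assuming each sentence is flagged independently. That independence is an assumption of this sketch, and the function name is illustrative.

```python
def prob_at_least_one_flag(per_sentence_fpr, n_sentences):
    """P(at least one false positive), assuming sentences are flagged independently."""
    return 1.0 - (1.0 - per_sentence_fpr) ** n_sentences


p = prob_at_least_one_flag(0.04, 20)
print(f"{p:.3f}")
```

The same function shows how quickly the risk grows with length: the odds pass even somewhere between 16 and 17 sentences at a 4% per-sentence rate.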

08

The Sorting Machine

71 vs 70 cutoff · cited

A one-point difference at a diagnostic cutoff (a 71 against a threshold of 70) flips a child from eligible to ineligible, and the classification follows them through the record.

09

Accountability Tracker

25 × 11, 1.33M students · ✓ recomputed in CI

The audit covers 25 U.S. universities across an 11-dimension framework, roughly 1.33 million students in scope, and every dimension score is backed by a quoted policy excerpt with a source link.

12

The Warehouse

49 held options · cited

The federal state can be read as a portfolio of 49 unexercised options, most of them illegible to any evaluator that scores only observable output.

14

The Sorting Machine, Wartime Edition

~40% complete · cited

Ukraine's State Register of Property Rights is about 40% complete and mostly post-2013, so a pre-2013 paper-deed owner cannot prove ownership, and that is a gate assisted filing cannot open.

Computed claims are recomputed from the shipped models (src/lib/research) and the audit dataset; sourced figures are cross-checked against the essays in CI (src/lib/research/__tests__/claims.test.js). This page is the human-readable face of that guard.
