The Liminal was written across several months in late 2024 using GPT-4. The full novel — 17 chapters, five rotating point-of-view characters, a 24-hour countdown, and a horror escalation ladder from cosmic wrongness to full-body transformation — came out to roughly 110,000 words. The story worked. Beats landed, scenes connected, the countdown mechanics ratcheted, and the five-strand POV rotation delivered its dramatic payoffs on schedule.
The prose, though, had problems. And after a year away from the manuscript, those problems became impossible to ignore.
What GPT-4 Does Well (and Where It Falls Apart)
GPT-4 was remarkably good at structural storytelling. It could hold a five-character rotation across 17 chapters, maintain countdown arithmetic, escalate horror beats in a logical sequence, and plant narrative seeds that paid off 40,000 words later. Where it fell apart was at the sentence level. Returning to the manuscript after a year, I found systematic patterns I started calling “GPT-isms”: the explaining narrator, the summary clause, the editorial aside.
These weren't isolated issues. They were systematic — baked into the model's tendencies and reproduced across every chapter with remarkable consistency.
Building the Pipeline
I built a systematic, multi-phase revision process. Pass 1 moved through four phases: Constitution, Triage, Character Lock, and three Elevation layers. Pass 2 ran the revised prose through five adversarial critics, then integrated their notes through a triage system. Each pass was designed to produce a single consolidated document per chapter.
It was thorough. It was rigorous. And it almost immediately broke.
Claude's practical output ceiling before degradation is roughly 12,000–16,000 tokens per response. I was asking for two to three times that.
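The mismatch is easy to see in numbers. A back-of-the-envelope sketch (the ceiling figures are the ones quoted above; the helper itself is illustrative, not part of any real tooling):

```python
# Rough budget check: Claude's practical per-response ceiling versus
# what the original two-pass design demanded. Figures from the text;
# the function names are my own.

CEILING_TOKENS = (12_000, 16_000)  # practical ceiling before degradation

def requested_range(multiplier_lo: int = 2, multiplier_hi: int = 3):
    """Tokens the monolithic passes asked for: two to three times the ceiling."""
    return (CEILING_TOKENS[0] * multiplier_lo,
            CEILING_TOKENS[1] * multiplier_hi)

lo, hi = requested_range()
print(f"asked for {lo:,}-{hi:,} tokens against a "
      f"{CEILING_TOKENS[0]:,}-{CEILING_TOKENS[1]:,} ceiling")

# Even the low end of the request exceeds the high end of the ceiling.
assert lo > CEILING_TOKENS[1]
```

Even the most generous reading of the request starts 50% above the ceiling's best case, which is why the failure showed up immediately rather than occasionally.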
The Fix
The solution was architectural, not incremental. Instead of two monolithic passes, I redesigned the pipeline as four focused turns per chapter:
Turn 1: Analysis only. No prose. Establish the chapter’s identity, diagnose every problem, lock character voices, flag load-bearing passages, inventory GPT-isms.
Turn 2: Write the complete revised chapter in a single integrated pass, applying everything from Turn 1. Structural revision, dread elevation, voice enforcement, and polish happen simultaneously.
Turn 3: Five adversarial readers critique the Turn 2 prose. Analysis only, no rewriting. Each note gets triaged as REVISED, DEFENDED, or DEMOLISHED.
Turn 4: Apply the REVISED notes, run a subtraction pass, run a trust pass, and output the final prose with frontmatter.
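The four-turn structure can be sketched as a simple data model. The class, the turn names, and the exact token numbers below are my own illustration of the shape described above, not the actual tooling:

```python
# Sketch of the four-turn-per-chapter pipeline as data. The Turn
# dataclass, the short turn names, and the budget figures are
# illustrative assumptions.
from dataclasses import dataclass

PRACTICAL_CEILING = 16_000                  # upper end of the quoted ceiling
PER_TURN_BUDGET = PRACTICAL_CEILING // 2    # each turn stays at roughly half

@dataclass
class Turn:
    name: str
    produces_prose: bool                    # Turns 1 and 3 are analysis-only
    budget_tokens: int = PER_TURN_BUDGET

PIPELINE = [
    Turn("1: Diagnose",  produces_prose=False),  # identity, voices, GPT-ism inventory
    Turn("2: Rewrite",   produces_prose=True),   # single integrated revision pass
    Turn("3: Critique",  produces_prose=False),  # five adversarial readers, triage
    Turn("4: Integrate", produces_prose=True),   # REVISED notes, subtraction, trust
]

# Every turn fits well under the ceiling, so no single response has to
# carry the whole chapter's revision.
assert all(t.budget_tokens <= PRACTICAL_CEILING for t in PIPELINE)
```

The design choice worth noticing is the alternation: analysis turns never emit prose, and prose turns never re-diagnose, so each response spends its whole budget on one kind of work.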
Each turn stays at roughly half of Claude's practical ceiling, leaving room for the system prompt, project knowledge, and conversation history.
The result: no timeouts, consistent quality, and each turn finishing well within budget.
What I Learned
Generation and revision stress AI systems differently. GPT-4 was excellent at forward generation — building a novel from outlines and beats. Claude proved far better at the critical, diagnostic work of revision — reading existing prose, identifying specific weaknesses, and rewriting with surgical precision.
I spent time worrying about whether too many project files were overwhelming the context window. The actual bottleneck was asking for too much output in a single response. Understanding this distinction changed everything about how I structured the workflow.
Claude didn’t just fail when I asked for 24,000 words of output — it compressed its approach, substituting revision notes for full rewrites, effectively redesigning the pipeline on the fly. Paying attention to how the model adapted told me exactly where the spec was unrealistic.
Even when the pipeline was breaking, the chapters that made it through the full process came out measurably better than anything I could have achieved through casual back-and-forth editing. The Constitution phase alone — forcing explicit identification of the chapter’s thesis, horror mechanics, and voice rules before touching any prose — prevented the kind of drift that makes long-form revision so difficult.
GPT-4’s uniform register was its deepest flaw, and fixing it required explicit voice signatures with forbidden registers, syntax patterns, and vocabulary restrictions for each character. An astrophysicist who “tastes” data and thinks in negative space. An engineer whose prose reads like a flight recorder. An ethicist who thinks in subordinate clauses. A streamer whose syntax fragments under pressure. A surgeon whose clinical precision fractures when empathy floods in.
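A voice signature of this kind is essentially a per-character constraint set. A minimal sketch, with invented character entries that loosely echo the descriptions above and a deliberately naive vocabulary checker:

```python
# Per-character voice signatures as constraint data, plus a naive
# checker. The field names mirror the description in the text
# (forbidden registers, syntax patterns, vocabulary bans); the
# specific entries are invented examples.

VOICE_SIGNATURES = {
    "astrophysicist": {
        "forbidden_registers": ["folksy", "sentimental"],
        "syntax": "thinks in negative space; sensory metaphors for data",
        "banned_vocab": {"basically", "obviously"},
    },
    "engineer": {
        "forbidden_registers": ["lyrical"],
        "syntax": "flight-recorder declaratives, minimal subordination",
        "banned_vocab": {"beautiful", "somehow"},
    },
}

def vocab_violations(character: str, prose: str) -> set[str]:
    """Return banned words that appear in a character's passage."""
    banned = VOICE_SIGNATURES[character]["banned_vocab"]
    words = set(prose.lower().split())
    return banned & words

print(vocab_violations("engineer", "Somehow the beautiful telemetry held."))
```

A real enforcement pass would of course be done by the model reading against the signature, not by string matching; the point of the data structure is that each character's restrictions are explicit enough to be checked at all.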
Nearly every GPT-ism I catalogued — the explaining narrator, the summary clause, the editorial aside — came from the model not trusting the reader to feel what the scene provided. Making “trust the reader” the explicit operating principle for every revision decision cut more dead weight than any other single instruction.