The Liminal was written across several months in late 2024 using GPT-4. The full novel — 17 chapters, five rotating point-of-view characters, a 24-hour countdown, and a horror escalation ladder from cosmic wrongness to full-body transformation — came out to roughly 110,000 words. The story worked. Beats landed, scenes connected, the countdown mechanics ratcheted, and the five-strand POV rotation delivered its dramatic payoffs on schedule.
The prose, though, had problems. And after a year away from the manuscript, those problems became impossible to ignore.
What GPT-4 Does Well (and Where It Falls Apart)
GPT-4 was remarkably good at structural storytelling. It could hold a five-character rotation across 17 chapters, maintain countdown arithmetic, escalate horror beats in a logical sequence, and plant narrative seeds that paid off 40,000 words later. Where it fell apart was at the sentence level. Returning to the manuscript after a year, I found systematic patterns I started calling “GPT-isms”: the explaining narrator, the summary clause, the editorial aside.
These weren't isolated issues. They were systematic — baked into the model's tendencies and reproduced across every chapter with remarkable consistency.
Building the Pipeline
I built a systematic, multi-phase revision process. Pass 1 moved through four phases: Constitution, Triage, Character Lock, and three Elevation layers. Pass 2 ran the revised prose through five adversarial critics, then integrated their notes through a triage system. Each pass was designed to produce a single consolidated document per chapter.
It was thorough. It was rigorous. And it almost immediately broke.
Claude's practical output ceiling before degradation is roughly 12,000–16,000 tokens per response. I was asking for two to three times that.
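The mismatch is easy to see in numbers. A back-of-the-envelope sketch (the ceiling figures are the ones quoted above; the helper itself is illustrative, not part of any real tooling):

```python
# Rough budget check: Claude's practical per-response ceiling versus
# what the original two-pass design demanded. Figures from the text;
# the function names are my own.

CEILING_TOKENS = (12_000, 16_000)  # practical ceiling before degradation

def requested_range(multiplier_lo: int = 2, multiplier_hi: int = 3):
    """Tokens the monolithic passes asked for: two to three times the ceiling."""
    return (CEILING_TOKENS[0] * multiplier_lo,
            CEILING_TOKENS[1] * multiplier_hi)

lo, hi = requested_range()
print(f"asked for {lo:,}-{hi:,} tokens against a "
      f"{CEILING_TOKENS[0]:,}-{CEILING_TOKENS[1]:,} ceiling")

# Even the low end of the request exceeds the high end of the ceiling.
assert lo > CEILING_TOKENS[1]
```

Even the most generous reading of the request starts 50% above the ceiling's best case, which is why the failure showed up immediately rather than occasionally.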
The Fix
The solution was architectural, not incremental. Instead of two monolithic passes, I redesigned the pipeline as four focused turns per chapter:
Turn 1: Analysis only. No prose. Establish the chapter’s identity, diagnose every problem, lock character voices, flag load-bearing passages, inventory GPT-isms.
Turn 2: Write the complete revised chapter in a single integrated pass, applying everything from Turn 1. Structural revision, dread elevation, voice enforcement, and polish happen simultaneously.
Turn 3: Five adversarial readers critique the Turn 2 prose. Analysis only, no rewriting. Each note gets triaged as REVISED, DEFENDED, or DEMOLISHED.
Turn 4: Apply the REVISED notes, run a subtraction pass, run a trust pass, and output the final prose with frontmatter.
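The four-turn structure can be sketched as a simple data model. The class, the turn names, and the exact token numbers below are my own illustration of the shape described above, not the actual tooling:

```python
# Sketch of the four-turn-per-chapter pipeline as data. The Turn
# dataclass, the short turn names, and the budget figures are
# illustrative assumptions.
from dataclasses import dataclass

PRACTICAL_CEILING = 16_000                  # upper end of the quoted ceiling
PER_TURN_BUDGET = PRACTICAL_CEILING // 2    # each turn stays at roughly half

@dataclass
class Turn:
    name: str
    produces_prose: bool                    # Turns 1 and 3 are analysis-only
    budget_tokens: int = PER_TURN_BUDGET

PIPELINE = [
    Turn("1: Diagnose",  produces_prose=False),  # identity, voices, GPT-ism inventory
    Turn("2: Rewrite",   produces_prose=True),   # single integrated revision pass
    Turn("3: Critique",  produces_prose=False),  # five adversarial readers, triage
    Turn("4: Integrate", produces_prose=True),   # REVISED notes, subtraction, trust
]

# Every turn fits well under the ceiling, so no single response has to
# carry the whole chapter's revision.
assert all(t.budget_tokens <= PRACTICAL_CEILING for t in PIPELINE)
```

The design choice worth noticing is the alternation: analysis turns never emit prose, and prose turns never re-diagnose, so each response spends its whole budget on one kind of work.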
Each turn stays at roughly half of Claude's practical ceiling, leaving room for the system prompt, project knowledge, and conversation history.
The result: no timeouts, consistent quality, and each turn finishing well within budget.
What I Learned
Generation and revision stress AI systems differently. GPT-4 was excellent at forward generation — building a novel from outlines and beats. Claude proved far better at the critical, diagnostic work of revision — reading existing prose, identifying specific weaknesses, and rewriting with surgical precision.
I spent time worrying about whether too many project files were overwhelming the context window. The actual bottleneck was asking for too much output in a single response. Understanding this distinction changed everything about how I structured the workflow.
Claude didn’t just fail when I asked for 24,000 words of output — it compressed its approach, substituting revision notes for full rewrites, effectively redesigning the pipeline on the fly. Paying attention to how the model adapted told me exactly where the spec was unrealistic.
Even when the pipeline was breaking, the chapters that made it through the full process came out measurably better than anything I could have achieved through casual back-and-forth editing. The Constitution phase alone — forcing explicit identification of the chapter’s thesis, horror mechanics, and voice rules before touching any prose — prevented the kind of drift that makes long-form revision so difficult.
GPT-4’s uniform register was its deepest flaw, and fixing it required explicit voice signatures with forbidden registers, syntax patterns, and vocabulary restrictions for each character. An astrophysicist who “tastes” data and thinks in negative space. An engineer whose prose reads like a flight recorder. An ethicist who thinks in subordinate clauses. A streamer whose syntax fragments under pressure. A surgeon whose clinical precision fractures when empathy floods in.
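A voice signature of this kind is essentially a per-character constraint set. A minimal sketch, with invented character entries that loosely echo the descriptions above and a deliberately naive vocabulary checker:

```python
# Per-character voice signatures as constraint data, plus a naive
# checker. The field names mirror the description in the text
# (forbidden registers, syntax patterns, vocabulary bans); the
# specific entries are invented examples.

VOICE_SIGNATURES = {
    "astrophysicist": {
        "forbidden_registers": ["folksy", "sentimental"],
        "syntax": "thinks in negative space; sensory metaphors for data",
        "banned_vocab": {"basically", "obviously"},
    },
    "engineer": {
        "forbidden_registers": ["lyrical"],
        "syntax": "flight-recorder declaratives, minimal subordination",
        "banned_vocab": {"beautiful", "somehow"},
    },
}

def vocab_violations(character: str, prose: str) -> set[str]:
    """Return banned words that appear in a character's passage."""
    banned = VOICE_SIGNATURES[character]["banned_vocab"]
    words = set(prose.lower().split())
    return banned & words

print(vocab_violations("engineer", "Somehow the beautiful telemetry held."))
```

A real enforcement pass would of course be done by the model reading against the signature, not by string matching; the point of the data structure is that each character's restrictions are explicit enough to be checked at all.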
Nearly every GPT-ism I catalogued — the explaining narrator, the summary clause, the editorial aside — came from the model not trusting the reader to feel what the scene provided. Making “trust the reader” the explicit operating principle for every revision decision cut more dead weight than any other single instruction.