The Invisible Architecture

Psychiatric Classification as Infrastructure

~3 hours reading · 6 interactive experiences

Section 1 — Introduction: The Invisible Architecture

You probably turned on a faucet this morning. Water came out. You did not, at that moment, think about the more than two million miles of water pipe running beneath American streets, or the treatment plants upstream, or the fact that the entire system was designed in the nineteenth century for a twentieth-century population and is now, in several famous cases, poisoning people. You just brushed your teeth.

That's how infrastructure works. When it's functioning, it disappears. You look straight through it at whatever it delivers — clean water, electrical power, a navigable road — and the system itself becomes invisible. It earns your attention only when it breaks: the pipe bursts, the grid goes down, the bridge collapses. The rest of the time, it's just there, beneath everything, holding the world up without anyone noticing.

This essay is about a different kind of infrastructure — one that most people don't recognize as infrastructure at all.


When a psychiatrist diagnoses someone with major depressive disorder, several things happen at once. A clinical judgment gets made. A billing code gets generated. An insurance claim becomes possible. A treatment pathway opens up. A legal status may shift. A story begins to form — in the patient's mind, in their family, in their medical record — about what kind of problem this is and what kind of person has it. All of these consequences flow from a single act of classification: matching a person's suffering to a category in a book.

The book is the Diagnostic and Statistical Manual of Mental Disorders, published by the American Psychiatric Association, now in its fifth edition. If you've heard of it at all, you probably think of it as a scientific reference — medicine's field guide to the mind. And it is that, sort of. But it's also something much larger and stranger. The DSM is the operating system beneath American mental health care. Its categories determine what insurance will cover and what it won't. Its language shapes how researchers design studies, how pharmaceutical companies develop drugs, how courts evaluate competency, how schools provide accommodations, how people understand their own inner lives. It has been translated into dozens of languages and exported, alongside its sibling classification in the World Health Organization's ICD system, to virtually every country on earth.

This makes the DSM infrastructure in the precise, technical sense that the sociologist Susan Leigh Star and the information scientist Geoffrey Bowker meant when they studied classification systems. Infrastructure, in their analysis, isn't just physical stuff — pipes and wires and roads. It's any system that meets a specific set of conditions: it's embedded inside other structures. It's transparent to use, meaning you look through it rather than at it. It's learned as part of membership in a community. It has reach and scope beyond any single site or practice. It's built on an installed base that constrains what comes next. And it becomes visible mainly upon breakdown — when something goes wrong, when a case doesn't fit, when the categories fail someone.

Psychiatric classification meets every one of these conditions. The DSM is embedded in insurance protocols, legal standards, research designs, pharmaceutical regulation, and clinical training programs. Clinicians learn to think through it as part of their professional formation; patients learn to narrate their suffering in its terms as part of receiving care. Its reach is global. Each new edition inherits the architecture of the previous one, because too many institutional structures have grown up around the existing categories to start from scratch. And it becomes acutely, painfully visible only when it breaks — when someone's suffering doesn't fit any available category, when two clinicians disagree about a diagnosis, when a billing code is denied, when the science underneath a diagnostic concept turns out to be weaker than anyone assumed.

The plumbing metaphor isn't decorative. It's structural. And it reframes the question this essay is asking.


The conventional way to critique the DSM is to ask: Is it scientifically valid? Do the diagnostic categories correspond to real entities in nature — distinct diseases with distinct causes, courses, and cures? This is a legitimate question, and the honest answer, as we'll see, is complicated. Some categories appear to have strong empirical support. Others look increasingly like artifacts of committee decisions. Most fall somewhere in between, capturing something real about human suffering without capturing it very precisely.

But validity is only one question you can ask about a classification system, and it may not be the most revealing one. An infrastructure lens suggests a different inquiry, one that is more encompassing and, I think, more useful:

What does psychiatric classification do?

Not what does it describe. Not what does it get right or wrong. What does it do — as infrastructure, as a system embedded in other systems? What does it make possible? What does it make impossible? What does it make visible, and what does it render invisible? Who benefits from the way it's currently built, and who is harmed? And if we wanted to build something better — what would "better" even mean, given everything the current system is connected to?

This is the essay's organizing question, and it takes us somewhere different from the usual debates. It means we can't evaluate psychiatric classification by looking only at the science, because the system doesn't run on science alone. It runs on institutional momentum, on financial incentives, on cultural narratives, on political negotiations, on the accumulated weight of everything that's been built on top of it. Understanding the classification means understanding all of those layers, which means pulling from philosophy and history and sociology and neuroscience and psychometrics and cross-cultural psychology and information science — not because interdisciplinarity is fashionable, but because the object of study is genuinely multi-layered and no single discipline can see all of it.

The information scientists have a name for what this essay is doing. They call it infrastructural inversion — the deliberate move of foregrounding a system that's normally in the background. Ordinarily, we look through the DSM at disorders. We ask "Is this patient depressed?" and treat the classification as a transparent window onto the patient's reality. Infrastructural inversion means turning the gaze back onto the window itself. Not to deny that people suffer — they manifestly do — but to notice that the window shapes what we see, and that the shape of the window was determined by decisions that could have gone differently.


Here's what's coming.

The essay is built in four movements, with six interactive games embedded at the joints.

Part I establishes the system as most people encounter it. We'll trace the history of how Western psychiatry has sorted suffering, from the asylums to the DSM-5, as a story not of linear scientific progress but of institutional politics, philosophical wagers, and infrastructure versioning — each edition inheriting the constraints of the last. Then we'll drop below the manual into the room where diagnosis actually happens, hearing from the people on both sides of the encounter: the clinician pressing dimensional human suffering into categorical boxes, and the patient whose life reorganizes around a label. At that point, you'll play your first game and discover, with your own hands, how arbitrary the boundary lines can be.

Part II asks why the system looks the way it does. We'll examine the philosophical question of what a diagnosis even is — discovery or invention, map or road — and then meet the strangest property of psychiatric infrastructure: it changes the thing it classifies. We'll follow the story of biological psychiatry's attempt to rebuild the system from the brain up, and confront the sobering gap between what was promised and what's been delivered. And we'll look at the deep tension between making a diagnostic system consistent and making it accurate, a tradeoff you'll experience firsthand in another game.

Part III pulls the camera out further. Classification doesn't exist in a laboratory. It exists inside societies, economies, and power structures. We'll examine how diagnosis functions as a social instrument — a gatekeeper for treatment, insurance, disability benefits, and legal standing — and who controls that gate. Then we'll leave the Western frame entirely and encounter radically different ways of organizing human suffering, before asking whether the global export of Western psychiatric categories is an act of care or an act of epistemological colonialism.

Part IV brings it home. We'll look at how classification shapes treatment (and whether it should), how it interacts with developing minds, and what happens when children grow up inside diagnostic categories that may not fit them. Then, in the synthesis, we'll assemble the full infrastructure map and ask the question the essay has been building toward: what would it actually take to build something better? Not a utopian fantasy, but a serious design brief — informed by everything we've learned about why the current system persists, what functions any replacement would need to serve, and what the switching costs of migration would look like.

Six times along the way, the essay will pause, and you'll play something. These aren't illustrations or quizzes. They're arguments in a different medium — experiences designed to give you knowledge that prose alone can't deliver. You'll draw boundaries on continuous data and watch populations shift. You'll classify things that change in response to being classified. You'll try to build a system that's both reliable and valid, and discover that you can't have both. You'll encounter the same human experience through different cultural classification frameworks. You'll be on the receiving end of classification yourself. And finally, you'll try to replace a classification system while everything around it depends on it staying the same.


A word about what this essay is and isn't.

It isn't a polemic against the DSM. The manual has real accomplishments — it brought a minimum of shared language to a field that badly needed one, and it's enabled research that would have been impossible without it. It also isn't a defense. The manual has real failures, and some of them have caused real harm.

What this essay is, I think, is an act of looking at something that's usually invisible. The classification system that organizes mental health care is one of the most consequential knowledge structures in modern life. It shapes how millions of people understand their own suffering, how billions of dollars flow through health systems, how research agendas are set, and how entire cultures think about the boundary between normal and disordered. And most of the people affected by it — including many of the clinicians who use it daily — have never had the opportunity to see it whole.

You're about to see it whole. It's more contested, more fragile, and more interesting than you probably think.

Let's look at the plumbing.

Section 2 — A History of Sorting Suffering

People have always sorted suffering. The question has never been whether to classify mental distress but how — and the history of the "how" turns out to be less a story of scientific progress than a story of institutional crises, political maneuvering, and the strange inertia of systems that too many other things depend on.

This section tells that story as what it is: a series of infrastructure versions, each one built on the wreckage and foundations of the last, each one carrying forward constraints its designers didn't fully choose and couldn't fully escape. If you've ever worked on a system migration — swapping out a database while the application is running, upgrading a platform while users depend on the old one — you'll recognize the pattern. The history of psychiatric classification is a version control problem played out over two centuries.

Before the Manual

For most of human history, the sorting of mental distress was local and unsystematic. Ancient physicians had their categories — Hippocrates attributed madness to imbalances of bile and phlegm; medieval Islamic scholars like al-Balkhi distinguished disorders of the body from disorders of the soul — but these frameworks didn't function as infrastructure in the modern sense. They weren't embedded in institutions. They didn't determine who received what kind of care, or whether care was provided at all. They were theories, not operating systems.

That changed with the asylum. Beginning in the late eighteenth century, as European states took on the project of institutionalizing the mentally ill, administrators needed to count the people in their care. Counting requires categories. The earliest institutional classifications were blunt instruments — "mania," "melancholia," "dementia," "idiocy" — designed not for clinical precision but for bureaucratic legibility. How many patients does this asylum hold? What kinds? How long do they stay? These questions came from governments and reform commissions, not from science, and the categories were built to answer them.

This matters because it establishes a pattern that will repeat throughout the history: classification serves the institution first and the patient second. The categories exist because someone with administrative power needs to sort people into groups. The scientific refinement comes later, layered on top of a structure whose shape was determined by institutional need.

Kraepelin's Wager

The figure who transformed psychiatric classification from a bureaucratic convenience into a scientific ambition was Emil Kraepelin, working in German university clinics in the 1880s and 1890s. Across eight editions of his textbook, Kraepelin built the most influential classification system in psychiatry's history — and he did it by making a bet.

The bet was this: if you watched patients long enough, their illnesses would sort themselves. The key validators were not symptoms at a single point in time but course and outcome over years. Kraepelin's great dichotomy — dementia praecox (what we now call schizophrenia) versus manic-depressive insanity (what we now call bipolar disorder) — was grounded in the claim that these two conditions had fundamentally different trajectories. Dementia praecox deteriorated. Manic-depressive illness cycled but preserved function.

It was an elegant framework, and it carried an implicit philosophy: psychiatric disorders are natural entities, discoverable through careful observation, analogous to diseases in the rest of medicine. This philosophy would have enormous staying power. A century later, the architects of DSM-III would invoke Kraepelin's name and authority, claiming to return psychiatry to its empirical roots after decades of psychoanalytic drift.

But here's what the neo-Kraepelinians — the movement to return psychiatry to Kraepelin's emphasis on observation, discrete categories, and biological etiology — left out: Kraepelin himself was less certain than his followers. He revised his system compulsively across those eight editions, rearranging categories, collapsing distinctions, splitting groups. He acknowledged cases that didn't fit either pole of his great dichotomy. He was, as the historian Hannah Decker has documented, considerably more flexible and more troubled by the limitations of his own system than the people who later claimed his mantle.

This is the first instance of a pattern the essay will track: infrastructure inherits a simplified version of its designers' intentions. What Kraepelin built was provisional and self-revising. What his successors inherited was a rigid architecture treated as settled science.

The Psychodynamic Interlude: DSM-I and DSM-II

When the American Psychiatric Association published its first Diagnostic and Statistical Manual in 1952, Kraepelin's framework was the available foundation — the installed base. But the manual's architects weren't Kraepelinian in spirit. American psychiatry in the mid-twentieth century was dominated by psychoanalysis and psychodynamic thinking, which had little use for rigid diagnostic categories. In the psychodynamic view, diagnosis was less important than the underlying dynamics — the unconscious conflicts, the developmental arrests, the defensive structures that produced symptoms. Two patients with identical symptoms might have completely different underlying processes; the category told you almost nothing about the treatment.

DSM-I and DSM-II reflected this orientation. The categories were loose, the definitions vague, the boundaries fuzzy. "Neurosis" could cover an enormous range of human suffering. The manuals assumed that competent clinicians would use their training and judgment to understand each patient individually — the manual was a rough guide, not a rulebook.

This worked, after a fashion, as long as the primary users of the classification were analytically trained psychiatrists talking to each other. The infrastructure was designed for a small, homogeneous community that shared assumptions, training, and interpretive frameworks. Two analysts might use different diagnostic labels for the same patient and not consider this a problem, because the label wasn't doing the real clinical work.

But infrastructure that works for a small, homogeneous community breaks when the community grows and diversifies. And in the 1960s and 1970s, several forces converged to break it.

The Crisis of Reliability

The first crack was empirical. A series of studies in the 1950s and 1960s demonstrated that psychiatric diagnosis was alarmingly unreliable — different clinicians examining the same patient frequently reached different conclusions. The most famous of these studies came from Aaron Beck, who found that experienced psychiatrists agreed on a diagnosis only about 54% of the time. For a field claiming medical authority, this was damning. If two cardiologists looked at the same EKG and gave different readings more than half the time, cardiology would be in crisis. Psychiatry was in crisis, though it took a while to admit it.

The second crack was cultural. The anti-psychiatry movement of the 1960s and 1970s — Thomas Szasz arguing that mental illness was a "myth," R.D. Laing suggesting that madness was a sane response to an insane world, Ken Kesey's One Flew Over the Cuckoo's Nest reaching millions — attacked the legitimacy of psychiatric diagnosis from outside the profession. These weren't empirical critiques so much as political and philosophical ones, but they created a climate in which psychiatry's claims to expertise were under constant public suspicion.

The third crack was institutional. The insurance industry, which was increasingly being asked to pay for psychiatric treatment, wanted to know what it was paying for. Vague psychodynamic formulations didn't generate billable categories. Insurers needed discrete diagnoses with clear criteria — something that could be coded, counted, and audited. This pressure would prove at least as important as any scientific consideration in shaping what came next.

And then there was Rosenhan.

The Experiment That Changed Everything (and May Not Have Happened)

In 1973, David Rosenhan published "On Being Sane in Insane Places" in Science, and it hit American psychiatry like a bomb. The study's design was simple and devastating: Rosenhan and seven associates presented themselves at psychiatric hospitals, reported hearing voices saying "empty," "hollow," and "thud," and were admitted. Once inside, they behaved normally. All but one were diagnosed with schizophrenia. None were detected as impostors by staff. The study's conclusion — "we cannot distinguish the sane from the insane in psychiatric hospitals" — made front pages and became a cultural reference point that psychiatry has never fully escaped.

Robert Spitzer, a psychiatrist at Columbia who would become the most consequential figure in the history of psychiatric classification, published a blistering response in 1975. Spitzer argued that Rosenhan's study proved nothing of the kind — that diagnosing someone who reports hallucinations with a psychotic disorder isn't a failure of the diagnostic system but a reasonable clinical inference, just as an emergency room that admits someone reporting chest pain hasn't failed if the pain turns out to be heartburn. The problem, Spitzer insisted, wasn't that psychiatry diagnosed the pseudopatients. The problem was that psychiatric diagnosis lacked the explicit, operationalized criteria that would make it possible to diagnose better.

Decades later, the investigative journalist Susannah Cahalan would reveal that Rosenhan's study was likely fraudulent — data fabricated, pseudopatients potentially invented, positive experiences at the hospitals systematically excluded from the published account. This is a remarkable twist, but for the history of classification it barely matters. What matters is what the study did — the institutional crisis it created, the rhetorical ammunition it provided, and the opening it gave to reformers who were already waiting.

The Rosenhan affair is a parable about how infrastructure changes. Large systems don't get replaced because someone publishes a better blueprint. They get replaced because a crisis makes the status quo intolerable, and someone with a plan is ready. Rosenhan created the crisis. Spitzer had the plan.

The Revolution: DSM-III

What Robert Spitzer built between 1974 and 1980 was not just a new diagnostic manual. It was a new operating system for American psychiatry — and, eventually, for global mental health care. DSM-III, published in 1980, was the most consequential event in the modern history of psychiatric classification, and its design choices are still the ones we live with today.

The core innovation was operational criteria. Where DSM-II had described disorders in loose, paragraph-length prose that clinicians could interpret freely, DSM-III specified each disorder as a list of explicit criteria: the patient must exhibit at least five of the following nine symptoms, lasting at least two weeks, with clinically significant distress or impairment. This made diagnosis, at least in principle, a procedural act — check the boxes, count the symptoms, apply the threshold.

The philosophical move was equally radical: DSM-III declared itself "atheoretical" with respect to etiology. Where DSM-I and DSM-II had been organized around psychodynamic assumptions about causation, DSM-III would describe disorders purely by their observable features, making no claims about what caused them. This was presented as scientific modesty — we don't know enough about causation, so let's classify by description — but it was also political genius. By stripping out psychodynamic theory, Spitzer made the manual acceptable to the biological psychiatrists who wanted to study brain mechanisms, the behaviorists who wanted to study observable behavior, the researchers who wanted reliable patient populations for clinical trials, and the insurers who wanted billable categories. Everyone could use DSM-III because it didn't commit to anyone's theory.

The result was, in Bowker and Star's terms, a boundary object — a shared artifact flexible enough to serve radically different communities. And like all successful boundary objects, it rapidly became infrastructure. Insurance billing codes were mapped to DSM-III categories. Research protocols were designed around DSM-III definitions. Pharmaceutical clinical trials selected patients using DSM-III criteria. Training programs taught residents to think in DSM-III categories. Legal proceedings adopted DSM-III language. Within a decade, the manual was so deeply embedded in so many institutional structures that replacing it — even improving it — became a problem of system migration rather than scientific revision.

This is the moment when psychiatric classification became infrastructure in the full technical sense. Before DSM-III, the diagnostic manual was a reference book that clinicians could take or leave. After DSM-III, it was load-bearing.

But DSM-III carried costs. The optimization for reliability — for getting clinicians to agree — may have come at the expense of validity. Making diagnostic criteria explicit and countable meant drawing hard categorical lines through what might be dimensional, continuous phenomena. It meant defining disorders by surface features rather than underlying mechanisms, which is a bit like classifying animals by color rather than by evolutionary lineage — you'll get reliable sorting, but you might be grouping things that don't belong together and splitting things that do. And the atheoretical stance, however politically useful, papered over a genuine philosophical question: if you're not claiming these categories reflect natural kinds, then what are they? Are they useful fictions? Administrative conveniences? The best available approximation of something real? DSM-III didn't say, and the ambiguity has haunted every subsequent edition.

Spitzer himself was remarkably candid about the political dimensions of the enterprise. The process of building DSM-III involved hundreds of committee meetings, and the committees were stacked with neo-Kraepelinian allies. Controversial decisions — the removal of "neurosis" as an organizing concept, which enraged the psychoanalytic establishment — were fought as political battles, with votes, lobbying, and compromise language. Spitzer was brilliant at this. He was also, by most accounts, genuinely committed to improving psychiatric diagnosis. The two things weren't contradictory. Building infrastructure is always political, even when — especially when — the builders believe in what they're building.

The neo-Kraepelinian ideology embedded in DSM-III was articulated most clearly by Gerald Klerman in 1978: psychiatric disorders are discrete entities with clear boundaries between normal and pathological; the focus of psychiatry should be on the biological aspects of mental illness; diagnosis should be based on explicit criteria validated by empirical research. These propositions are not self-evident truths. They're philosophical commitments — wagers about the nature of mental suffering — that got built into the infrastructure and then, over time, became invisible. When clinicians trained after 1980 think in categorical diagnoses with explicit criteria, they're not making a philosophical choice. They're breathing the air.

After DSM-III: Versioning Under Constraints

Once DSM-III was installed, the subsequent history follows the logic of infrastructure maintenance rather than scientific revolution. Each revision — DSM-III-R in 1987, DSM-IV in 1994, DSM-IV-TR in 2000, DSM-5 in 2013, DSM-5-TR in 2022 — has operated under the same fundamental constraint: too much depends on the existing categories to change them radically.

DSM-IV, chaired by Allen Frances, was the most careful and conservative of the revisions. Frances's guiding principle was "first, do no harm" — any change to a diagnostic category would ripple through research protocols, insurance policies, legal precedents, and patient identities, and those ripple effects needed to be weighed against whatever scientific improvement the change offered. This is infrastructure thinking, whether or not Frances used that language. The installed base constrains the upgrade.

DSM-5, published in 2013 after years of contentious development, attempted more ambitious reforms. Its architects wanted to introduce dimensional assessments alongside categorical diagnoses — acknowledging that psychiatric phenomena exist on spectra rather than in neat boxes. They wanted to reorganize the manual around emerging neuroscience rather than descriptive tradition. They wanted, in essence, to begin migrating the architecture toward something that better reflected the science.

Most of these ambitions failed. The dimensional assessments were relegated to an appendix. The neuroscientific reorganization was largely cosmetic. The field trials — meant to demonstrate that the new criteria were reliable — produced kappa values that were, in many cases, worse than those DSM-III had achieved three decades earlier. (Kappa measures how often clinicians agree beyond what chance alone would predict. A kappa of 1.0 means perfect agreement; a kappa near 0 means clinicians agree no more often than if they were flipping coins. Many DSM-5 categories landed disturbingly close to the coin-flip end.) Allen Frances, the chair of DSM-IV, became DSM-5's most vocal public critic, arguing that it would pathologize normal grief, medicalize everyday forgetfulness, and expand diagnostic boundaries in ways that served pharmaceutical companies more than patients.
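
For readers who want the statistic made concrete, here is a minimal Cohen's kappa for two raters, written from the standard textbook definition: observed agreement, corrected by the agreement the raters would reach by chance given their own label frequencies. The diagnostic labels in the example are invented for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: inter-rater agreement corrected for chance.
    1.0 means perfect agreement; near 0 means no better than chance."""
    n = len(rater_a)
    # Observed: fraction of cases where the raters assign the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance: both raters pick the same label independently, each at
    # their own marginal rate. (Counter returns 0 for missing labels.)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)
```

With four patients rated ["mdd", "mdd", "gad", "mdd"] by one clinician and ["mdd", "gad", "gad", "mdd"] by another, raw agreement is 75% but kappa is only 0.5, because the raters' shared lean toward the same two labels makes some of that agreement free. This is why the field trials reported kappa rather than percent agreement.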

The DSM-5 episode illustrates the central infrastructure dilemma: the system needs updating, but the update process is constrained by everything that depends on the system staying stable. Research programs built on DSM-IV categories can't absorb radical category changes without invalidating years of accumulated data. Insurance systems can't process dimensional scores — they need categorical codes. Clinicians trained on DSM-IV categories can't retrain overnight. The result is that each revision is a negotiation between what the science suggests and what the installed base can absorb, and the installed base usually wins.

What the History Teaches

Telling the history of psychiatric classification as a story of infrastructure rather than a story of science reveals several things that would otherwise be invisible.

First, it shows that the system's shape was determined as much by institutional pressures as by empirical evidence. The insurance industry's demand for billable categories did at least as much to produce DSM-III's architecture as any scientific finding. The pharmaceutical industry's need for discrete patient populations reinforced categorical thinking. Legal systems' need for clear determinations of competence and disability rewarded sharp diagnostic boundaries. The infrastructure was designed to serve multiple masters, and its form reflects the compromise.

Second, it reveals path dependency. Kraepelin's dichotomy shaped the Feighner criteria, which shaped the Research Diagnostic Criteria, which shaped DSM-III, which shaped everything after. Each version inherits the bones of its predecessor, not because the old structure is correct but because the cost of restructuring is prohibitive. This is why, forty-five years after DSM-III, the manual still organizes disorders into categories defined by symptom checklists with arbitrary numerical thresholds — not because this is the best way to classify mental suffering, but because this is the installed base and everything else is built on top of it.

Third, it shows that the people who build classification systems matter — their training, their intellectual commitments, their institutional positions, their personalities. DSM-III was not the inevitable product of scientific progress. It was the product of Robert Spitzer's specific combination of intelligence, political skill, and neo-Kraepelinian conviction. A different person in that chair, with different philosophical commitments and different allies, would have produced a different manual — and the global infrastructure of mental health care would have a different shape today.

Finally, the history makes visible something that will matter increasingly as the essay continues: the gap between what the manual was designed to do and what it's now asked to do. DSM-III was designed to solve a reliability crisis in American psychiatric research. It was not designed to organize global mental health care, adjudicate disability claims, structure pharmaceutical development, shape individual identity, or serve as the foundation for neuroscientific research. It does all of these things now, not because anyone planned it but because that's what happens when infrastructure succeeds โ€” it gets extended, depended on, and burdened with purposes its designers never imagined.

The plumbing was installed for a building of a certain size. The building is now much larger. And the plumbing is starting to leak.

Section 3 — The View from Inside the System

Before this essay goes any further into abstraction — before we analyze the philosophy, trace the science, or map the politics — we need to go into the room. The room where diagnosis happens. Because everything the rest of this essay will examine, all the infrastructure and ideology and institutional machinery described in the previous section, ultimately converges on a single encounter: one person trying to understand another person's suffering and deciding what to call it.

This section is about what that encounter feels like from both sides. What the clinician experiences when they press the irreducible complexity of a human being into a categorical box. What the patient experiences when a name is placed on their suffering — and their life begins to reorganize around it. The gap between the manual and the room turns out to be enormous, and understanding that gap is essential for everything that follows. Without it, the rest of the essay would be an argument about plumbing made by someone who'd never seen water.


The Clinician's Side

Here is what the DSM assumes happens. A patient presents with complaints. The clinician conducts a structured or semi-structured interview, systematically assessing the patient's symptoms against the manual's operational criteria. If five of nine criteria for major depressive disorder are met, including at least one of the first two, and these symptoms represent a change from previous functioning and cause clinically significant distress or impairment, and they're not better explained by another condition or attributable to a substance — then the diagnosis is made. The process is essentially algorithmic: inputs go in, a category comes out.
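That algorithmic model can be written down literally. The sketch below is a deliberate simplification, not the DSM's actual text: the symptom names, the function, and the boolean flags are invented for illustration, and the real criteria involve clinical judgment inside every clause.

```python
# Illustrative sketch of the DSM-5 major depressive episode decision rule:
# five or more of nine symptoms, at least one of them a core symptom,
# present for the required duration, causing significant distress,
# and not better explained by another condition or a substance.
# Symptom names and flags are simplifications for the example.

CORE = {"depressed_mood", "anhedonia"}
OTHER = {"weight_change", "sleep_disturbance", "psychomotor_change",
         "fatigue", "worthlessness", "poor_concentration",
         "thoughts_of_death"}

def meets_mdd_criteria(symptoms: set[str],
                       two_weeks: bool,
                       significant_distress: bool,
                       better_explained: bool) -> bool:
    """Apply the checklist logic: inputs go in, a category comes out."""
    endorsed = symptoms & (CORE | OTHER)
    return (len(endorsed) >= 5
            and bool(symptoms & CORE)
            and two_weeks
            and significant_distress
            and not better_explained)
```

Reduced to this form, the rule is trivially computable, which is exactly the model of clinical reasoning the manual's architecture assumes.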

Here is what actually happens.

The anthropologist Tanya Luhrmann spent four years embedded in American psychiatric residency training programs, attending hundreds of lectures, rounds, admissions interviews, and on-call shifts. What she found was that young psychiatrists learn to diagnose not by consulting criteria but through a process that looks far more like cultural apprenticeship. They learn what schizophrenia "looks like" by seeing dozens of patients whom their supervisors call schizophrenic. They absorb a gestalt — a pattern, a feel, a clinical intuition built from accumulated exposure — and only afterward learn to articulate that intuition in the DSM's criterial language. The criteria function less as the engine of diagnosis than as its post-hoc justification. The clinician sees the pattern first; the checklist comes second.

This isn't a failure of training. It's consistent with how expert judgment works in every field that's been studied — medicine, chess, firefighting, air traffic control. Experts don't follow decision trees. They recognize patterns. The DSM's operational criteria were designed for a model of clinical reasoning that doesn't match how clinical reasoning actually operates.

But the gap between the model and the reality creates a peculiar double life. Luhrmann documented what she called a "dual consciousness" among the residents she observed: privately, they maintained richer, more contextual, more psychologically nuanced understandings of their patients. Publicly — in charts, in rounds, in insurance paperwork — they spoke the DSM's categorical language. The official story and the clinical understanding diverged, and every working psychiatrist learned to inhabit both simultaneously.

Owen Whooley, a sociologist who studied the profession's relationship to its own uncertainty across two centuries, identified something similar: a systematic "management of ignorance" in which psychiatrists present diagnostic confidence to patients, families, insurers, and courts while privately acknowledging the profound uncertainty of their classificatory judgments. The DSM functions, in his analysis, as a "confidence prop" — its operational criteria and categorical structure create an appearance of scientific precision that masks the underlying indeterminacy. This isn't hypocrisy. It's the structural demand of a profession that must act decisively under conditions of genuine epistemic uncertainty. Patients need answers. Insurers need codes. Courts need determinations. The clinician who said "I'm not sure what's wrong with you, and honestly the categories I'm working with may not carve reality at its joints" — however epistemically honest — would not last long in practice.

And then there's the institutional compression. As managed care tightened its grip during the 1990s and never loosened it, the diagnostic encounter was squeezed into smaller and smaller time windows. Inpatient stays shortened from weeks to days. Outpatient sessions shrank from fifty minutes to fifteen. The careful, exploratory assessment that the DSM's architecture assumes — the systematic consideration of each criterion, the ruling out of differential diagnoses, the thoughtful weighing of clinical significance — was hollowed out by time pressure. Diagnosis became, in many settings, less an exploration and more an administrative processing step: assign a code, justify a medication, document for discharge. What the manual envisions as a clinical act often functions, in practice, as a bureaucratic one.

Lorna Rhodes, an anthropologist who studied a psychiatric emergency unit, found the gap at its widest precisely where the stakes were highest. In the emergency setting, the primary clinical question wasn't "what disorder does this person have?" but "what do we do with this person?" Available beds, insurance status, patient demeanor, perceived dangerousness — these shaped the diagnostic label far more than the DSM's operational criteria. When a bed was available, a patient was more likely to receive an admittable diagnosis. When beds were scarce, diagnostic thresholds shifted upward. The classification system didn't determine the clinical decision; the institutional context determined the classification.

None of this means clinicians are cynical or careless. The psychiatrists Luhrmann observed, the emergency workers Rhodes studied, the community mental health staff Paul Brodwin followed in his ethnography of frontline care — all of them were trying, within real constraints, to help the people in front of them. And sometimes the infrastructure works exactly as designed — the manic presentation that the manual catches and the clinician's experience confirms, the treatment decision that the criteria make legible and urgent. Bipolar disorder diagnosed promptly can mean lithium prescribed promptly, and lithium prescribed promptly can mean a life saved. The system has genuine achievements. They're just unevenly distributed across the manual's three hundred categories.

The problem isn't bad faith. The problem is that the infrastructure assumes a world that doesn't exist: a world of unlimited time, purely clinical decision-making, and categories precise enough to do the work asked of them. In the actual world — the world of fifteen-minute medication checks, prior authorization phone calls, and waiting rooms that never empty — the infrastructure bends. And the people working inside it bend with it, developing workarounds, strategic coding practices, and the pervasive dual consciousness of professionals who depend on a system they don't entirely believe in.

A global survey of nearly five thousand psychiatrists across forty-four countries confirmed the pattern at scale. Clinicians overwhelmingly rated classification systems as most useful for administrative purposes — assigning a diagnosis, billing, communicating with other professionals — and least useful for the things the DSM ostensibly exists to do: selecting treatments and determining prognosis. Two-thirds preferred flexible clinical descriptions that allow for judgment over the DSM's strict operational criteria. The majority wanted dramatically fewer categories than the three hundred the DSM-5 provides. The people who use the system every day have a clear message, delivered consistently across cultures and practice settings: the manual's granularity exceeds its usefulness, its rigidity exceeds its accuracy, and its primary value is institutional rather than clinical.

The medical chart makes the translation visible. Robert Barrett's ethnographic analysis of psychiatric documentation showed how the clinical record constructs the patient through systematic filtering. The patient's self-narrated experience — messy, contextual, contradictory, deeply personal — gets translated into DSM-compatible language. Countable symptoms are amplified. Subjective experience, social context, and personal meaning are attenuated or erased. What the chart produces is not a record of the person but a documentary artifact: the diagnostically legible patient, assembled from selected observations and organized according to classificatory logic. The lived person exceeds and resists this construction at every point. But the chart's version becomes the operationally real patient — the one who moves through the healthcare system, triggers billing codes, and accumulates a history that subsequent clinicians will read and treat as ground truth.

This is what it feels like to classify: to know more than you can document, to understand more than the categories allow you to say, and to perform a confidence you don't entirely feel — because the system requires confidence, the patient needs a name for their suffering, and the institution needs a code.


The Patient's Side

Now cross the desk.

Imagine you've been struggling. The specifics vary — maybe it's a heaviness that won't lift, a dread that arrives without cause, a sense that the world has gone flat and gray and distant. Maybe it's thoughts that won't stop, racing at three in the morning, pulling you in directions that frighten you. Maybe it's voices, or the growing suspicion that your own thoughts don't quite belong to you. Whatever it is, it's been bad enough and long enough that you're sitting across from someone with the authority to name it.

The philosopher Matthew Ratcliffe spent years investigating what depression actually is as a lived experience — not the DSM's checklist of depressed mood, loss of interest, sleep disturbance, and psychomotor changes, but the thing itself, as it is inhabited from the inside. What he found bears almost no resemblance to the manual. Depression, in Ratcliffe's phenomenological account, isn't primarily a mood. It's a transformation of the entire structure of experience. The world stops soliciting engagement. The future ceases to feel like a space of possibility and becomes a wall. Other people, previously experienced as accessible and present, become distant and unreachable — not because they've moved but because the experiential bridge to them has collapsed. Time thickens. The body becomes heavy not in any physical sense but in the sense that every action requires an effort of will that the person barely possesses. It's not sadness. It's the loss of the capacity to care about anything enough for sadness to gain purchase.

Now compare that to the DSM-5's criteria: "Depressed mood most of the day, nearly every day, as indicated by either subjective report (e.g., feels sad, empty, hopeless) or observation made by others (e.g., appears tearful)." The gap between the phenomenological reality and the criterial description isn't a minor shortcoming. It's a category error — a confusion between the experience and its surface indicators, between the world as transformed and the behaviors that transformation produces. What the DSM captures is real. But what it misses is arguably more clinically important: the structural change in how the person inhabits their life.

The phenomenological psychiatrists Josef Parnas and Louis Sass have made an analogous argument about schizophrenia. Their research program demonstrates that schizophrenia-spectrum conditions involve a distinctive disturbance of what they call ipseity — the minimal sense of self, the pre-reflective feeling that your experiences are yours, that you are the subject of your own mental life. In schizophrenia, this basic selfhood becomes destabilized. Thoughts feel inserted or alien. Perception acquires a quality of unreality. There's a hyperreflexivity — an excessive, detached self-monitoring — combined with a diminished self-presence, as though the person is watching themselves from the outside while the inside has gone quiet. None of this appears in the DSM's criteria for schizophrenia, which focus on hallucinations, delusions, disorganized speech, and negative symptoms. The criterial description catches the surface. The experiential core — the transformation of selfhood — falls through.

This matters enormously for the person being diagnosed, because what they receive from the classification is a translation of their suffering into a language that may not recognize its most essential features. The patient with depression gets a checklist that describes their observable behavior while missing the experiential catastrophe. The patient with schizophrenia gets a symptom inventory that catalogs manifestations while overlooking the disturbance of self from which those manifestations arise. The diagnosis names something real, but the name doesn't quite fit, and the gap between the name and the experience is a space where people get lost.

And yet the name matters. It matters enormously, and in contradictory ways.

For some people, the moment of diagnosis is a moment of recognition. Finally, someone understands. The suffering has a name, which means it's real, which means it's known, which means it can be treated. The diagnosis provides an explanatory framework for experiences that were previously bewildering or shameful. It connects the person to a community of others with the same label, to online forums and support groups and memoirs and a shared vocabulary for what they're going through. The relief can be overwhelming. The name gives permission to stop blaming yourself, to understand your difficulties as a condition rather than a moral failing, to ask for help without apology.

For other people — sometimes the same people, sometimes simultaneously — the diagnosis is a closing door. The label becomes a ceiling on what you're expected to achieve, a permanent notation in your medical record that follows you from clinician to clinician, a social identity that others will use to interpret your every behavior. You're not having a reasonable reaction to an unreasonable situation; you're being symptomatic. You're not making a bold life choice; you're being manic. The diagnosis reframes the past — retrospectively organizing every previous difficulty into a narrative of illness — and forecloses the future, because now the question is always whether you're "stable," whether you're "in remission," whether you're "compliant" with treatment. What Patrick Corrigan and his colleagues have documented as the "why try" effect is the internalized version of this foreclosure: the person absorbs the label's implicit message about their limitations, reduces their expectations and effort, and fulfills the very prophecy the diagnosis seemed to predict.

The psychiatric survivor literature names this experience with a clarity that clinical language lacks. Judi Chamberlin, a former psychiatric patient, argued that the mental health system's classification of her experience was a misrepresentation that served institutional needs, not her own. Patricia Deegan, a psychologist who was herself diagnosed with schizophrenia, described recovery as a process of reclaiming agency after what she called the "catastrophe" of diagnosis — catastrophe not because the symptoms were trivial but because the label reorganized her entire identity in ways she hadn't consented to and couldn't control. The Mad Studies literature, emerging from the psychiatric survivor movement, treats classification not as a neutral scientific act but as an exercise of power: the power to name another person's experience, to locate it within an expert framework, and to determine what that experience means and what should be done about it.

None of this means diagnosis is simply oppressive. It can be genuinely liberating. The point is that it's both — and often both at once, for the same person, in ways that can't be resolved. The sociologist Susan Leigh Star and information scientist Geoffrey Bowker had a name for this: torque. Torque is the biographical tension experienced by people whose lives don't fit the classification categories available to them — or, more precisely, whose lives fit those categories in some ways but not others, so that the classification simultaneously helps and harms, opens and closes, sees and blinds.

Torque is the person whose depression responds to antidepressants (the diagnosis was useful) but who now can't get life insurance without a loaded premium (the diagnosis follows you). Torque is the parent whose child gets an ADHD diagnosis that unlocks school accommodations (the label opens doors) but that also changes how every teacher interprets the child's behavior (the label becomes a lens). Torque is the patient whose trauma looks nothing like the PTSD criteria because it was chronic and relational rather than acute and event-based, and who must choose between an ill-fitting label that grants access to treatment and no label at all. Torque is universal in psychiatric classification because the categories are rigid and human lives are not, and the twist between the two is felt in the body of the person being classified.


The Gap Between the Manual and the Room

What emerges from both sides of the diagnostic encounter is a picture that should unsettle any simple narrative about what psychiatric classification does.

From the clinician's side: a system designed for algorithmic precision is used through pattern recognition and clinical intuition. A system designed as atheoretical is deployed through theory-laden practice. A system designed to drive treatment is valued primarily for administration. A system designed for careful criterial assessment is compressed into bureaucratic coding under time pressure. The people who use it most find it institutionally indispensable and clinically insufficient, and they have developed an elaborate repertoire of workarounds, strategic coding practices, and dual consciousness to bridge the gap between what the system demands and what the clinical reality allows.

From the patient's side: a system that names real suffering in ways that are simultaneously too broad and too narrow. Too broad because the categories group together people whose experiences and underlying processes may be wildly different. Too narrow because the criterial descriptions strip away the phenomenological richness — the transformed world, the destabilized self, the reconfigured time and embodiment and possibility — that constitutes the actual experience of the condition. The diagnosis provides recognition and access at the cost of flattening, foreclosure, and the ongoing torque of living inside a category that was built for institutional purposes rather than experiential accuracy.

The gap between the manual and the room is not a problem of implementation — a failure to train clinicians properly or write criteria precisely enough. The ethnographic evidence suggests it arises from the fundamental nature of the classificatory task itself. You cannot reduce complex, contextual clinical judgment to an algorithmic procedure without losing something essential. You cannot translate the phenomenology of human suffering into operationalized checklists without flattening it. You cannot build a system that simultaneously serves researchers, clinicians, insurers, lawyers, and patients without creating torque in every one of those constituencies. The gap is structural. It's built into the infrastructure.

The German psychiatrist Thomas Fuchs has argued that phenomenology provides the descriptive foundation that any classification must rest on — and that the DSM skipped this step entirely, going straight to behavioral criteria without adequate phenomenological groundwork. The system measures the symptoms without understanding the experience that generates them. It's as if cartographers drew maps based on satellite photographs alone, never walking the terrain, never asking the people who live there what the landscape means to them. The maps would be accurate in certain ways and profoundly misleading in others. They would be useful for some purposes and catastrophically inadequate for others. They would be infrastructure — functional, embedded, depended upon — but infrastructure built on an incomplete foundation.

This is the system you've inherited. Built under institutional pressure, maintained by institutional inertia, used every day by people who find it both indispensable and inadequate, experienced by patients as both a key and a cage. Its history, which you've now seen, explains how it got here. Its phenomenology, which you've now felt, explains why it persists: because the alternatives are not obvious, the switching costs are enormous, and the gap between what the system promises and what it delivers is wide enough to cause suffering but not wide enough to cause collapse.

In a moment, you're going to experience one dimension of this gap with your own hands. You're going to draw categorical boundaries — the kind the DSM draws — on continuous, multidimensional data representing human experiences. And you're going to discover something that every clinician knows in their bones but that the manual never says out loud: the lines are not found. They are drawn. And where you draw them changes everything.

🎮

Game 1: The Boundary Problem

Draw categorical boundaries on continuous data — spectra of mood, thought, behavior, and function — and watch how different cuts produce different populations, different prevalence rates, different people who qualify for treatment.


Section 4 — What Kind of Thing Is a Diagnosis?

Part II: Why the System Looks the Way It Does

You've just drawn the lines yourself. In Game 1, you placed categorical boundaries on continuous data — on spectra of mood, thought, behavior, and function that don't announce where one condition ends and another begins — and you watched the consequences ripple. Different cuts produced different populations, different prevalence rates, different people who qualify for treatment and different people who don't. The experience was designed to be unsettling, and if it worked, a question should now be sitting in your chest: if the lines could have gone elsewhere, what are they for?
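The arithmetic behind that unsettlement is easy to reproduce. As a minimal sketch, assume purely for illustration that some measure of severity is normally distributed in a population; the `prevalence` function and the specific cutoffs below are invented for the example, not taken from any real instrument.

```python
from statistics import NormalDist

# Illustrative assumption: severity scores follow a standard normal trait.
trait = NormalDist(mu=0.0, sigma=1.0)

def prevalence(cutoff: float) -> float:
    """Fraction of the population falling above the diagnostic cutoff."""
    return 1.0 - trait.cdf(cutoff)

# Shifting the cutoff from 2.0 to 1.0 standard deviations multiplies
# the diagnosed population roughly sevenfold (about 2.3% -> 15.9%),
# though nothing about the people themselves has changed.
for cutoff in (2.0, 1.5, 1.0):
    print(f"cutoff {cutoff:.1f}: {prevalence(cutoff):.1%} qualify")
```

The point of the toy model is the one the game makes experientially: the boundary is a parameter, and moving it redefines who counts as ill.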

This is where the essay shifts registers. Part I established the system as you encounter it — its history, its texture, its human weight. Part II asks why the system looks the way it does, and the first question is the most fundamental one: what kind of thing is a psychiatric diagnosis?

This is not a question most clinicians spend time on. You learn the categories in training, you apply them in practice, and the philosophical underpinning of what you're doing when you diagnose someone — whether you're discovering something or constructing something, whether you're reading a map or drawing one — stays comfortably in the background. It's a question for philosophers, if anyone. Except that the answer to it determines what kind of system we should be building, what standards we should hold it to, and what it means when the system fails. The philosophy isn't decorative. It's load-bearing.


The cleanest place to start is with chemistry. Gold is an element. It has an atomic number of 79. This is not a convention, not a decision, not a committee vote — it's a fact about the structure of the universe. Every atom of gold, everywhere, has 79 protons, and that's what makes it gold rather than platinum or mercury. The category "gold" is what philosophers call a natural kind: a grouping that reflects a genuine division in nature, a joint at which reality actually carves. You don't invent natural kinds. You discover them.

For most of its modern history, psychiatry has implicitly operated on the hope that its categories are something like this — that "schizophrenia" and "major depressive disorder" are natural kinds waiting to be fully understood, entities in nature whose boundaries exist independently of whoever happens to be looking for them. Kraepelin's great wager, as we saw in the history, was that careful observation would reveal discrete disease entities with distinct causes, courses, and outcomes. The entire biomedical research program — the search for biomarkers, the genome-wide association studies, the neuroimaging protocols — is built on some version of this hope. If schizophrenia is a natural kind, then somewhere in the brain or the genome there's a signature that defines it, and we just haven't found it yet. If it's not a natural kind, then the search is structured around the wrong question.

The philosopher Rachel Cooper put the problem precisely: if psychiatric disorders are natural kinds, we should be able to discover them the way chemists discovered elements. But the history of the DSM looks nothing like the history of the periodic table. Elements don't get added or removed by committee vote. The boundary between gold and platinum doesn't shift depending on which professional organization is running the process. The periodic table doesn't need a new edition every two decades because the previous one was politically untenable. The DSM does all of these things, which suggests that whatever psychiatric categories are, they're not natural kinds in the way gold is a natural kind.

But this observation, while damaging to the strongest version of psychiatric realism, doesn't settle the question. Because natural kinds come in degrees.


The philosopher John Dupré pointed out decades ago that even in biology — the science most often cited as a model for psychiatry — natural kinds are messier than the gold example suggests. Biological species aren't defined by a single essential property the way chemical elements are defined by atomic number. There's no equivalent of "79 protons" for Homo sapiens. Species are defined by a cluster of properties — genetic similarity, reproductive compatibility, ecological niche, morphological resemblance — and these properties don't always converge neatly. The boundaries between closely related species can be genuinely fuzzy. Biologists have been arguing about how to define "species" for as long as there have been biologists, and they have not reached consensus.

Dupré's conclusion was radical and, for our purposes, illuminating: he argued for what he called promiscuous realism — the view that there are many equally legitimate ways to classify the same reality, depending on what you're trying to do. A classification of organisms by evolutionary lineage is real. A classification of organisms by ecological role is also real. They carve the world differently because they serve different purposes, and neither is more "natural" than the other. Both are constrained by reality — you can't classify dolphins as fish and get away with it, at least not if you're doing evolutionary biology — but both involve human decisions about what matters.

If even biological species are this messy, psychiatric disorders were never going to be simple. The question is whether they're closer to the messy-but-real end of the spectrum — natural kinds in a weaker, biological sense, held together by clusters of causally related properties even if we can't identify a single essence — or whether they're something else entirely.


The psychiatrist-philosopher Kenneth Kendler, who has done more than perhaps anyone to think carefully about this from inside the profession, has argued for a middle position. Psychiatric disorders, in Kendler's view, are mechanistic property cluster kinds — held together not by a single essential feature but by a web of causal relationships between genetic, neural, psychological, and social processes. There's no single thing that is major depression in the way that 79 protons is gold. Instead, there's a cluster of things — neurotransmitter dysregulation, negative cognitive patterns, stress sensitivity, social withdrawal, sleep disruption, genetic vulnerability — that tend to co-occur because they're causally connected to each other. The cluster is real, in the sense that the causal connections are real. But the boundaries of the cluster are genuinely vague, and where you draw the diagnostic line around it is, at least partly, a human decision.

This is a sophisticated position, and it has real implications for how you think about classification. If disorders are mechanistic property clusters, then the search for single biomarkers is likely misguided — there won't be one signature for depression, because depression isn't one thing. But the categories aren't arbitrary either, because the causal connections within the cluster constrain where reasonable boundaries can be drawn. You can't just group anything with anything. The terrain has a shape, even if it doesn't have neat borders.

Kendler's view maps naturally onto the infrastructure metaphor. When you build a road, you don't get to ignore the terrain — the mountains, the rivers, the soil composition. The terrain constrains what you can build. But the terrain doesn't tell you where to build the road. That depends on where people need to go, what resources are available, what other roads already exist. The road is real. The terrain is real. But the relationship between them involves human purposes, human decisions, and human tradeoffs.

Psychiatric categories are roads, not maps. The question is whether we've been building them as if they were maps.


The philosopher Peter Zachar has pushed this line of thinking further than anyone. His position, which he calls the practical kinds view, holds that psychiatric categories are best understood not as discoveries about nature but as tools for doing specific jobs. A diagnosis of major depressive disorder is not a statement that the patient has a discrete natural entity inside them. It's a statement that grouping this person's suffering under this label is useful — useful for clinical communication, for treatment planning, for research, for insurance billing, for self-understanding. The category's validity isn't measured by how well it mirrors some underlying reality but by how well it serves the purposes it's designed for.

This sounds, at first, like it might slide into relativism — as if Zachar is saying the categories are made up and you can draw them however you like. He's not. The practical kinds view is constrained by reality in two ways. First, the categories have to work. A classification that groups together people with genuinely similar treatment responses is doing a better job than one that doesn't, regardless of whether the grouping reflects a "natural" boundary. Second, the categories have to withstand empirical scrutiny. If a diagnostic category consistently fails to predict course, treatment response, or biological correlates, that's a mark against it — not because it failed to mirror nature, but because it failed to do its job.

What Zachar adds that the mechanistic property cluster view doesn't is an honest reckoning with purpose. A classification system built for clinical communication will look different from one built for biological research, which will look different from one built for insurance billing, which will look different from one built for helping patients understand their own suffering. These are different purposes, and they don't all point toward the same set of categories. The DSM tries to be one system that serves all of these purposes simultaneously, and the tensions that result — the tensions you've been seeing throughout this essay — are not signs of failure but of an impossible design specification.


There's a competing position that should be taken seriously, because it's the one that implicitly governs the definition question: Jerome Wakefield's harmful dysfunction analysis. Wakefield argues that a mental disorder is a condition that involves two things: a dysfunction (a failure of some internal mechanism to perform its evolutionarily designed function) and a harm (suffering or impairment, caused by the dysfunction, that's considered negative in the person's social context).

The appeal of this account is that it seems to give you the best of both worlds. The dysfunction criterion anchors diagnosis in biology — there's a real thing going wrong, not just a social judgment. The harm criterion acknowledges that biology alone can't determine what counts as a disorder — a biological difference that doesn't cause suffering in a given context isn't a disorder, whatever the mechanism is doing. It's a framework that tries to be scientifically grounded without being reductively biological.

The problems, though, are serious. First, we don't actually know the evolutionary functions of most psychological mechanisms with enough specificity to determine when they're "dysfunctioning." What's the evolutionarily designed function of mood regulation? Of attention allocation? Of fear response? These are open empirical questions, and basing a definition of disorder on them means basing it on knowledge we don't yet have. Second, the analysis assumes that psychological mechanisms have discrete, specifiable functions — that the mind is modular in the way that the heart or the kidney is modular. This is a contested empirical claim, not a settled fact. Third, as critics have pointed out, the dysfunction criterion ends up doing most of its work as a placeholder: we know there's probably something going wrong biologically in severe schizophrenia, and we know there's probably not anything dysfunctional about homosexuality, so the analysis gives us the answers we already had while appearing to derive them from a principled framework.

Wakefield's analysis matters for this essay not because it's right or wrong but because it reveals a tension that runs through every attempt to define psychiatric disorder: the desire to anchor classification in biology — to have the categories be discovered rather than decided — keeps running into the fact that the biological knowledge isn't there yet. The categories are being used now, by millions of people, in systems that demand them today. They can't wait for neuroscience to catch up.


Here's where the infrastructure metaphor earns its keep.

In infrastructure, you don't get to wait for perfect knowledge before building. You build with what you have, under the constraints you face, for the users you're serving. And then the thing you build gets embedded in other systems, and those systems grow up around it, and changing it becomes exponentially harder even as your knowledge improves. This is exactly what happened with the DSM. The categories were built under conditions of genuine ignorance about underlying mechanisms, using the best available tools (clinical observation, committee consensus, symptom checklists). They were installed into clinical practice, insurance systems, legal frameworks, and research programs. And now the science has moved — dimensional models look more accurate than categorical ones, the boundaries between disorders appear to be fuzzier than the manual suggests, the biomarkers haven't materialized — but changing the categories means disrupting everything that's been built on top of them.

The former NIMH director Steven Hyman named this process with surgical precision in 2010: reification. The DSM's categories were originally intended as provisional conventions — the best available groupings for clinical and research purposes, not claims about the ultimate structure of mental illness. But once they were installed as infrastructure, they began to be treated as real entities. Researchers designed studies around DSM categories as if those categories reflected genuine natural divisions. Pharmaceutical companies developed drugs targeting DSM-defined conditions as if those conditions were discrete diseases. Patients and clinicians came to think of diagnoses as things you have — "I have depression," "she has ADHD" — as if the categories named objects in the world rather than pragmatic groupings in a manual.

Reification is what happens when infrastructure becomes invisible. You forget that someone built it, that it could have been built differently, that it reflects decisions rather than discoveries. You look through it and mistake what you see for unmediated reality. Hyman's argument — coming from someone who ran the institution most responsible for psychiatric research — was that this reification had become a genuine obstacle to scientific progress. Researchers were trapped inside DSM categories, unable to ask whether the categories themselves were carving nature at its joints, because the categories had become the unquestioned starting point for all investigation.

Zachar's practical kinds view is, in one sense, a prescription against reification. If you understand diagnostic categories as tools rather than discoveries, you maintain the conceptual flexibility to revise or replace them when better tools become available. But there's a bitter irony here: the very act of using a tool — of embedding it in practice, building institutions around it, training people in it — tends to transform it from a tool you consciously use into an environment you unconsciously inhabit. Infrastructure resists its own revision. Categories resist being seen as categories.


So what kind of thing is a diagnosis?

The honest answer, after reviewing the philosophical landscape, is that it's a working fiction constrained by reality — a practical tool whose boundaries don't mirror nature's joints but aren't arbitrary either. It's more like a road than a map: shaped by human purposes and built by human hands, but constrained by a terrain it didn't create and can't ignore. It's real enough to do work. It's provisional enough that it should be held lightly. And it's embedded deeply enough in institutional infrastructure that it's very, very hard to hold lightly.

This section has treated the philosophical question somewhat abstractly, as a debate among positions. But there's a dimension of the problem we haven't touched yet — one that makes it fundamentally different from classification in any other science. The road-and-terrain metaphor works up to a point, but it has a flaw: it assumes the terrain stays still while you build on it.

In psychiatric classification, it doesn't. The terrain moves. The people being classified are aware of their classification, and their awareness changes their behavior, which changes the thing the classification is trying to capture, which destabilizes the classification, which provokes revision, which changes behavior again. The philosopher Ian Hacking called this process looping, and it's the strangest and most consequential property of psychiatric infrastructure.

It's also something you can't fully understand by reading about it. That's why, after the next section explains the loop, you'll experience it yourself.

Section 5 — The Looping Architecture

In 1886, a gas company worker in Bordeaux named Albert Dadas walked into the hospital of Saint-André and presented his physician with a problem that didn't have a name yet. Dadas walked. Compulsively, aimlessly, across enormous distances — to Paris, to Moscow, to Constantinople, to Algeria — arriving with no money, no memory of the journey, and no idea why he'd gone. He didn't choose these trips. Something drove him out the door and across borders, and when he came to himself, dazed and far from home, he could only report the blankness where the journey should have been.

His physician, Philippe Tissié, gave the condition a name: fugue. And with the name, something remarkable began to happen.

Cases multiplied. Within a decade, fugue was a recognized diagnostic entity in French psychiatry, with its own clinical literature, its own case reports, its own professional debates. Patients arrived at hospitals across Europe presenting with precisely the condition Tissié had described — compulsive wandering, amnesia, inability to explain. The category existed, and people began to fill it.

Then the conditions changed. The medical taxonomy shifted; hysteria — the broader diagnostic framework within which fugue had carved its niche — fell out of favor. Competing categories absorbed some of the clinical territory. The cultural context that made aimless wandering legible as illness rather than vagrancy or criminality evolved. The institutional infrastructure that had supported the diagnosis — the clinics, the case report journals, the professional interest — moved on to other things. And fugue, which had appeared as if from nowhere in the 1880s, vanished. Not gradually. It simply stopped. The people who would have been diagnosed with fugue in 1895 presumably still suffered — still felt whatever impulse or distress had driven the wandering — but the particular form that suffering had taken dissolved when the conditions supporting it dissolved.

The philosopher Ian Hacking, who excavated this story in his 1998 book Mad Travelers, introduced a concept to explain what had happened: the ecological niche. Fugue, Hacking argued, didn't arise because someone discovered a pre-existing disease. It arose because a specific set of conditions — medical, cultural, legal, institutional — created a space in which a particular kind of deviance could flourish. There had to be a medical taxonomy with room for the condition, a cultural context in which aimless wandering could be read as illness rather than moral failure, competing diagnoses that gave fugue a distinct identity, and institutional recognition that made it clinically real. When those conditions held, fugue was real — as real as anything in medicine, generating genuine suffering, genuine treatment, genuine clinical encounters. When the conditions changed, the niche closed, and the illness vanished with it.

Notice what Hacking is not saying. He's not saying fugue was fake. He's not saying Dadas was malingering, or that his physicians were deluded, or that the dozens of subsequent patients were performing. He's saying something more interesting and more unsettling: that the diagnostic category didn't just describe a pre-existing form of suffering. It helped create the conditions under which that form of suffering could exist. The classification was part of the phenomenon it classified.

This is the property of psychiatric infrastructure that the previous section's road-and-terrain metaphor couldn't quite capture. We said that psychiatric categories are roads, not maps — shaped by human purposes, built by human hands, constrained by a terrain they didn't create. But we noted the metaphor's flaw: it assumes the terrain stays still. In the case of fugue, the terrain didn't stay still. The road changed the terrain. The terrain changed the road. And the cycle continued until the whole landscape shifted and the road led nowhere.

Hacking called this process looping, and he spent the better part of three decades working out its logic. It is, I think, the single most important idea for understanding why psychiatric classification behaves the way it does — and why building better psychiatric infrastructure is so much harder than building better plumbing.


Here is the loop, stated as a mechanism.

A classification is introduced. "The hyperactive child." "The depressive." "The person with borderline personality." The classification enters the world — published in a manual, taught in training programs, deployed in clinics, circulated in media, picked up by patients and families and advocacy groups and insurance companies.

The people classified become aware of the classification. Not always directly — a young child diagnosed with ADHD doesn't read the DSM. But they encounter the classification's effects: the label applied by a doctor, the medication prescribed, the school accommodations granted or denied, the way adults now talk about their behavior, the other children who share the label. The classification reaches them through the infrastructure built around it.

Their awareness changes them. This is the crucial step, the one that makes psychiatric classification fundamentally different from classifying chemical elements or geological strata. People respond to being classified. They may conform to the category — learning to interpret their experience through its lens, adopting its language, performing its expected behaviors. They may resist — rejecting the label, refusing treatment, defining themselves against the diagnosis. They may negotiate — accepting parts of the classification while contesting others, building communities around shared diagnostic identity, transforming the meaning of the label into something its creators didn't intend. In every case, the classified people are no longer the same people the original classification described.

The classification no longer fits. The people have moved. Their behavior, self-understanding, symptom presentation, and social identity have all shifted in response to being classified. The original description, which was calibrated to people who hadn't yet been described, is now describing people who have been reshaped by the act of description.

The classification is revised. New criteria, new editions, new clinical consensus about what the category really means. DSM-III becomes DSM-III-R becomes DSM-IV becomes DSM-5. Each revision acknowledges, implicitly, that the previous version no longer matches what clinicians are seeing — without acknowledging that what clinicians are seeing has been partly produced by the previous version.

The revised classification produces new changes. And the loop continues.

Hacking called the people caught in this cycle "moving targets." They are never stable objects that a fixed classification can capture, because the act of classification is itself a causal intervention in their lives. Every snapshot is already out of date by the time it's developed, because taking the snapshot changed the scene.

Translate this into infrastructure language and the implications become vivid. The infrastructure intervenes in the traffic it carries. The traffic adapts. The infrastructure no longer fits the traffic. It's revised. The traffic adapts again. You're building roads on a landscape that reshapes itself in response to the roads you build — and you can never step back far enough to see the terrain as it was before you started building.
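The loop stated as a mechanism above can be rendered as a toy simulation. Everything in the sketch below is invented for illustration (the population, the conformity rule, and every constant are arbitrary assumptions, not drawn from any clinical model): a "manual" repeatedly fits a diagnostic cutoff to a population, the people classified shift their presentation in response, and each revision chases a target the previous revision moved.

```python
import random

random.seed(0)

# Illustrative only: 1000 people with a one-dimensional "symptom severity".
N = 1000
traits = [random.gauss(0.0, 1.0) for _ in range(N)]

def fit_threshold(pop, target_rate=0.10):
    """'Revision': set the cutoff so target_rate of the *current*
    population falls above it (a purely descriptive criterion)."""
    ranked = sorted(pop)
    return ranked[int(len(ranked) * (1 - target_rate))]

thresholds = []
for edition in range(6):  # six successive "editions" of the manual
    cut = fit_threshold(traits)
    thresholds.append(cut)
    # Looping step (assumed dynamics): classified people partly conform
    # toward a category prototype just above the cutoff; people near the
    # boundary drift toward it; everyone else is unaffected.
    traits = [
        t + 0.3 * (cut + 1.0 - t) if t >= cut        # classified: conform
        else t + 0.2 * (cut - t) if t >= cut - 0.5   # boundary seekers
        else t                                       # indifferent
        for t in traits
    ]

# The cutoff drifts upward with every revision and never settles.
print([round(c, 2) for c in thresholds])
```

The point of the sketch is structural, not quantitative: because fitting the threshold changes the very distribution the next fit is performed on, the sequence of thresholds drifts monotonically rather than converging, which is the "moving target" behavior the essay describes.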


Fugue is a clean case — historically complete, emotionally distant, pedagogically useful — but it's also an extreme one. The niche opened and closed entirely, making the full life cycle of a looping effect visible. Most psychiatric categories don't behave this dramatically. Schizophrenia has persisted across classification systems, cultures, and historical periods. Severe depression has recognizable counterparts in medical literature stretching back millennia. Whatever looping effects operate on these conditions, they don't produce the wholesale appearance-and-disappearance pattern that makes fugue so striking.

This is why the loop needs to be understood as a spectrum, not a switch.

At one end: conditions where the classification is nearly constitutive, where the diagnostic category creates most of the conditions under which the phenomenon exists in its recognizable form. Fugue. Multiple personality disorder — the epidemic of which, as Hacking documented in Rewriting the Soul, was intimately entangled with the clinical practices, media narratives, and recovered memory therapies that gave the diagnosis its cultural scaffolding. When the scaffolding shifted, the epidemic subsided. These are maximally interactive categories.

At the other end: conditions with substantial biological substrates that proceed largely regardless of classification. The neurochemical processes implicated in severe psychosis don't care whether the person has been diagnosed. The genetic architecture contributing to bipolar disorder doesn't rearrange itself when someone reads the DSM. There are components of these conditions — Hacking called them "indifferent" components — that don't loop. The biology sits underneath, doing what biology does, unperturbed by the names humans give it.

But even for these conditions, the experience loops. A person diagnosed with schizophrenia in 1960 inhabited a different illness than a person diagnosed with schizophrenia in 2025 — not because the underlying neurobiology changed, but because the meaning of the diagnosis, the available treatments, the social consequences, the self-understanding it made possible, the communities organized around it, the legal and institutional frameworks surrounding it — all of these shifted across those decades, and all of them shaped how the condition was lived. The biology may be indifferent to classification. The person never is.

Most of what the DSM classifies falls somewhere in the middle of this spectrum. Depression, ADHD, PTSD, generalized anxiety, eating disorders — these are conditions with some biological grounding and significant interactive properties. The loop operates on them powerfully but not totally. The classification captures something real about the suffering while simultaneously shaping how the suffering is experienced, expressed, and understood. They are not made up. They are not simply found. They are made real through a process in which classification, experience, and culture are entangled in ways that can't be cleanly separated.

The philosopher of science would say: these categories are not unreal. They're differently real. Their reality is partially constituted by the infrastructure that identifies, names, and manages them. This doesn't make them less real than gold or granite. It makes them real in the way that money is real, or marriage is real, or national borders are real โ€” human constructions that are nonetheless as solid and consequential as anything in nature, precisely because human beings organize their lives around them.


Consider what this looks like up close, in a case that probably touches your life or the life of someone you know.

ADHD — attention-deficit/hyperactivity disorder — has been a recognized diagnostic category since the 1980s, though its predecessors stretch back further: hyperkinetic reaction of childhood in DSM-II, attention deficit disorder (with or without hyperactivity) in DSM-III. The category has been revised, renamed, and redrawn with every edition of the manual. And with every revision, the people it describes have changed.

When ADHD was primarily conceptualized as a childhood behavioral problem — the kid who can't sit still in class — it was understood, treated, and experienced as something you grew out of. Parents waited for their children to mature. Clinicians expected the condition to remit in adolescence. And the classified people, absorbing these expectations, organized their self-understanding accordingly: this is a phase, a developmental delay, a problem of childhood that adult life will resolve.

Then the category expanded. Adult ADHD gained recognition. Diagnostic criteria were revised to capture inattentive presentations, not just hyperactive ones. Prevalence estimates climbed. Pharmaceutical marketing — itself a looping agent of extraordinary power — saturated media with messaging about undiagnosed adult ADHD. And the people began to change.

Adults who had struggled for years with disorganization, procrastination, impulsivity, and the chronic sense that they were failing to live up to their potential encountered a category that offered an explanation. The diagnosis reorganized their autobiography. Those years of underperformance weren't moral failure — they were an undiagnosed condition. The shame they'd carried could be reframed as a medical reality they hadn't known about. Online communities formed around shared diagnostic identity. The language of ADHD — executive dysfunction, dopamine, hyperfocus, rejection-sensitive dysphoria (a term that doesn't appear in the DSM but has spread through patient communities with the force of clinical gospel) — became a vocabulary for self-understanding that millions of people adopted and adapted.

This is looping in real time, operating simultaneously at multiple scales. At the individual level: a person receives a diagnosis, and the diagnosis reorganizes their self-narrative, their behavior, their relationships, their expectations for themselves. At the clinical level: as more adults present with self-identified ADHD symptoms, clinicians' pattern recognition shifts; the clinical prototype evolves; the boundary between ADHD and normal variation in attention and executive function becomes a live, contested question in every consulting room. At the institutional level: schools develop accommodation frameworks calibrated to the diagnostic category; pharmaceutical companies develop and market medications for the expanding population; insurance systems build reimbursement structures around the diagnosis. At the cultural level: ADHD becomes an identity, a community, a lens through which a significant segment of the population understands its own cognitive style — and the existence of that identity and community feeds back into clinical presentation, diagnostic rates, and the pressure on each subsequent DSM revision.

None of this means ADHD is not real. The attentional and executive differences that the category captures are measurable, heritable, and consequential. The suffering is genuine. The medications, for many people, work. But the category — the specific way those differences are named, bounded, understood, and institutionalized — is interactive. It shapes the phenomenon it classifies. The people it describes are partly produced by the act of description. The infrastructure is alive.


This is the deepest expression of the infrastructure metaphor that has been running through this essay, and it's the point where the metaphor stops being a metaphor and becomes a literal description. Psychiatric classification doesn't just carry traffic. It shapes what kinds of traffic are possible. The diagnostic category creates the institutional, cultural, and experiential conditions under which a particular way of being ill becomes available — the medications that get developed, the accommodations that get offered, the identities that become livable, the research questions that get funded, the self-help books that get written, the communities that form online. Remove the category and you don't just lose a label. You lose an entire ecology of meaning and practice that the label made possible.

This is what Hacking meant by the ecological niche, and it's what Bowker and Star meant when they argued that classification systems don't passively sort a pre-existing world — they actively constitute the world they appear to be sorting. The DSM doesn't describe disorders the way a field guide describes birds. It creates the conditions under which particular forms of suffering become recognizable, treatable, insurable, livable, and real. Some of these forms of suffering would exist without the DSM — people would still hear voices, still lose the capacity for joy, still find their attention ungovernable. But the specific shape these experiences take, the specific way they organize lives and institutions and identities, is inseparable from the classificatory infrastructure that names and manages them.

This has a consequence that should make anyone interested in improving psychiatric classification deeply uncomfortable. It means that every revision of the system isn't just a scientific update — it's an intervention in the lives of the people the system classifies. Change the criteria for ADHD and you change who gets diagnosed, which changes who gets treated, which changes who develops an ADHD identity, which changes the clinical presentation that the next revision will try to capture. Abolish a category and you don't just correct a scientific error — you pull the ground out from under people who have built their self-understanding on it. Add a category and you don't just recognize a previously invisible condition — you create a new niche, a new set of institutional and experiential possibilities, a new way of being a person.

The DSM's authors have always understood revision as a scientific process: updating categories to better match empirical evidence. Looping effects suggest it's something more like a political and existential process: reorganizing the lived realities of millions of people, with feedback dynamics that make the outcomes unpredictable. This is not a reason to stop revising. Categories that cause harm should be changed. Categories that miss important clinical distinctions should be refined. But the revision should be undertaken with full awareness that you're not just editing a reference manual. You're intervening in a feedback system, and the system will respond.


There's one more thing the loop reveals, and it connects back to the philosophical argument of the previous section.

We asked what kind of thing a diagnosis is and arrived at an answer: a practical tool, shaped by human purposes, constrained by a terrain it didn't create. Roads, not maps. That answer was honest and useful, but it was also incomplete, because it left the terrain inert. The looping architecture adds the missing dimension: the terrain is responsive. It shifts under the roads. And the roads, in turn, are constantly being rebuilt to accommodate the shifting terrain, which shifts again in response to the rebuilding.

This means the debate between realism and constructionism — the argument over whether psychiatric categories are discoveries or inventions — is the wrong debate. It assumes a stable dichotomy between what's "really there" and what's "just classification." But in an interactive system, the distinction dissolves. The category is constructed. The suffering is real. And the construction partly constitutes the reality. They can't be pulled apart because they've grown together, the way a river and its banks shape each other over centuries until you can't say which one determined the other's course.

Hacking himself arrived at this position, carefully and over many years. He called it a form of realism — not naïve realism, which holds that our categories mirror a mind-independent world, but a realism sophisticated enough to acknowledge that in the domain of human suffering, the categories and the reality co-constitute each other. Not unreal. Not simply real. Differently real — real in the way that interactive, reflexive, historically embedded human phenomena are real. Which is to say: as real as anything gets in the human sciences, and more complicated than almost anyone wants it to be.


This section has tried to explain the loop. But the loop, by its nature, resists explanation. It's a dynamic, feedback-driven process — a system that behaves differently depending on what you do to it and differently again depending on how you observe it. Reading about it produces understanding. Living inside it produces something else: a felt sense of how classification and behavior chase each other in real time, how the ground shifts under your feet, how the confident categories you started with dissolve into moving targets.

That felt sense is what the next experience is designed to produce. You're about to classify things that change in response to being classified — entities that watch you sorting them and reorganize themselves accordingly, destabilizing whatever system you've built. You'll revise your categories. They'll adapt again. You'll reach for stable ground and find it shifting.

It's the loop, made playable. And it will teach you something that this section, for all its words, cannot.

🎮

Game 2: Looping

Sort entities into categories — and watch them change in response to being classified. Your carefully constructed categories destabilize beneath you, not because they were wrong, but because the act of classification altered the thing being classified.


Section 6 — The Biological Promise

You've just felt the loop. In Game 2, you classified entities that changed in response to being classified, and you watched your own carefully constructed categories destabilize beneath you — not because they were wrong at the moment you made them, but because the act of classification altered the thing being classified, which meant the categories no longer fit, which meant revision, which meant further alteration, which meant the ground never stopped moving. It was, if the game did its job, both intellectually clarifying and viscerally frustrating.

Now imagine you're a neuroscientist watching this unfold across all of psychiatry. The philosophical problems of the last two sections — the fact that psychiatric categories aren't natural kinds, the fact that they loop and destabilize — seem to point in a single, tantalizing direction: downward. Into the brain. Into the genome. Into the hard circuitry beneath the messy phenomenology. Because whatever else is uncertain about psychiatric suffering, one thing seems indisputable: it happens in brains. Brains are biological organs. Biology is measurable, quantifiable, and — critically — it doesn't know it's being measured. A neurotransmitter doesn't change its behavior because you've given it a name. A gene doesn't loop.

If you could ground psychiatric classification in biology rather than symptoms, you'd escape the whole tangle. No more committee votes about where to draw boundaries. No more reification of provisional categories. No more looping effects. Just circuits, molecules, and measurable dysfunction — the kind of thing real medicine is built on.

This was the biological promise. It was the most ambitious attempt in the history of psychiatry to replace the existing infrastructure with something fundamentally better. And the story of what happened — what was delivered, what wasn't, and why — is one of the most revealing chapters in the essay's larger argument about what classification systems are and what they can do.


The promise didn't arrive all at once. It accumulated across decades, gathering institutional weight until it felt less like a hypothesis and more like an inevitability.

In 1990, President George H.W. Bush signed a proclamation declaring the 1990s the "Decade of the Brain," directing federal agencies to enhance public awareness of the benefits of brain research. The Human Genome Project launched that same year. The convergence seemed providential: just as psychiatry was completing its turn toward biological thinking — the turn that DSM-III had initiated by privileging observable symptoms over psychodynamic speculation — the tools to look beneath those symptoms were arriving. Neuroimaging was becoming more powerful. Molecular genetics was becoming more tractable. The old dream that Kraepelin had wagered on a century earlier — that careful observation would reveal discrete disease entities with distinct biological signatures — suddenly seemed achievable, not through clinical observation but through technological firepower.

In 1998, the Nobel laureate Eric Kandel published a manifesto in the American Journal of Psychiatry titled "A New Intellectual Framework for Psychiatry." His argument was sweeping: all mental processes are brain processes. Genes and environment interact to shape neural circuits. Psychotherapy, when it works, works by changing the brain. Kandel laid out five principles for the future of psychiatry, and all five were biological. The framework wasn't hostile to psychology or social context โ€” Kandel acknowledged their importance โ€” but the direction was unmistakable. The future of understanding mental illness lay in understanding its neurobiology. Everything else was upstream of the real action.

Kandel's manifesto had the force of authority behind it. He was not a psychiatrist speculating about neuroscience; he was a neuroscientist who had won the Nobel Prize for discoveries about the molecular basis of memory. When he said the brain was the key, the profession listened. And the research infrastructure โ€” the funding priorities, the training programs, the career incentives โ€” began to shift accordingly.

The biomarker search intensified. If depression was a brain disease, it should have a brain signature โ€” a pattern in neuroimaging, a genetic variant, a molecular marker โ€” that could be identified, measured, and used for diagnosis the way a blood test identifies diabetes. The same logic applied to schizophrenia, to anxiety disorders, to ADHD. The categories already existed in the DSM. The job was to find their biological underpinnings.

The serotonin hypothesis of depression became the public-facing emblem of this program. SSRIs โ€” selective serotonin reuptake inhibitors โ€” increased serotonin availability in the brain and appeared to alleviate depressive symptoms in many patients. The inference seemed straightforward: if boosting serotonin helps depression, then depression must involve a serotonin deficit. The pharmaceutical industry marketed this story aggressively, and it entered popular understanding as established fact. Depression was a chemical imbalance. It was biological. It was treatable. The infrastructure of explanation โ€” from drug advertisements to patient pamphlets to clinical conversations โ€” organized itself around this claim.

But the inference ran backwards. The mechanism of the drug was being used to infer the mechanism of the disease, which is roughly equivalent to concluding that headaches are caused by aspirin deficiency because aspirin relieves headaches. The historian David Healy documented how this pharmacological bridge — drug mechanism becomes disease explanation — didn't just describe an existing understanding but actively constructed one. The modern concept of depression as a common, primarily biological condition wasn't established first and then treated with drugs. The drugs came first, and the disease concept was reshaped to fit. When Joanna Moncrieff and colleagues published their systematic umbrella review in Molecular Psychiatry in 2022, examining the full body of evidence for the serotonin hypothesis, they found no consistent support for it. The story that had organized billions of dollars of treatment and shaped millions of patients' self-understanding turned out to rest on an inferential error that the field had allowed to harden into received wisdom.

This didn't mean SSRIs were useless — many patients experienced genuine benefit. But it did mean that the biological narrative underwriting their use was far less solid than it appeared. The infrastructure of explanation had outrun the infrastructure of evidence.


The deepest and most institutionally consequential expression of the biological promise wasn't the serotonin hypothesis. It was RDoC.

In 2010, Thomas Insel — then director of the National Institute of Mental Health, the largest funder of psychiatric research in the world — published a four-page manifesto in the American Journal of Psychiatry. The paper, co-authored with several colleagues, bore a title that announced a paradigm shift: "Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders."

The argument was direct. DSM categories don't map onto biological systems. Research organized around DSM diagnoses — studying "depression" as if it were a single entity, searching for "the genes for schizophrenia" as if the category named one thing — will never find the mechanisms of mental illness, because the categories are drawing the wrong boundaries. RDoC proposed to abandon diagnostic categories as organizing principles for research and replace them with dimensional constructs — threat processing, reward learning, working memory, social communication — studied across multiple units of analysis: genes, molecules, circuits, physiology, behavior, self-report.

The framework was organized as a matrix. Six domains — Negative Valence Systems (fear, anxiety, and loss), Positive Valence Systems (reward, motivation, and pleasure), Cognitive Systems (attention, memory, and executive function), Social Processes (attachment, communication, and the perception of others), Arousal and Regulatory Systems (sleep, energy, and stress response), and Sensorimotor Systems (motor behavior and agency) — each containing multiple constructs, crossed with units of analysis ranging from the molecular to the behavioral. It was, in infrastructure terms, an architectural blueprint for an entirely new system. Not a renovation of the existing building but a new foundation, designed on different principles, intended to eventually support a different structure.
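The matrix idea is easy to hold in mind as a data structure. The sketch below is purely illustrative: the domain names and units of analysis follow the text, and the construct lists are just the parenthetical examples given above, not the full official matrix. It shows why the design multiplied research programs so quickly: every construct crossed with every unit of analysis is a potential cell.

```python
# A sketch of the RDoC matrix as a plain data structure. Domain names and
# units of analysis follow the text; the construct lists are the
# parenthetical examples from the text, not the full official matrix.
UNITS_OF_ANALYSIS = [
    "genes", "molecules", "circuits", "physiology", "behavior", "self-report",
]

RDOC_DOMAINS = {
    "Negative Valence Systems": ["fear", "anxiety", "loss"],
    "Positive Valence Systems": ["reward", "motivation", "pleasure"],
    "Cognitive Systems": ["attention", "memory", "executive function"],
    "Social Processes": ["attachment", "communication", "perception of others"],
    "Arousal and Regulatory Systems": ["sleep", "energy", "stress response"],
    "Sensorimotor Systems": ["motor behavior", "agency"],
}

# Every (domain, construct, unit) cell is a potential research program:
matrix_cells = [
    (domain, construct, unit)
    for domain, constructs in RDOC_DOMAINS.items()
    for construct in constructs
    for unit in UNITS_OF_ANALYSIS
]
```

Even this toy version, with only the example constructs, yields over a hundred cells; the real matrix is larger still.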

Crucially, RDoC was framed as a research framework, not a clinical replacement for the DSM. Patients would still be diagnosed using DSM categories. Insurers would still use DSM codes. But the research that would someday generate better categories would no longer be constrained by the old ones. This was a strategic ambiguity — the new infrastructure wouldn't need to connect to the existing institutional dependencies immediately, because it was operating in the research domain where the rules were different. Build the knowledge base first. The clinical applications would follow.

The institutional machinery behind RDoC was formidable. Insel didn't just publish a paper; he redirected funding. NIMH grant applications that used DSM categories as their organizing framework were, in effect, penalized in peer review. Applications that recruited across diagnostic categories, that measured dimensional constructs, that organized around biology rather than symptoms, were rewarded. When you control the largest psychiatric research funding stream in the world, your framework preferences become the field's research priorities. RDoC didn't need to convince every scientist. It just needed to set the terms for what got funded.


Fifteen years later, the results are in. And they're complicated.

RDoC's most significant achievement may be cultural rather than empirical. The word "transdiagnostic" has exploded in the psychiatric literature since RDoC's launch. Before RDoC, a grant application that recruited participants across diagnostic categories was penalized. After RDoC, it became encouraged, and in some programs, required. This shift — from studying "depression" and "schizophrenia" as if they were discrete entities to studying processes like reward learning and threat reactivity that cut across diagnostic boundaries — represents a genuine change in how the field thinks about mental illness. Few serious neuroscientists now believe that DSM categories carve nature at its joints. RDoC made that skepticism not just intellectually respectable but institutionally incentivized.

The framework has also produced a handful of concrete translational achievements. The most impressive is the Fast-Fail Trials program, which reorganized early-phase drug development around RDoC constructs rather than DSM categories. Instead of testing whether a drug treats "depression," the program tested whether a drug targeting a specific brain receptor involved in pleasure processing (a kappa-opioid receptor antagonist) could address anhedonia — the inability to experience pleasure — as a transdiagnostic dimensional construct in RDoC's Positive Valence Systems domain. Patients were recruited based on anhedonia scores, not DSM diagnoses. The results, published in Nature Medicine in 2020, showed that the drug increased reward-related brain activity and reduced anhedonic symptoms. A related compound subsequently moved into clinical trials. This pipeline — from RDoC construct to dimensional recruitment to targeted drug trial — represents the most direct clinical translation the framework has achieved.

Meanwhile, the Bipolar-Schizophrenia Network on Intermediate Phenotypes consortium, working with RDoC-compatible methods, identified three biologically distinct "psychosis biotypes" that cut across the traditional schizophrenia-schizoaffective-bipolar boundary. These biotypes were defined by brain-based biomarkers — EEG patterns, cognitive tasks, eye-tracking performance — rather than by symptoms. It remains one of the clearest demonstrations that carving psychopathology by biology rather than by clinical presentation yields different, and potentially more useful, groupings.

These are real accomplishments. They're also, by the standards of the original ambition, modest.


Here is what RDoC has not produced: a single diagnostic biomarker that has entered routine clinical practice. No blood test. No brain scan. No genetic panel. The clinician Insel envisioned in 2010 — one who would supplement a clinical interview with data from functional neuroimaging — does not exist. Fifteen years and billions of research dollars later, psychiatric diagnosis still happens the way it happened before RDoC: a clinician talks to a patient, forms a clinical impression, and matches it to a category in a manual. (Fifteen years is a short timeline for basic science to produce clinical translation — in oncology or cardiology, the gap between foundational research and bedside application routinely spans thirty to fifty years. But the promises made for biological psychiatry were not pitched in those modest terms, and the field's own advocates set the expectations the field is now judged against.)

This is not merely a delay. It reflects something structural.

An umbrella review of 162 peripheral biomarkers for major mental disorders, published in Translational Psychiatry in 2020, found that the vast majority of biomarker studies were substantially underpowered — the true effect sizes of most non-genetic biomarkers are small, similar to what has been found in the genetic literature. The biomarker search hasn't stalled for lack of effort. It has stalled because the biology is more complex than the framework anticipated.

Three problems, identified early by critics and confirmed by subsequent research, explain why.

The first is equifinality: many different biological pathways can produce the same clinical presentation. If dozens of distinct circuit-level disruptions can all produce what we call "depression," then organizing research by circuits doesn't reduce diagnostic heterogeneity — it fragments it into an even larger number of subtypes, each biologically distinct, without clear clinical boundaries. The psychosis biotypes are a proof of concept, but they haven't yet been shown to predict differential treatment response, which is the ultimate test of whether biologically defined groups are clinically useful.

The second is multifinality: the same biological process can produce radically different clinical presentations depending on developmental context, environmental exposure, psychological history, and a dozen other variables. A genetic variant associated with heightened stress reactivity might contribute to anxiety in one developmental context, depression in another, and resilience in a third. The relationship between biology and phenomenology isn't one-to-one. It's many-to-many, with intervening variables at every level.

The third problem is what the psychologist Scott Lilienfeld, in the sharpest philosophical critique RDoC has received, called the mereological fallacy — the assumption that understanding parts will explain wholes. RDoC's matrix presupposes that psychological constructs can be meaningfully decomposed into units of analysis arranged hierarchically from genes to self-report, and that research at each level will integrate into a coherent picture of the whole. But fifteen years of research have not demonstrated that these levels integrate in the clean, bottom-up manner the matrix implies. A 2024 analysis found that of 120 projects funded by RDoC since its inception, only one directly engaged with predictive processing — the most influential theoretical framework in cognitive neuroscience. RDoC's funding priorities have been overwhelmingly data-driven rather than theory-driven, producing masses of correlational data without a unifying mechanistic framework for interpreting them.

And in January 2025, a study published in Nature Communications attempted to test whether RDoC's six domains actually correspond to distinct patterns in brain activity. The researchers applied a bifactor analysis — a statistical technique that looks for both a general factor and specific factors in brain-imaging data — to eighty-four different brain-scanning tasks, and found that RDoC's six-domain structure doesn't accurately map onto brain circuitry as measured by neuroimaging. The cognitive systems domain should probably be split. A task-general factor should be added. The framework's foundational organizational structure — the architecture of the replacement infrastructure — may itself need substantial revision.


The most damning assessment of what the biological promise has delivered came from the person most responsible for making it.

In 2015, leaving NIMH after thirteen years as its director, Thomas Insel reflected on what his tenure had accomplished: "I spent 13 years at NIMH really pushing on the neuroscience and genetics of mental disorders, and when I look back on that I realize that while I think I succeeded at getting lots of really cool papers published by cool scientists at fairly large costs — I think $20 billion — I don't think we moved the needle in reducing suicide, reducing hospitalizations, improving recovery for the tens of millions of people who have mental illness."

Twenty billion dollars. Cool papers. No moved needles.

In his 2022 book Healing, Insel elaborated on this reckoning. The science, he wrote, had been looking for causes and mechanisms while the effects of mental disorders were playing out in rising death and disability, increasing incarceration and homelessness, and deepening despair for patients and families. His metaphor was brutal: they had been studying the chemistry of the paint while the house was on fire. Notably, RDoC is not mentioned by name in the book — an extraordinary omission from the framework's principal architect. Insel's post-NIMH career pivot — to Google Health, then to digital mental health startups, then to California state policy — represented a personal repudiation of the biomarker-first strategy. His subsequent prescription for healing centered on "people, place, and purpose." None of those appear in the RDoC matrix.

Critics have been harsher still. The historian Andrew Scull called the approach a "foolish monism" that caused the social and phenomenological dimensions of mental illness to all but disappear as questions worthy of serious attention. Allen Frances — who had chaired the DSM-IV task force and knew something about the politics of classification — argued that brain research should remain part of a balanced agenda at NIMH but not its sole preoccupation. The $20 billion was not a failed investment in the sense that it produced no knowledge. It produced enormous amounts of knowledge. The failure was in the assumption that this particular kind of knowledge — biological, mechanistic, circuit-level — was the kind that would translate into better classification, better diagnosis, and better outcomes.


The infrastructure metaphor makes the deepest sense of what happened here. RDoC was an attempted infrastructure replacement — a conscious, well-funded, institutionally backed effort to build new classification infrastructure on biological foundations. It diagnosed the existing infrastructure's problems correctly: DSM categories had been reified, they didn't map onto biology, and organizing research around them was producing diminishing returns. The diagnosis was right. The remedy was wrong, or at least premature.

The problem was that the new infrastructure couldn't connect to anything. Clinical practice still runs on DSM codes. Insurance reimbursement still requires categorical diagnoses. Legal standards for commitment, competency, and disability still reference DSM categories. Drug regulation still uses them. Training programs still teach them. The entire institutional ecosystem — the dependency web that makes psychiatric classification function as infrastructure — was built on top of the existing system, and the replacement system had no interfaces to any of it.

In software, this is a familiar failure mode. You build a beautiful new system on better principles — cleaner architecture, more elegant design, superior abstractions — but it can't connect to the legacy systems that everyone actually uses. It doesn't speak the right protocols. It doesn't produce outputs in the formats the downstream systems consume. It's technically superior and practically useless, because infrastructure isn't just architecture — it's connections. An isolated system, no matter how well-designed, isn't infrastructure. It's a prototype.

RDoC remains a prototype. The research culture has shifted — that's real. The Fast-Fail trials have produced a drug pipeline — that's real. The biotypes work has shown that biology-first groupings are possible — that's real. But the system that was supposed to eventually generate a new clinical classification has not done so, and the gap between research framework and clinical infrastructure shows no signs of closing.

And now RDoC faces a threat that has nothing to do with its scientific merits. The proposed restructuring of the National Institutes of Health — which would consolidate its twenty-seven institutes into eight and absorb NIMH into a new National Institute on Behavioral Health — could disrupt the institutional continuity that RDoC depends on. There are currently no active RDoC-specific funding opportunities listed on the NIMH website. The framework was always institutionally dependent on NIMH's willingness to incentivize transdiagnostic research designs through peer review guidelines. If NIMH's identity and institutional continuity are disrupted, that enforcement mechanism could evaporate. RDoC has not become self-sustaining in the broader research culture. It still requires active institutional support, and that support is no longer guaranteed.

The biological promise is not dead. Biology matters — no serious account of mental suffering can ignore the brain. The question was never whether biology is relevant but whether it's sufficient as a foundation for classification. The last fifteen years suggest it isn't — not because the biology is wrong, but because the biology is too complex, too context-dependent, and too far from clinical application to serve as the infrastructure's foundation in the way that Kandel and Insel envisioned. The brain doesn't hand you categories. It hands you gradients, continuums, probabilistic associations, and interactions at every level of analysis. It hands you, in other words, exactly the kind of complexity that categorical classification was designed to simplify — and the simplification is where the human judgment, the institutional needs, and the looping effects come back in through every door.

If biology can't provide the foundation, maybe measurement can. Maybe the question isn't what psychiatric disorders are — biologically, philosophically, ontologically — but how well we can measure the patterns of suffering we observe, whatever their underlying nature. This is the promise of psychometrics, and it's the subject of the next section. But it carries its own paradox, as deep and as unsettling as anything we've encountered so far: what does it mean to measure something precisely if you're not sure what the something is?

Section 7 — Measuring the Unmeasurable

Forget the brain. Forget the genome. Just look at the data — at how symptoms cluster, how they covary across populations, how they distribute along continuums or clump into groups — and let the statistical structure tell you what's real.

This is the promise of psychometrics applied to psychiatric classification. It sounds modest after the grand ambitions of the biological program — less a new foundation than a better surveyor's toolkit. But the psychometric lens reveals something that the biological lens, for all its sophistication, couldn't see clearly: the measurement infrastructure of psychiatry may itself be broken. Not just the categories it measures, but the assumptions behind what "measuring" means when the thing you're measuring hasn't been defined.


The story begins with a bargain that shaped everything.

When Robert Spitzer redesigned the DSM in 1980, his central problem was chaos. Pre-DSM-III psychiatry had a reliability crisis: two clinicians interviewing the same patient might give different diagnoses more often than not. Spitzer's analysis of the existing literature, published with Joseph Fleiss in 1974, documented kappa coefficients — the standard statistical measure of inter-rater agreement, corrected for chance — often below 0.5 for major diagnostic categories. Psychiatry, by its own measures, couldn't agree on what it was looking at.
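Kappa is simple enough to compute by hand. Here is a generic Cohen's kappa in Python, with invented ratings for ten patients; the point is the chance correction, which is why agreement that looks decent in raw percentage terms can still score poorly.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: the probability that both raters independently pick
    # the same label, summed over labels, using each rater's marginal rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in set(rater_a) | set(rater_b)
    )
    return (observed - expected) / (1 - expected)

# Two clinicians diagnose the same ten patients (invented data):
a = ["MDD", "GAD", "MDD", "MDD", "GAD", "MDD", "GAD", "MDD", "MDD", "GAD"]
b = ["MDD", "MDD", "MDD", "GAD", "GAD", "MDD", "GAD", "MDD", "MDD", "MDD"]
kappa = cohens_kappa(a, b)  # raw agreement is 0.70; kappa is about 0.35
```

With only two labels and skewed base rates, a good deal of agreement happens by chance alone, which is what the correction strips out.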

Spitzer's solution was operational criteria: specific symptom lists, explicit duration requirements, precise thresholds. Five of nine symptoms for at least two weeks meant Major Depressive Disorder. Fewer than five, or fewer than fourteen days, meant something else — or nothing at all. The criteria were arbitrary in the sense that no empirical study had shown that five symptoms was the natural boundary. But they were explicit, and that explicitness did what it was designed to do. The Research Diagnostic Criteria, Spitzer's precursor system, achieved kappas of 0.6 to 0.8 for major categories. DSM-III's field trials reported similar numbers. The reliability crisis was, by the metrics that mattered, solved.
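The operational logic lends itself to code: the rule is just a count and a duration threshold. A deliberately schematic sketch in Python; the real criteria add requirements (one of the symptoms must be depressed mood or anhedonia, plus clinically significant distress or impairment) that are omitted here.

```python
def meets_mdd_criteria(symptom_count: int, duration_days: int) -> bool:
    """Schematic DSM-style operational rule: at least five of the nine
    listed symptoms, present for at least two weeks. A simplification
    for illustration, not a clinical instrument."""
    return symptom_count >= 5 and duration_days >= 14

meets_mdd_criteria(5, 14)  # True: threshold exactly met
meets_mdd_criteria(4, 90)  # False: one symptom short, however long it lasts
meets_mdd_criteria(6, 13)  # False: one day short of two weeks
```

The explicitness is the whole point: two clinicians running this rule on the same symptom checklist cannot disagree, which is exactly what the reliability bargain purchased.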

But the bargain had a cost that only became visible later. By optimizing for agreement — for different clinicians arriving at the same label — Spitzer had quietly deprioritized the question of whether the labels themselves referred to anything coherent. Reliability asks: do we agree on what we're seeing? Validity asks: are we seeing something real? These are different questions, and the history of DSM since 1980 is substantially the story of what happens when you build an entire infrastructure around the first question while hoping the second will take care of itself.

The psychiatrists Robert Kendell and Assen Jablensky made the distinction surgical in a 2003 paper that should have been more influential than it was. They argued that very few psychiatric diagnoses have been validated in the strong sense — demonstrated to be discrete entities with natural boundaries separating them from other conditions and from normality. What most diagnoses have demonstrated is utility: they're clinically informative, they predict course and treatment response somewhat, they facilitate communication among clinicians. Validity and utility are not the same thing, and the DSM's success has been almost entirely in the second category. The system is useful. Whether it describes real things remains an open and largely unaddressed question.

The philosopher of science Denny Borsboom put the deeper problem in focus. In his 2005 book Measuring the Mind, he argued that most psychological measurement lacks a coherent theory of what is being measured. You can compute a reliability coefficient and a factor loading (a statistical measure of how strongly an observed symptom relates to an underlying pattern), but if you can't specify the causal relationship between a latent variable — the unobserved thing you believe is causing the symptoms you can see — and the observed scores it supposedly produces, you don't actually know what you've measured. The measurement apparatus is technically sophisticated and theoretically hollow. Psychiatric diagnosis, by this account, is an elaborate system for reliably measuring something whose nature hasn't been established — like a bathroom scale that gives the same reading every time you step on it, but nobody's checked whether it's measuring weight, height, or something else entirely.
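Borsboom's complaint can be made concrete with a toy simulation. In the sketch below (Python, invented numbers, not any real scale), a single latent variable causally generates four item scores; the resulting sum score is highly reliable by Cronbach's alpha, yet nothing in the alpha coefficient tells you what the latent variable is.

```python
import random
import statistics

random.seed(0)

# A toy reflective measurement model: one unobserved latent variable
# causally produces four observed item scores (loadings and noise invented).
def simulate(n=2000, loading=0.8, noise_sd=0.6):
    rows = []
    for _ in range(n):
        latent = random.gauss(0, 1)  # the unobserved "something"
        rows.append([loading * latent + random.gauss(0, noise_sd)
                     for _ in range(4)])
    return rows

def cronbach_alpha(rows):
    """Internal-consistency reliability of the sum score."""
    k = len(rows[0])
    item_vars = [statistics.pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

alpha = cronbach_alpha(simulate())
# alpha comes out high (around 0.85 to 0.9 with these settings), but the
# coefficient is silent about what the latent variable actually is.
```

The same high alpha would appear whether the latent cause were a mood disorder, a response style, or a shared environmental burden; the reliability statistic cannot tell those apart.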


The hollowness became empirically visible when DSM-5 went through its own field trials, and the numbers came back.

The DSM-5 field trials, published by Darrel Regier and colleagues in the American Journal of Psychiatry in 2013, tested diagnostic reliability under naturalistic conditions — real clinicians, real patients, clinical settings rather than research labs. The results, measured by the same kappa statistic that Spitzer had used to diagnose the pre-DSM-III crisis, were sobering. Major Depressive Disorder — the most commonly diagnosed condition in outpatient psychiatry, the category around which billions of dollars of treatment infrastructure had been organized — achieved a kappa of 0.28. Generalized Anxiety Disorder achieved 0.20. By the standards the field had established decades earlier, these numbers indicated poor to unacceptable agreement.

What happened next was more revealing than the numbers themselves. Rather than treating the results as evidence of a measurement problem, the DSM-5 committee redefined acceptable reliability. Kappas in the 0.2 to 0.4 range, which DSM-III's architects would have considered evidence of failure, were relabeled as "acceptable" under a new framework. The accompanying editorial, by Robert Freedman and colleagues, presented the disappointing numbers as a consequence of "real-world" testing conditions — as if testing diagnosis in the conditions under which diagnosis actually occurs were somehow an unfair test. Allen Frances, who had chaired the DSM-IV task force, was blunt in his public response: DSM-5's own data showed the system was less reliable than DSM-III, which was the problem the system was supposed to have fixed.

The infrastructure metaphor captures what happened precisely. When the measurement infrastructure produces results that threaten the classification infrastructure, the response isn't to question the classification. It's to recalibrate the measurement. Change the standard, not the system. This is a classic response of entrenched infrastructure to threatening evidence — the same dynamic that played out when the biological program's failure to produce biomarkers was reframed as a problem of premature clinical expectation rather than a challenge to the categories themselves.


But psychometrics has a more radical contribution to offer than merely documenting the DSM's reliability problems. It can also ask a question the DSM's structure forecloses: what if the categories are the wrong unit of analysis entirely?

The most provocative version of this question came from the psychologists Avshalom Caspi and Terrie Moffitt, working with longitudinal data from the Dunedin Study — a birth cohort of roughly a thousand New Zealanders tracked from age three into middle adulthood. In a 2014 paper published in Clinical Psychological Science, they reported that the comorbidity structure of mental disorders — the well-documented finding that people with one diagnosis tend to have others — could be explained by a single general factor of psychopathology, which they called p.

The analogy was deliberate: just as Spearman's g represents a general factor underlying performance across diverse cognitive tests, p represents a general vulnerability that increases risk for virtually every form of mental distress. People high on p are more likely to develop depression, anxiety, substance use problems, psychosis, and personality pathology. The factor robustly emerges across datasets and populations. And if it's real, the implications for classification are seismic. The DSM's hundreds of discrete categories might be like a system of roads built on a misunderstanding of the terrain — an elaborate infrastructure for navigating a landscape that doesn't have the distinct regions the map insists on.
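The statistical move behind p can be sketched in a few lines. Below is an invented correlation matrix in which five symptom domains all correlate positively (the "positive manifold"); power iteration pulls out the dominant eigenvector, which plays the role of a general factor. The numbers are made up for illustration, not drawn from any study.

```python
# Toy positive manifold: an invented correlation matrix for five symptom
# domains, all positively intercorrelated.
R = [
    [1.00, 0.45, 0.40, 0.35, 0.30],
    [0.45, 1.00, 0.42, 0.38, 0.33],
    [0.40, 0.42, 1.00, 0.36, 0.31],
    [0.35, 0.38, 0.36, 1.00, 0.34],
    [0.30, 0.33, 0.31, 0.34, 1.00],
]

def first_eigen(matrix, iters=200):
    """Dominant eigenvalue and eigenvector via power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(len(v)))
        for i in range(len(v))
    )
    return eigenvalue, v

eigenvalue, loadings = first_eigen(R)
share = eigenvalue / len(R)  # proportion of total variance on one factor
# All loadings come out positive: every domain leans on the same factor.
```

With all loadings positive and roughly half the total variance landing on a single dimension, the toy matrix behaves the way real psychopathology correlation matrices do, which is what makes p so easy to extract and so hard to interpret.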

What p actually is, however, remains genuinely unclear. A comprehensive 2022 review by Ashley Watts and colleagues catalogued the interpretive options: p might be a causal entity — some underlying neurobiological vulnerability that manifests differently depending on context. It might be a statistical artifact of shared method variance — a consequence of the fact that all psychopathology measures rely on self-report and clinician observation, which share sources of bias, so they tend to correlate even if the underlying conditions are unrelated. It might index the accumulated burden of environmental adversity, capturing the well-established finding that childhood trauma, poverty, and social disadvantage increase risk for nearly everything. Or it might be an emergent property of a system in which psychological problems cause other psychological problems, such that distress in any domain tends to cascade.

Each interpretation has different implications for what classification should look like. If p is a genuine entity, the hierarchical approach is right: organize the field around a general vulnerability factor, then differentiate within it. If p is a measurement artifact, the entire finding is misleading. If p reflects cascading causation, the answer is something else entirely — a network model rather than a hierarchy.


The most developed attempt to build a classification system from psychometric foundations is the Hierarchical Taxonomy of Psychopathology — HiTOP.

HiTOP organizes psychopathology as a hierarchy of dimensions rather than a catalogue of categories. At the apex sits p. Below it, six broad spectra — Internalizing (depression and anxiety), Somatoform (physical symptoms with psychological origins), Thought Disorder (psychosis and disorganized thinking), Detachment (social withdrawal and flattened emotion), Disinhibited Externalizing (impulsive and addictive behavior), and Antagonistic Externalizing (aggression and antisocial conduct) — each capturing a broad domain of dysfunction. Below those, more specific subfactors: within Internalizing, for instance, Fear separates from Distress. Below those, constructs that begin to resemble familiar diagnostic entities — depression, panic disorder, alcohol use disorder — but treated as dimensional rather than categorical. And at the base, individual symptoms and traits.

The structure wasn't designed by a committee; it was derived from factor analysis — a statistical technique that identifies clusters of symptoms that tend to co-occur, revealing hidden structure in large datasets — assembled by an international consortium of roughly a hundred researchers who synthesized decades of independent findings. The meta-analytic evidence supporting its upper-level structure is strong — Ringwald, Forbes, and Wright's 2023 analysis confirmed that the spectral organization replicates robustly across samples. HiTOP's dimensional model has more empirical support than the DSM's categorical structure; it is arguably the best-validated alternative architecture available. And HiTOP solves, or at least dissolves, some of the DSM's most persistent problems. Comorbidity becomes expected rather than embarrassing — if depression and anxiety load onto the same internalizing spectrum, their co-occurrence isn't two diseases coinciding but one dimension manifesting in related ways. The boundary problem disappears — no arbitrary threshold separating disorder from non-disorder, just a continuum on which everyone falls somewhere. The heterogeneity problem is reduced — dimensional scores capture more of the variance in how people actually present than binary diagnostic labels can.
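The levels described above can be written down as a nested structure. A sketch in Python: the apex and six spectra follow the text, and the lower levels are filled in only where the text names examples (Fear versus Distress under Internalizing), with everything else left empty rather than invented.

```python
# A sketch of HiTOP's hierarchy as a nested structure. The apex and six
# spectra follow the text; lower levels include only the examples the
# text names, and are left empty otherwise.
HITOP = {
    "apex": "p (general psychopathology)",
    "spectra": {
        "Internalizing": {"Fear": ["panic disorder"], "Distress": ["depression"]},
        "Somatoform": {},
        "Thought Disorder": {},
        "Detachment": {},
        "Disinhibited Externalizing": {},
        "Antagonistic Externalizing": {},
    },
}

def score(profile, dimension):
    """Dimensional assessment: every person has a position on every
    dimension; absence of elevation is a score near zero, not a non-label."""
    return profile.get(dimension, 0.0)

patient = {"Internalizing": 1.8, "Distress": 2.1, "Detachment": 0.4}  # invented
```

The contrast with the DSM sits in the `score` function: the output is a position on each dimension rather than the presence or absence of a category.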

The system is intellectually elegant. It is also, as of mid-2025, clinically untested.

No published empirical study has demonstrated that clinicians using HiTOP achieve better treatment outcomes than those using the DSM. Field trials are underway but unpublished. The HiTOP consortium has promoted the framework as ready for clinical implementation, which critics — most pointedly Gregory Haeffel and colleagues, in a 2022 paper titled "HiTOP Is Not an Improvement Over the DSM" — regard as premature. Evidence, they argued, should precede recommendations.

The clinical problem is practical. The DSM, for all its deficiencies, has a rough-and-ready connection to treatment. You diagnose Major Depressive Disorder, you know the first-line options are antidepressants and cognitive-behavioral therapy. HiTOP currently lacks this. If a patient scores high on internalizing distress with an elevated fear subfactor and moderate detachment, what treatment does that profile dictate? The existing evidence base for treatment matching was built around DSM categories, because those are the categories that have organized clinical trials for forty years. HiTOP's dimensions may carve more accurately, but they carve into a void — a dimensional landscape with no treatment infrastructure built on it.

And there is a deeper objection, one that connects to everything this essay has been building. HiTOP reorganizes symptoms into dimensions rather than categories, but it doesn't change what is being organized. Both systems are descriptive taxonomies built from observed signs, symptoms, and self-reported experiences. The inputs are the same. As Haeffel and colleagues wrote: it doesn't matter whether the symptom profiles are created by factor analysis or expert consensus, or whether they're operationalized as dimensions or categories — these differences don't alter the fact that both HiTOP and the DSM are symptom-based taxonomies that share the same underlying assumptions and inherent limitations.

The psychologist Eiko Fried has made the point empirically vivid. In a 2017 analysis, he showed that seven commonly used depression scales overlap in only twelve percent of their symptom content. If the field can't agree on what symptoms constitute "depression" at the base level, then the factor-analytic solutions built on top of those measurements are constructed on shifting sand. HiTOP reorganizes DSM-derived constructs more elegantly, but it doesn't escape the original problem of construct definition. It offers a better map of the same terrain, drawn with better instruments — but doesn't question whether the terrain has been correctly identified.
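Fried's content-overlap point is easy to reproduce in miniature: treat each scale as a set of symptoms and compute what fraction of all symptoms in use appears in every scale. The three scales and their symptom lists below are invented stand-ins, not Fried's actual data; only the arithmetic is the point.

```python
# Hypothetical symptom content for three depression scales (invented stand-ins).
scales = {
    "Scale A": {"sad mood", "insomnia", "fatigue", "guilt", "anhedonia"},
    "Scale B": {"sad mood", "appetite change", "agitation", "fatigue", "crying"},
    "Scale C": {"sad mood", "hopelessness", "libido loss", "irritability", "fatigue"},
}

all_symptoms = set.union(*scales.values())
shared = set.intersection(*scales.values())

# Overlap = symptoms appearing in every scale, as a share of all symptoms in use.
overlap = len(shared) / len(all_symptoms)
print(sorted(shared))    # ['fatigue', 'sad mood']
print(f"{overlap:.0%}")  # 18%
```

Three five-item scales that all plausibly "measure depression" agree on less than a fifth of their content; Fried found twelve percent across seven real instruments.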

HiTOP is, in effect, the technically superior replacement that nobody has installed — the same story as RDoC, told in psychometric rather than biological terms. The architecture is better. The connections to the existing infrastructure don't exist yet.


Borsboom, the same psychometrician who diagnosed the measurement problem, has proposed the most radical alternative. His network theory, published in World Psychiatry in 2017, challenges the assumption shared by both the DSM and HiTOP: that there is an underlying thing — a latent variable, a disease entity, a dimensional construct — causing the symptoms we observe. What if there isn't? What if symptoms cause each other directly?

In network models, insomnia doesn't indicate an underlying "depression" that also causes fatigue, hopelessness, and concentration problems. Instead, insomnia causes fatigue, which causes concentration problems, which causes occupational impairment, which causes hopelessness. The symptoms are the disorder. There is no ghost behind the machine. What we call "depression" is a label for a pattern of causally connected problems that tend to activate together, not the name of a latent entity that produces them.

If Borsboom is right, both the DSM's categories and HiTOP's dimensions are asking the wrong question — looking for underlying things when they should be mapping causal networks. The measurement question shifts fundamentally: instead of trying to measure how much of a latent construct someone "has," you're trying to map which symptoms are active, how they're connected, and which connections are driving the system.
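A minimal sketch of the network picture, under invented symptom names and causal weights (an illustration of the idea, not Borsboom's estimation machinery): symptoms are nodes, directed links carry activation, and "the disorder" is whatever pattern of nodes ends up switched on.

```python
# Directed causal links between symptoms, with invented weights.
edges = {
    "insomnia":                {"fatigue": 0.9},
    "fatigue":                 {"concentration problems": 0.8},
    "concentration problems":  {"occupational impairment": 0.7},
    "occupational impairment": {"hopelessness": 0.6},
    "hopelessness":            {},
}

def spread(triggers, threshold=0.5, steps=10):
    """Propagate activation along causal links until the network settles."""
    level = {s: (1.0 if s in triggers else 0.0) for s in edges}
    for _ in range(steps):
        for src, outs in edges.items():
            for dst, weight in outs.items():
                level[dst] = max(level[dst], weight * level[src])
    return {s for s, v in level.items() if v >= threshold}

# A single trigger recruits most of the chain; the farthest node
# (hopelessness, at 0.9 * 0.8 * 0.7 * 0.6, about 0.30) stays below threshold.
print(sorted(spread({"insomnia"})))
```

Note that no "depression" node appears anywhere in the model: on the network view, the label simply names this self-activating pattern.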

This is a beautiful idea. It is also, at this point, largely theoretical. Network models have been difficult to estimate reliably, difficult to replicate across samples, and difficult to translate into clinical practice. The within-person causal dynamics that the theory privileges are precisely what cross-sectional data — the kind most available — cannot capture. But the theoretical challenge to both categorical and dimensional approaches is genuine, and it exposes a question that psychometrics alone cannot answer: what are we measuring? A thing? A dimension of a thing? A pattern of causally related things? The statistical apparatus can organize data with enormous sophistication. It cannot tell you what the data represent.


There is a prior question, even more fundamental, that the psychometric tradition has largely avoided confronting.

The psychologist Paul Meehl spent decades developing taxometric methods — statistical procedures designed to answer a seemingly simple question: do people with a given psychiatric condition clump into a discrete group (a taxon), or do they vary along a continuum? The distinction matters enormously for classification. If conditions are taxonic, categorical diagnosis makes sense — there's a real group to find. If they're dimensional, categories are impositions on continuous variation.
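One crude signature of that distinction can be simulated directly: a population composed of two well-separated latent groups produces a flatter, more bimodal score distribution than a single continuum, which shows up as negative excess kurtosis. Real taxometric procedures such as MAXCOV and MAMBAC are far more sophisticated; the parameters below are invented purely for illustration.

```python
import random

random.seed(1)
N = 6000

def excess_kurtosis(xs):
    """Sample excess kurtosis: zero for a normal distribution."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / var**2 - 3

# Dimensional structure: everyone drawn from a single continuum.
dimensional = [random.gauss(0, 1) for _ in range(N)]

# Taxonic structure: two equal latent groups centered well apart.
taxonic = [random.gauss(-2 if random.random() < 0.5 else 2, 1) for _ in range(N)]

print(round(excess_kurtosis(dimensional), 2))  # near zero
print(round(excess_kurtosis(taxonic), 2))      # clearly negative (flat, bimodal)
```

If the taxon is small or the groups overlap heavily, this naive signature vanishes, which is exactly why Meehl needed subtler covariance-based procedures.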

Nick Haslam's reviews of the taxometric evidence, spanning work from 2003 through 2012, paint a consistent picture. A small number of conditions show taxonic structure — melancholic depression, schizotypy, possibly some presentations of autism. Most conditions appear dimensional. The statistical evidence, accumulated across dozens of studies and multiple methods, suggests that the DSM's categorical structure systematically misrepresents how psychopathology is distributed. Most of the boundaries the system draws don't correspond to natural discontinuities in the data. They are imposed by the system, not found in the phenomena.

And yet this still assumes that whatever is being distributed — categorically or dimensionally — has been correctly defined. Fried and Randolph Nesse's 2015 analysis of the STAR*D study showed that among 3,703 patients diagnosed with Major Depressive Disorder, there were 1,030 unique symptom profiles. Roughly eighty-four percent of profiles were shared by five or fewer patients. Two people with the same diagnosis might have almost no symptoms in common. The category is so heterogeneous that asking whether it's taxonic or dimensional may itself be the wrong question. It's like asking whether a bag of different objects is one kind of thing or a spectrum of things when the bag was filled more or less at random.
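The combinatorics behind that heterogeneity are easy to check. Under the bare "five or more of nine symptoms" rule, ignoring severity, duration, and the fact that some criteria are themselves compound, the number of distinct qualifying presentations is already 256:

```python
from itertools import combinations

SYMPTOMS = 9   # criterion symptoms for major depression
THRESHOLD = 5  # the "five or more" rule

# Every distinct subset of symptoms that meets the threshold is a
# different clinical presentation carrying the same diagnostic label.
qualifying = sum(
    1
    for k in range(THRESHOLD, SYMPTOMS + 1)
    for _ in combinations(range(SYMPTOMS), k)
)
print(qualifying)  # 256
```

And two five-symptom profiles drawn from a pool of nine are forced to overlap in only a single symptom, which is exactly the "almost no symptoms in common" situation the STAR*D data documents; the real counts run far higher once compound criteria and severity levels are unpacked.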

This is where psychometrics meets a problem the philosopher Ludwig Wittgenstein saw coming. Wittgenstein argued that many concepts hold together not through a single defining feature but through overlapping family resemblances — shared threads that connect members without any one thread running through all of them. "Depression" may be a family resemblance concept: the word refers to a cluster of loosely related human experiences that overlap in complex ways without sharing a common essence. If so, the entire measurement enterprise is operating under a category error. You can refine the measurement instrument indefinitely — better scales, more precise factor loadings, more sophisticated statistical models — but if the construct itself is a family resemblance cluster rather than a natural kind or a coherent dimension, the precision is illusory. You are measuring more and more precisely something that is less and less clearly a single thing.


The psychometric lens, then, reveals a paradox at the heart of psychiatric classification. The DSM optimized for reliability and achieved it — clinicians can agree on labels. But the labels may not refer to coherent entities, which means the reliability is formally impressive and substantively hollow. HiTOP offers a dimensionally superior alternative — better statistical foundations, more empirically defensible structure, an elegant solution to comorbidity — but it shares the DSM's symptom basis and lacks the institutional infrastructure that makes the DSM functional. Network theory challenges both approaches by questioning the existence of the latent variables they assume, but remains difficult to implement. And the taxometric evidence suggests that most of what the DSM treats as discrete categories is actually continuous, while the heterogeneity evidence suggests that what the DSM treats as single constructs may be incoherent aggregates.

Each approach captures something real. None captures enough. And the fundamental measurement problem — what does it mean to measure something precisely when the something hasn't been defined? — remains unresolved. The bathroom scale keeps producing readings. The readings are consistent. But nobody has established what the scale is measuring, and the possibility that it isn't measuring any single thing at all hangs over the entire enterprise like a question that nobody wants to ask out loud because the infrastructure depends on not answering it.

The next section of this essay won't try to answer it, either. Instead, it will hand you the problem. In Game 3, you'll sit with the tradeoff that this section has described — the tension between reliability and validity — and experience directly what happens when you try to optimize a diagnostic system for both at once. You'll tune the system for agreement and watch accuracy slip. You'll tune it for accuracy and watch agreement collapse. And you'll discover, in the space where the two demands pull apart, something about why this infrastructure looks the way it does and why changing it is so much harder than building a better measurement instrument.

🎮

Game 3: Reliable or Valid

Tune a diagnostic system and experience the fundamental tradeoff: optimize for agreement between clinicians and watch accuracy slip. Optimize for accuracy and watch agreement collapse.


Section 8 — Classification as Social Instrument

Part III: The Systems Outside the System

You've just tried to build a diagnostic system that was both reliable and valid, and you've discovered that you can't fully have both. The tradeoff wasn't a flaw in your design. It was a structural feature of the problem — the same tension that has shaped every edition of the DSM since 1980.

Now the essay does something different.

Everything so far — the philosophy, the biology, the psychometrics — has examined psychiatric classification from the inside. We've asked what diagnoses are, what they loop into, what biology can and can't underwrite, how well the system measures what it claims to measure. These are the questions the field asks about itself, and they're real questions with complicated answers.

But here's what those questions miss: the DSM doesn't exist inside a philosophy seminar or a neuroimaging lab. It exists inside a society. It exists inside insurance systems that need billing codes, legal proceedings that need expert testimony, disability programs that need eligibility criteria, pharmaceutical companies that need drug markets, and schools that need accommodation categories. The diagnostic manual is embedded in all of these systems simultaneously, and they all pull on it — shaping what gets classified, how broadly, and for whose benefit. Understanding psychiatric classification without understanding these forces is like understanding a road without understanding the economy that paid for it, the political process that routed it, and the interests that profit from the traffic it carries.

Part III pulls the camera out. The question is no longer What is this system? but What does this system do — and for whom?


In 2006, a team led by Lisa Cosgrove at the University of Massachusetts Boston and Sheldon Krimsky at Tufts published a study in Psychotherapy and Psychosomatics that quantified something the field had long preferred to leave unquantified. They examined the financial relationships between every member of the DSM-IV panels and the pharmaceutical industry. What they found was this: 56 percent of all panel members — 95 of 170 — had at least one financial association with a drug company. For the two panels that mattered most commercially — Mood Disorders and Schizophrenia and Other Psychotic Disorders — the figure was 100 percent. Every single person making decisions about how to define depression and psychosis had financial ties to the companies that sold treatments for depression and psychosis.

I want to start with this finding not because it's the most important point in this section — it isn't — but because it illustrates the gap between how we usually talk about psychiatric classification and what's actually happening. The usual conversation is about science: are the categories valid? Do they carve nature at the joints? The Cosgrove and Krimsky finding is not about science. It's about power — about who sits at the table when the categories are defined, whose interests are represented, and what structural incentives shape the decisions. And once you see the classification through this lens, you can't unsee it.


Begin with the most basic observation: a psychiatric diagnosis is a key.

Not metaphorically — literally. In most of the institutional systems that organize mental health care, you cannot access services without a diagnosis. Insurance companies will not reimburse treatment without a billing code, and billing codes map onto DSM categories. Disability programs require a qualifying diagnosis. Schools provide accommodations through categories like ADHD and autism spectrum disorder. Courts evaluate competency, culpability, and commitment through diagnostic testimony. The Veterans Administration, which provides care to millions, built its entire benefits structure around diagnostic eligibility.

The sociologist Annemarie Jutel captured this precisely: diagnosis is a social act that sorts people into categories with material consequences. It determines who gets treatment and who doesn't, who qualifies for disability and who doesn't, who can claim sick leave and who is malingering, who is legally responsible and who is diminished. The classification infrastructure doesn't just describe the distribution of mental illness in a population. It distributes resources to that population. It is, in the language of political theory, an allocative system — a mechanism for deciding who gets what.

This means that every decision about where to draw a diagnostic boundary is also a decision about who has access to care. Broaden the criteria for ADHD, and more children receive school accommodations and more adults gain access to stimulant medication. Narrow the criteria for PTSD, and fewer veterans qualify for disability benefits. Add a new disorder — premenstrual dysphoric disorder, binge eating disorder, prolonged grief disorder — and an entirely new population becomes eligible for treatment it couldn't previously access. Remove a disorder — as happened when homosexuality was voted out of the DSM in 1973 — and the institutional machinery that had pathologized, treated, and sometimes involuntarily committed people on that basis loses its justification overnight.

The sociologist Wilbur Scott documented one of the cleanest examples of this mechanism in his study of how PTSD entered the DSM-III. The diagnosis was not discovered through clinical research and then added to the manual. It was demanded — by Vietnam veteran advocacy groups who needed a diagnostic category to access VA benefits. Without a recognized disorder in the DSM, veterans experiencing the psychological aftermath of combat had no institutional pathway to care. The category was assembled through a political process involving veteran organizations, sympathetic psychiatrists, and DSM committee negotiations. The medical anthropologist Allan Young went further in his ethnography of a VA treatment unit, showing that PTSD was constructed from the intersection of veteran advocacy, VA institutional needs, and committee politics — not that the suffering wasn't real, but that the category was assembled from political materials as much as scientific ones.

None of this means PTSD is fake. The suffering it names is genuine, and the diagnosis has provided millions of people with access to treatment and recognition they desperately needed. But it does mean that the relationship between scientific evidence and diagnostic inclusion is not as straightforward as the field's self-presentation suggests. Categories enter the DSM through a process that is simultaneously scientific and political, and the political dimension is not a corruption of the scientific one — it's a permanent feature of how classification systems work when they function as infrastructure.


The sociologist Peter Conrad, in what remains the sharpest analytical framework for understanding these dynamics, argued that the key concept isn't corruption or conspiracy. It's medicalization — the process by which human conditions come to be defined and treated as medical problems. Conrad's insight, developed across three decades of work, was that medicalization isn't driven primarily by doctors. It's driven by the interaction of several forces: biotechnology companies that need conditions to sell treatments for, consumers who want medical explanations for their difficulties, managed care organizations that need diagnostic codes to process claims, and advocacy groups that need medical legitimacy to secure resources.

The engines of medicalization, in Conrad's framework, are structural rather than conspiratorial. No one needs to be acting in bad faith. Pharmaceutical companies develop drugs and then seek the largest possible market for them — which means they have a natural interest in broader diagnostic categories. Patient advocacy groups seek recognition and resources for their members — which means they have a natural interest in their condition being recognized as a legitimate diagnosis. Insurance companies need administrative efficiency — which means they have a natural interest in clear, categorical diagnoses that map cleanly onto billing codes. Clinicians want to help their patients — which means they have a natural interest in having a diagnosis that unlocks treatment. Each actor is pursuing reasonable goals. The aggregate effect is a system with structural pressure toward expansion — toward more conditions recognized as disorders, broader definitions of existing disorders, and pharmacological solutions for an ever-wider range of human experience.

Allan Horwitz documented the specific mechanism through which this expansion operates in the DSM. His argument, developed in Creating Mental Illness, targets a design decision that DSM-III made in 1980 and that every subsequent edition has preserved: the elimination of context from diagnostic criteria. Before DSM-III, a clinician diagnosing depression was expected to consider why the person was depressed. Sadness following the death of a spouse was understood differently from sadness that arrived without apparent cause. The diagnosis was contextual — it required clinical judgment about whether the symptoms were proportionate to the person's circumstances.

DSM-III stripped this context away. In the name of reliability — the very tradeoff Game 3 just made you experience — the manual replaced contextual clinical judgment with standardized symptom checklists. If you had five of nine symptoms for two weeks, you met criteria for Major Depressive Disorder, regardless of what was happening in your life. The one exception was bereavement: DSM-III included a clause excluding grief-related depressive symptoms from the diagnosis, on the grounds that intense sadness after losing someone you love is not a mental illness.

This single design decision — decontextualization — had enormous downstream consequences, and Horwitz traces them meticulously. Without context, the boundary between disorder and normal human suffering becomes a matter of symptom count and duration, not clinical meaning. A person devastated by job loss, divorce, or serious illness can meet full diagnostic criteria for Major Depressive Disorder if their devastation lasts long enough and manifests in the right symptom pattern. The category doesn't ask why. It asks how many and how long.
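The decontextualized rule is compact enough to write out as a function, which makes the point vividly: nothing about the person's circumstances appears among the inputs. This is a simplified schematic with invented names, not the full clinical criteria (which also require clinically significant distress or impairment and several exclusions).

```python
# Schematic of the post-1980 checklist logic (simplified, for illustration).
CORE = {"depressed mood", "anhedonia"}
OTHER = {"weight change", "sleep disturbance", "psychomotor change",
         "fatigue", "worthlessness", "poor concentration",
         "thoughts of death"}

def meets_criteria(symptoms, duration_days):
    """Five of nine symptoms, at least one of them core, for two weeks.
    Context (bereavement, job loss, divorce) is simply not an input."""
    count = len(symptoms & (CORE | OTHER))
    return count >= 5 and bool(symptoms & CORE) and duration_days >= 14

# Identical symptom picture, whatever the circumstances: same output.
devastated = {"depressed mood", "sleep disturbance", "fatigue",
              "poor concentration", "worthlessness"}
print(meets_criteria(devastated, duration_days=20))  # True
```

The function cannot distinguish a grieving spouse from an endogenous depression, because the information needed to do so was designed out of its signature.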

The result, Horwitz argues, was a massive expansion of the population diagnosable with depression — not because more people were becoming mentally ill, but because the diagnostic infrastructure had been redesigned in a way that captured normal responses to adversity alongside genuine pathology. The prevalence of depression appeared to explode, not because the underlying reality changed but because the measurement instrument changed. And this expanded prevalence had institutional consequences: more patients, more prescriptions, more insurance claims, a larger addressable market for antidepressant medications.


This brings us back to the pharmaceutical industry, and to the specific mechanism by which commercial interests have shaped the classification infrastructure. The point here is not to make a conspiracy argument — the evidence doesn't support one, and a conspiracy frame actually obscures what's more interesting and more durable about the problem. The point is to make a structural argument: that the incentive architecture of modern psychiatry creates systematic pressure on classification in directions that serve commercial interests, through mechanisms that are legal, transparent, and genuinely difficult to disentangle from legitimate scientific activity.

The psychiatrist and historian David Healy documented the most striking case of this mechanism in his account of how the modern concept of depression was constructed. His core claim, developed across The Antidepressant Era and Let Them Eat Prozac, inverts the expected sequence. The standard story says: first we identified a disease (depression), then we developed drugs to treat it (antidepressants). Healy's documented history runs the other direction. The drugs came first. Imipramine was discovered in 1957; the monoamine oxidase inhibitors followed shortly after. These drugs needed a market, and the market needed patients. Before the 1950s, the World Health Organization estimated the prevalence of depression at roughly 50 to 100 per million — it was understood as a severe, relatively rare condition. Within two decades, it had become one of the most commonly diagnosed conditions in medicine. What changed was not the amount of suffering in the world. What changed was the classification infrastructure through which that suffering was organized.

Section 6 already discussed how the serotonin hypothesis — the claim that depression results from a deficit of serotonin in the brain — was reverse-engineered from the mechanism of SSRIs rather than established independently and then matched with a treatment. That story belongs here as well, because it illustrates how the pharmaceutical pipeline functions as infrastructure that reshapes the classification system above it. The drug mechanism became the disease explanation. The disease explanation organized the clinical trials. The clinical trials validated the drug. The drug sales funded further research organized around the same disease concept. Each element reinforced every other, and the system became self-sustaining — not because anyone designed it to work this way, but because each component was doing what it was designed to do within a structure that no one was looking at as a whole.

The social anxiety disorder case makes the structural pattern even more visible. "Social phobia" appeared in DSM-III in 1980 as a narrow category — fear of specific performance situations, considered rare. It was broadened to "Social Anxiety Disorder" in DSM-IV in 1994, redefined as persistent fear of social situations in which one might be embarrassed or judged. The historian Christopher Lane documented what happened next in Shyness: How Normal Behavior Became a Sickness. SmithKline Beecham (later GlaxoSmithKline), which held the patent on paroxetine (Paxil), hired a public relations firm to run a disease awareness campaign. The campaign's slogan — "Imagine being allergic to people" — reframed shyness as a medical condition with a pharmaceutical solution. The company funded educational materials, sponsored screening days, and supported advocacy groups. Within a few years, social anxiety disorder had gone from an obscure diagnostic backwater to one of the most commonly diagnosed anxiety disorders in America. Again: the suffering was real. People who struggled with social situations had struggled before the diagnosis existed and would struggle after. But the category — its scope, its public profile, its clinical ubiquity — was shaped by commercial interests operating through perfectly legal channels.

And then the bereavement exclusion fell. DSM-5, published in 2013, eliminated the last remaining contextual exemption in the depression criteria — the clause that had protected grief from being diagnosed as Major Depressive Disorder. The decision was made by the Mood Disorders work group. That work group, as Cosgrove and Krimsky had documented, had the highest concentration of pharmaceutical industry ties of any DSM-5 panel. The supporters of removal argued, not unreasonably, that there was no scientific evidence that bereavement-related depression was fundamentally different from other forms of depression, and that the exclusion might prevent genuinely depressed bereaved individuals from receiving treatment. The philosopher Jerome Wakefield countered, in meticulous detail, that the studies cited to justify removal largely failed to distinguish between genuinely depressed bereaved individuals (who were already capturable under DSM-IV's override criteria) and people experiencing intense but normal grief.

The point is not to adjudicate the scientific dispute. The point is that a decision about where to draw the line between grief and depression — a decision with enormous consequences for drug markets, since broader diagnostic boundaries mean larger addressable populations — was made by a panel whose members had extensive financial relationships with the companies that stood to benefit from a broader boundary. And this structural conflict was not incidental. When Cosgrove and colleagues examined DSM-5-TR, published in 2022, they found that 60 percent of the physicians who worked on the revision had received pharmaceutical industry payments totaling $14.2 million over the preceding three years. One individual had received $2.7 million. The APA did not publicly disclose these ties for DSM-5-TR, despite having done so for DSM-5.

The legal scholar Lawrence Lessig developed a framework — institutional corruption — that fits this pattern precisely, and the researchers Robert Whitaker and Lisa Cosgrove applied it to psychiatry in their book Psychiatry Under the Influence. Institutional corruption, in Lessig's formulation, is not bribery. It's the systematic distortion of an institution's primary mission by the influence of secondary interests, through mechanisms that are legal and often invisible to the participants themselves. The psychiatrist who serves on a DSM panel, runs industry-funded clinical trials, and receives consulting income from drug companies is not, in most cases, consciously distorting classification to serve commercial interests. They are a recognized expert doing what recognized experts do within an incentive structure that rewards certain kinds of knowledge production and certain kinds of career behavior. The corruption is in the structure, not the individual — which is precisely what makes it so difficult to address. You can't fix it by finding bad actors, because there may be no bad actors. The system itself is misaligned.


Michel Foucault would have recognized this pattern, though he would have framed it differently. Foucault's analysis of psychiatric power — developed most precisely not in Madness and Civilization but in the later Psychiatric Power lectures — was fundamentally an analysis of how institutional structures constitute the knowledge they claim merely to apply. The asylum didn't discover madness and then treat it. The asylum — its spatial arrangements, its disciplinary routines, its hierarchical relationships — produced the category of madness as an object of medical knowledge. The institution came first. The classification followed. And the classification, in turn, justified the institution.

This is not Foucault as decoration — the vague invocation of "power/knowledge" that appears in so much critical theory without doing any analytical work. This is Foucault as a precise analyst of how institutional infrastructure shapes what counts as knowledge. The medical gaze, as Foucault described it in The Birth of the Clinic, is not a neutral act of observation. It's a trained capacity to see certain things and not others, organized by the institutional context in which the seeing happens. A clinician trained in DSM categories sees DSM categories. They see Major Depressive Disorder and Generalized Anxiety Disorder and Borderline Personality Disorder — not because those are the only ways to organize what the patient is presenting, but because those are the categories the infrastructure has made visible. The infrastructure shapes the perception. The perception reinforces the infrastructure.

Nikolas Rose extended this analysis into the present, arguing that modern psychiatry functions as what he called a technology of governance — a system for making populations legible, sortable, and administrable. Under this framework, psychiatric classification is less about understanding suffering and more about managing risk. It identifies, classifies, and channels people into institutional pathways — treatment, monitoring, containment — and the classification system is the mechanism through which this sorting becomes possible. You don't need to accept the full Foucauldian program to recognize that this analysis captures something real about how the DSM functions in practice. It is not only a scientific tool. It is a governance tool — an instrument through which the state, the insurance industry, the legal system, and the medical profession jointly manage the boundary between normality and pathology.

The infrastructure metaphor predicts something specific about race: if classification systems embed the assumptions of the communities that build them, and if those communities are not demographically representative, then the infrastructure will distribute its consequences unevenly along racial lines. This is not a speculative concern. It is among the best-documented patterns in the system's history.

In the nineteenth century, the physician Samuel Cartwright proposed "drapetomania" — a mental illness whose sole symptom was an enslaved person's desire to escape captivity. The diagnosis is easy to dismiss as a historical absurdity, the kind of thing that couldn't happen now. But the mechanism — embedding a political norm inside a medical category — is not historical. It is structural. The psychiatrist and historian Jonathan Metzl traced the same mechanism into the twentieth century, documenting how schizophrenia was reconceptualized in the 1960s as an illness of hostility and aggression and disproportionately diagnosed in Black men during precisely the period when the civil rights and Black Power movements were challenging white institutional authority. The diagnostic category absorbed the politics of the era, and the infrastructure carried that absorption forward.

The contemporary data extends the pattern. Epidemiological studies consistently find that Black Americans are diagnosed with schizophrenia at significantly higher rates than white Americans presenting with similar symptoms, while simultaneously being underdiagnosed for mood disorders. Black children are less likely to receive ADHD diagnoses than white children with comparable presentations. And the panels that write the criteria remain predominantly white โ€” which means the norms embedded in those criteria, including what counts as pathological thought, appropriate emotional expression, and normal behavior, reflect the cultural assumptions of the committees rather than the populations the system serves. Infrastructure carries the fingerprints of its builders. The DSM's differential application across racial groups is among its most consequential social functions, and among its least acknowledged.


The infrastructure metaphor, which has been the essay's spine since the introduction, reveals its sharpest edge here. Bowker and Star's concept of the boundary object โ€” a tool shared across communities with different interests, each using it differently โ€” describes the DSM with unsettling precision. Clinicians use the DSM for treatment decisions. Researchers use it for subject selection. Insurers use it for coverage determination. Lawyers use it for competency evaluation. Pharmaceutical companies use it for market definition. Patient advocacy groups use it for legitimacy and resource access. The manual works precisely because each community can use it differently โ€” which is also why it's so extraordinarily difficult to reform. Any change must be negotiated across all these communities simultaneously. A revision that improves scientific validity might wreck billing efficiency. A revision that serves patients might compromise research comparability. The boundary object serves too many masters to serve any of them well. (The infrastructure metaphor strains here in the way Section 5 identified โ€” these aren't inert pipes, and the people flowing through them are reshaped by the flow. But the institutional dynamics are real, and infrastructure thinking captures them with a precision that no other available framework matches.)

And here is where the infrastructure analysis goes further than the science critique or the corruption critique alone. The question is not just whether the DSM's categories are scientifically valid โ€” we've already seen that many of them are shakier than they appear. And the question is not just whether the process is corrupted by commercial interests โ€” we've already seen that the structural conflicts are real. The deeper question is about control. Who controls the infrastructure controls what flows through it. The APA controls the DSM โ€” it owns the copyright, convenes the panels, makes the final decisions โ€” and the APA's revenue stream depends in part on DSM sales. The pharmaceutical industry doesn't control the DSM, but it is the most powerful commercial user of the infrastructure, and powerful users shape infrastructure to serve their needs. That's not a conspiracy. It's how infrastructure works.

The sociologist Irving Zola, writing in 1972, argued that medicine had replaced religion and law as the primary institution defining deviance in modern societies. If he was right โ€” and the half-century since has largely confirmed him โ€” then the classification system through which medicine sorts the normal from the pathological is not a neutral scientific instrument. It's one of the most consequential allocative mechanisms in modern life, determining access to care, resources, identity, and social standing for hundreds of millions of people. The people making the decisions about that system should probably not be funded by the people who profit most from its categories. The fact that they are โ€” and that the field has been unable to structurally address this despite decades of documentation โ€” tells you something important about the relationship between knowledge production and institutional power.

It also tells you something about what any replacement system would need to contend with. The scientific problems of the DSM โ€” the shaky validity, the arbitrary boundaries, the missing biology โ€” are, in a sense, the easy problems. They're hard to solve, but at least they're problems the field knows how to think about. The political economy of classification โ€” the fact that the system serves as infrastructure for commercial, legal, and administrative interests that have their own stakes in how categories are defined โ€” is the harder problem, because it can't be solved by better science. Better science can give you better categories. It can't give you a governance structure that insulates those categories from capture by the interests they're supposed to regulate.

The infrastructure metaphor has been earning its keep throughout this essay, but here is where it carries the most weight. Infrastructure is never just technical. It's always political. And the politics of psychiatric classification are written into its architecture โ€” not as a bug, but as a feature of what happens when a knowledge system becomes load-bearing for an entire society.

Section 9 — Beyond the Western Frame

Everything this essay has examined so far — the philosophy, the biology, the psychometrics, the sociology — has been an argument conducted within a single tradition. We've been asking whether Western psychiatry's classification system is valid, reliable, scientifically grounded, institutionally captured, socially constructed. These are important questions. But they share a premise so deep it has been invisible until now: that the DSM and its conceptual architecture represent the way of classifying mental suffering, the framework against which all others are measured, the default position from which departures must be explained.

This section removes that premise.

What happens when you discover that other civilizations built entirely different classification systems for human suffering — systems with their own internal logic, their own empirical traditions, their own institutional infrastructure — and that some of these systems are older, more holistic, and in certain respects more sophisticated than the one we've been examining? It doesn't automatically invalidate the DSM. But it does something equally important: it reveals the DSM as one particular solution to the problem of organizing human distress, shaped by one particular culture's assumptions about what suffering is, where it lives, and what should be done about it. The invisible architecture becomes visible not by looking harder at its internal structure — we've been doing that for eight sections — but by seeing that other architects drew entirely different blueprints.


9a — Other Architectures

In the ninth century, a scholar in what is now Afghanistan wrote a book called Sustenance of the Body and Soul. Abu Zayd al-Balkhi was a polymath — geographer, mathematician, theologian — and his book was a medical text, but of a particular kind. It was devoted entirely to psychological disorders and their treatment, written deliberately in plain Arabic so that ordinary people could use it. This alone is remarkable: a standalone manual of psychological medicine, more than a thousand years before the DSM existed. But what makes al-Balkhi essential for this essay is not his priority in time. It's his classification.

Al-Balkhi organized mental suffering into four primary categories: depression, anxiety, anger, and obsessional disorders. Within depression alone, he drew distinctions that Western psychiatry did not formalize until the twentieth century. He differentiated normal sadness — the proportionate response to loss — from what he called reactive depression, a more severe response to serious stressors, and from endogenous depression, which arises without identifiable external cause and requires medical treatment rather than counseling. For obsessional disorders, the psychiatrist Rania Awaad has demonstrated that al-Balkhi should be credited with differentiating obsessive-compulsive conditions from other mental illnesses nearly a millennium before Western nosology made the same move. His therapeutic recommendations included what we would now recognize as cognitive restructuring — rational argument against irrational beliefs — and behavioral activation. He proposed these in the ninth century. Aaron Beck proposed them in the 1960s.

Al-Balkhi was not an isolated figure. He worked within a tradition. Ibn Sina — Avicenna, whose Canon of Medicine served as the primary medical textbook in both Islamic and European universities for centuries — described multiple forms of melancholia, distinguished between their etiologies, documented mixed affective states and the phenomenon of mood switching between depression and mania, and famously diagnosed lovesickness by measuring changes in pulse rate when the beloved's name was mentioned. Al-Razi emphasized the therapeutic relationship and wrote what amounted to a self-help manual for psychological wellbeing. These scholars practiced in institutions. The Islamic world developed sophisticated psychiatric hospitals — bimaristans — beginning in the eighth century, with the hospitals at Baghdad, Cairo, and Fez all including dedicated psychiatric wards with formalized treatment protocols. This was several centuries before comparable European institutions existed. The standard historical narrative that places the origin of institutional psychiatry in eighteenth-century Europe is, simply, wrong.

I want to dwell on al-Balkhi not to play a game of historical one-upmanship — who classified it first — but because his work reveals something about our assumptions. When the essay traced the history of psychiatric classification in Section 2, that history began with Kraepelin in the late nineteenth century and treated everything before it as pre-scientific. This is a story Western psychiatry tells about itself, and it is a story that renders invisible a thousand years of rigorous classificatory work conducted in Arabic, Persian, and Sanskrit. The erasure is not incidental. As the scholar Behnam Shayegani has recently argued, the systematic exclusion of Islamic contributions from the history of psychological science reflects not an oversight but a structural feature of how Western disciplines construct their own genealogies.


Move east. The Charaka Samhita, compiled between roughly 100 BCE and 200 CE, is one of the foundational texts of Ayurvedic medicine, and its chapters on unmada — a term encompassing psychosis and severe mental disturbance — constitute the most detailed classical framework for psychiatric classification in the Indian tradition. What's striking about the Charaka's approach is not that it classifies mental suffering — every medical tradition does — but how it classifies.

The DSM, from its third edition onward, deliberately abandoned classification by cause. It classifies by symptom cluster. If you present with a certain constellation of symptoms for a certain duration, you meet criteria for a disorder, regardless of why you're experiencing those symptoms. The Charaka does the opposite. It classifies by etiology — by what caused the disturbance. Five types of unmada are distinguished, each produced by the vitiation of a different dosha (biological humor): Vataja from disturbed vata, Pittaja from pitta, Kaphaja from kapha, Sannipataja from all three combined, and Agantuja from exogenous causes. Different causes produce different symptom patterns and require different treatments. This isn't a less sophisticated version of what the DSM does. It's a different architectural decision about what classification should optimize for.

And the Charaka's diagnostic process is more structured than you might expect. The text specifies eight psychological factors to be assessed: manas (mind), buddhi (intellect), sanjna jnana (orientation and responsiveness), smriti (memory), bhakti (devotion or desire), sheela (habitual conduct), chesta (psychomotor activity), and achara (behavioral patterns). This is remarkably close to a formalized mental status examination — the structured clinical assessment that Western psychiatry treats as its own invention.

But the deeper difference is ontological. The mind-body split that structures the entire DSM — the premise that "mental" disorders are a meaningful category distinct from physical ones — does not exist in Ayurvedic medicine. Mental distress is understood as an imbalance of the three doshas manifesting simultaneously across mind and body, modulated by the three gunas (qualities of mind): sattva (clarity), rajas (agitation), and tamas (inertia). There is no separate psychiatric manual because there is no separate psychiatric domain. The infrastructure was never built to separate mental from physical, because the culture that built it did not make that separation.

The anthropologist Murphy Halliburton, conducting fieldwork in Kerala, found that patients there move fluidly between Ayurvedic, biomedical, and religious healing systems — not because they're confused about which one is "real," but because they are sophisticated navigators of multiple overlapping classification infrastructures. They use the system that fits the problem, the context, and the consequences. A person might seek temple healing for spiritual distress, Ayurvedic treatment for constitutional imbalance, and a biomedical psychiatrist when they need a prescription that their employer's insurance will cover. This pragmatic pluralism looks incoherent only if you assume that one classification system must be the right one.


Turn to Africa. African traditional healing systems organize mental distress around axes that are, again, fundamentally different from the DSM's. Where Western psychiatry classifies by symptom cluster, most African systems classify by cause — and the relevant causes include ancestral displeasure, witchcraft, spiritual pollution, breach of taboo, and social rupture. This is not a failure to discover the "real" causes. It's a different ontology — a different account of what kinds of things exist in the world and what kinds of forces act on human minds.

Treatment follows classification. If the cause is ancestral, the treatment is ritual. If the cause is social rupture, the treatment is communal reconciliation. The unit of analysis is different, too. The DSM diagnoses individuals. African frameworks, like indigenous frameworks across multiple continents, frequently locate distress in relationships and communities. The person presenting symptoms may not be the person who needs treatment — the family does, or the community does, or the relationship between the living and the dead does. When Katherine Sorsdahl and colleagues studied traditional healers in South Africa, they found classification categories that simply don't map onto DSM categories, organized around spiritual causation and social disruption rather than the symptomatic phenomenology the DSM privileges.

Among the Baganda of Uganda, the researchers Elialilia Okello and Solvig Ekblad found local concepts for what Western psychiatry calls depression — yo'kwekyawa (self-hatred) and okwekubagiza (taking pity on yourself) — that overlap with but do not replicate DSM criteria. The overlap is partial, and the non-overlapping parts are not noise. They're the residue of a different way of carving the experiential territory, one that emphasizes the social and relational dimensions of suffering that the DSM's individualist architecture renders largely invisible.


And then there are the indigenous frameworks — from Australia, the Americas, the Pacific — that challenge the DSM's ontology most radically. These frameworks share certain architectural features across continents despite having no historical contact with one another. Distress is understood as occurring within relationships and communities, not within individual brains. Wellbeing is inseparable from connection to country, spiritual practices, and ancestral relationships. Suffering is organized around collective historical experience — colonization, forced removal, genocide — rather than individual symptom clusters. And no separation is made between mental, physical, spiritual, social, and environmental health.

The Aboriginal Australian concept of Social and Emotional Wellbeing — SEWB — is the most formally articulated of these alternatives. Developed by Aboriginal scholars and practitioners, SEWB encompasses connections to country, culture, community, family, spirit, and physical and mental health. It is positioned explicitly as an alternative to Western mental health frameworks, not a cultural supplement to them. As the scholar Pat Dudgeon and her colleagues have argued, the term itself was chosen deliberately to signal an Aboriginal conception of health that resists reduction to Western psychiatric categories. Health, in this framework, is not the absence of disease. It is the social, emotional, and cultural wellbeing of the whole community.

The Māori psychiatrist Mason Durie developed a parallel framework — Te Whare Tapa Whā, the Four Sides of the House — with four equal dimensions: taha wairua (spiritual), taha hinengaro (mental/emotional), taha tinana (physical), and taha whānau (family/social). The architectural metaphor is deliberate. A house with only one wall standing is not a house. A health framework that addresses only the mental dimension while ignoring the spiritual, physical, and familial is not, in this view, a health framework at all.

I've been cataloguing these systems not as curiosities but as evidence. Taken together, they establish several things that the previous eight sections could not establish from within the Western tradition alone.

First, the DSM's most basic architectural decisions — classifying by symptom rather than cause, separating mental from physical, diagnosing individuals rather than relationships, ignoring spiritual and communal dimensions of health — are not neutral scientific choices. They are culturally specific commitments, inherited from a particular intellectual tradition, that could have gone otherwise and that did go otherwise in every other major civilization.

Second, these alternative systems are not primitive precursors to "real" psychiatry. Al-Balkhi's differential diagnosis of depression subtypes, the Charaka's systematic eight-factor mental status assessment, the deliberate decisions of the Chinese Classification of Mental Disorders (the CCMD, examined in the next subsection) about which Western categories to adopt and which to reject — these demonstrate classificatory sophistication comparable to, and in some cases exceeding, pre-DSM-III Western approaches. The persistent tendency to treat non-Western systems as "folk" or "traditional" medicine while treating Western psychiatry as "science" is itself a product of the power dynamics the next subsection will examine.

Third — and this is the point the infrastructure metaphor was built to carry — classification by cause produces a fundamentally different clinical architecture than classification by symptom. When you organize suffering by what produced it, treatment flows naturally from diagnosis: different causes require different interventions. When you organize suffering by what it looks like, as the DSM does, you gain reliability (observers can agree on what they see) but lose the etiological thread that might tell you what to do. This is not a new observation — it's what the DSM's own critics have said for decades. But seeing it from outside the Western tradition makes the tradeoff visible in a way that internal critique cannot, because you can see that the choice was a choice, not a necessity.


9b — The Colonial Export

In 1961, Frantz Fanon — a psychiatrist trained in France, practicing in Algeria during the war of independence — devoted the final chapter of The Wretched of the Earth to what he called "Colonial War and Mental Disorders." Fanon had spent years treating both Algerian patients and French soldiers, and what he documented was not merely the psychological damage of colonial violence. It was the deeper problem: that the psychiatric categories available to him — the categories of French medicine — were instruments of the same colonial order that had produced his patients' suffering. The classification system and the system of domination were not separate. They were infrastructure built on the same foundation.

Fanon's insight has only grown sharper with time. The historian Richard Keller, in Colonial Madness, traced how French psychiatry was exported to North Africa as an explicit tool of colonial governance — not incidentally, but by design. Colonial psychiatrists developed theories of racial difference in mental illness that justified colonial hierarchies. The "primitive mentality" of colonized peoples was diagnosed as itself a kind of pathology, and the institutions of psychiatry — the colonial asylum, the classificatory apparatus, the professional authority of the European doctor — functioned as instruments of control.

This history matters not as a moral indictment of dead men but because the infrastructure they built is still in operation. When Western psychiatric categories were exported globally — first through colonial medicine and missionary activity, then through post-colonial development programs, and now through the global mental health movement — they carried with them the ontological commitments of the culture that produced them. The mind-body dualism. The individual as the unit of analysis. The symptom cluster as the basis of classification. The professional expert as the legitimate authority on suffering. These commitments traveled as default settings — invisible infrastructure built into the exported system, shaping what could be seen, said, and treated in every context where the system was installed.

The journalist Ethan Watters captured this dynamic vividly in Crazy Like Us, documenting how Western psychiatric categories were actively reshaping the experience of mental distress in non-Western cultures. His case studies — the spread of anorexia to Hong Kong, the export of PTSD to Sri Lanka after the tsunami, the marketing of depression in Japan — illustrated a pattern that the anthropologist Arthur Kleinman had named decades earlier: the category fallacy. The category fallacy occurs when you take a diagnostic concept developed in one cultural context and apply it in another as though it were a universal truth, ignoring the ways in which the category itself is shaped by the culture that produced it. When Western psychiatrists diagnose "depression" in a Sri Lankan Buddhist who is experiencing dukkha — the recognition of suffering that is, within Buddhist culture, not a symptom of illness but a sign of spiritual insight — they are not discovering a disease. They are imposing a framework. As the anthropologist Gananath Obeyesekere argued, applying the category of depression to such experiences is itself a form of cultural violence — not because the person isn't suffering, but because the classification organizes that suffering in ways that are foreign to the sufferer's own understanding and may foreclose the very forms of meaning-making that their culture provides.

The global mental health movement — the effort to scale mental health services to low- and middle-income countries where the treatment gap is enormous — has been the most recent and most well-intentioned vehicle for this export. The movement's foundational arguments are compelling: hundreds of millions of people worldwide suffer from mental health conditions and receive no treatment. Effective treatments exist. The barrier is a lack of trained providers and infrastructure. The solution is to train community health workers in simplified diagnostic and treatment protocols and deploy them at scale.

The psychiatrist Vikram Patel has been the movement's most influential advocate, and his manual Where There Is No Psychiatrist is itself a fascinating design artifact — an attempt to create a simpler classification infrastructure that works without the DSM's complexity, optimized for use by non-specialists in low-resource settings. The ambition is genuinely humanitarian. But the critic Derek Summerfield and others have argued that the movement, despite its good intentions, performs a kind of epistemological colonialism — flattening local systems of meaning and replacing them with Western categories that may be less culturally valid, less therapeutically useful, and less respectful of the communities they claim to serve. When the Global Burden of Disease study estimates mental illness prevalence worldwide, it does so using DSM and ICD categories. Conditions organized differently in other cultures — conditions that don't map onto Western nosology — become invisible in global health data. The infrastructure determines what gets counted, and what gets counted determines where resources flow.

The Chinese Classification of Mental Disorders — the CCMD — offers the most instructive case study in how this dynamic plays out. China is unique among non-Western traditions in having developed a formal psychiatric classification system that operated alongside the DSM and ICD rather than being absorbed by them. The CCMD evolved through several editions between 1979 and 2001, retaining culturally specific categories that the Western systems didn't recognize — including shenjing shuairuo (neurasthenia), qigong-induced mental disorder, and travelling psychosis — while omitting categories present in the DSM. The psychiatrist Sing Lee's analysis of the CCMD showed that it reflected Chinese cultural norms, values, and the political and economic organization of Chinese society. It was not a crude copy of the DSM with Chinese characteristics bolted on. It was a genuinely independent classificatory architecture.

And then the pressure to harmonize won. Under the gravitational pull of international research standards, pharmaceutical markets, and the institutional authority of the WHO, the CCMD was largely absorbed into the ICD system. The local classification infrastructure was decommissioned in favor of the global standard. Lee's comment on this process is worth sitting with: local classification systems like the CCMD "may offer an opportunity for needed reflections by North American psychiatrists who have simply taken the DSM-IV schema for granted." The opportunity, largely, was not taken.

The neurasthenia case is the microcosm. In the CCMD, neurasthenia — shenjing shuairuo — was a central diagnosis emphasizing somatic complaints and fatigue. It was conceptually distinct from psychiatric labels, less stigmatizing than "depression," and fitted with traditional Chinese medical epistemology in which disease is understood through disharmony of vital organs and imbalance of qi. In Western systems, neurasthenia was progressively absorbed into mood and anxiety disorders. Most patients who would receive a neurasthenia diagnosis in China would receive a depression or anxiety diagnosis under the DSM. Arthur Kleinman's ethnographic research showed that many Chinese patients diagnosed with neurasthenia responded to antidepressants — suggesting some overlap with what the DSM calls depression — but many did not, and the acceptance of a somatic framing was therapeutically important in ways that the Western diagnostic model couldn't capture.

Were these patients "really" depressed and misdiagnosed as neurasthenic? Or were they "really" neurasthenic, and the depression framework is the misdiagnosis? The question assumes that one classification must be right and the other wrong. The infrastructure lens suggests a different answer: both systems organize the same experiential territory differently, and the choice of framework has real consequences — for what counts as illness, what counts as treatment, what counts as recovery, and what counts as data.


9c — What Survives Translation

So is the whole enterprise incoherent? If every culture organizes suffering differently, if every classification carries the fingerprints of the culture that built it, does that mean there's no shared reality underneath — that "depression" in Boston and yo'kwekyawa in Kampala and shenjing shuairuo in Beijing are just different words for different things, with no common referent? If so, the project of cross-cultural classification — indeed, of cross-cultural communication about mental health — collapses.

It doesn't collapse. But it's harder than it looks.

The anthropologist Jane Murphy, in a landmark 1976 study, found that Yoruba communities in Nigeria and Yupik communities in Alaska — cultures with no historical contact — both had indigenous concepts resembling what Western psychiatry calls schizophrenia. They identified severe mental disturbance, they had local categories for it, and their categories overlapped substantially with Western descriptions. Murphy argued that this suggested something genuinely universal in the recognition of severe psychotic states — that at the extreme end of human suffering, cultures converge on similar recognitions even when they diverge on explanation and treatment.

A systematic review by Emily Haroz and colleagues in 2017 found a similar pattern for depression: both universal elements (sadness, social withdrawal, somatic complaints) and culturally specific ones (guilt prominent in Western samples, somatic focus in non-Western ones, spiritual explanations in many non-Western contexts). The picture that emerges is not one of pure universalism or pure relativism but of what cross-cultural psychologists call the etic/emic distinction: some features of human suffering appear across cultures (etic), while others are shaped by the cultural context in which they occur (emic). The interesting question is not which features are which — that's an empirical project, ongoing and incomplete — but what this distinction means for classification.

Here's where the infrastructure metaphor earns its final keep in this section. If you're designing a classification system, the etic/emic distinction presents a genuine design problem. You could optimize for the universal features — build a classification around what's shared across cultures, accepting that you'll miss culturally specific presentations. This is roughly what the DSM and ICD attempt, and it produces a system that travels well (researchers in different countries can use the same categories) but fits badly (many people's suffering doesn't match the categories, and the categories miss forms of distress that are real but culturally particular). Or you could optimize for cultural specificity — build classifications that fit each culture's understanding of its own suffering, accepting that you'll lose comparability across cultures. This is what the CCMD was, before it was absorbed.

Or — and this is the option the DSM-5 tentatively gestured toward with its inclusion of "cultural concepts of distress" — you could try to build a system that has both: a common framework with culturally specific modules. But this solution is less elegant than it sounds. If the common framework determines what's "real" and the cultural modules are supplements, you've just replicated the hierarchy — Western categories as default, everything else as variation. If the cultural modules have equal standing, you've conceded that there's no single classification, which undermines the purpose of having one.

The psychiatrist and anthropologist Arthur Kleinman saw this coming decades ago. His concept of the category fallacy — which will give Game 4 its name — was not just a methodological critique. It was a claim about what classification systems are. They're not mirrors of nature. They're cultural technologies for making suffering legible — for rendering the chaos of human distress into categories that institutions can act on. Different cultures have different institutions, different values, different ontologies, and so they produce different technologies. The question "which classification is correct?" is like asking "which language is correct?" — it mistakes a practical tool for a description of reality.

But Kleinman was not a pure relativist, and the essay shouldn't be either. Intellectual honesty requires acknowledging that this essay has spent its preceding sections taking the DSM apart while presenting non-Western frameworks with a generosity it hasn't extended to the Western system. That asymmetry is partly justified — the DSM is the globally dominant infrastructure and therefore deserves the most scrutiny — but it should not be mistaken for an endorsement of the empirical validity of al-Balkhi's categories, Ayurvedic constitutional theory, or indigenous healing frameworks. These traditions deserve the same critical analysis this essay has applied to the DSM; that analysis is beyond the present scope. People suffer. Some forms of suffering involve identifiable brain pathology. Some treatments work across cultures. The pragmatic question — can we build classification infrastructure that serves people well across cultural contexts? — is not answered by declaring all classifications culturally relative and going home. It's answered by building systems that are honest about what they know and what they don't, that distinguish their empirical content from their cultural assumptions, and that resist the temptation to mistake one tradition's way of organizing suffering for the way suffering is organized.

Nobody has built such a system. The obstacles are not merely scientific — they're political, institutional, and commercial, as the previous section documented. The Western classification has the institutional weight of global research infrastructure, pharmaceutical markets, insurance systems, and legal frameworks behind it. Alternative systems have the weight of centuries of practice and the trust of the communities they serve, but they lack the institutional power to compete on the global stage. The result is not a fair competition between classification systems but an infrastructure monopoly, in which one system's categories become the default through institutional dominance rather than empirical superiority.

What the cross-cultural evidence makes clear — and what the game you're about to play will make vivid — is that the same human suffering can be organized into fundamentally different categories, each internally coherent, each making visible what the others obscure. The woman in Kampala whose suffering is called yo'kwekyawa, the man in Beijing whose suffering is called shenjing shuairuo, the patient in Boston whose suffering is called Major Depressive Disorder — they may or may not be experiencing the same thing. But they are certainly being organized by different systems, and the systems they're organized by will shape their treatment, their identity, their relationship to their community, and their understanding of their own lives.

This is the deepest expression of what it means for classification to be infrastructure. The infrastructure doesn't just sort what's already there. It shapes what's there to be sorted. And when one infrastructure is exported globally — carried by the institutional power of Western medicine, the economic logic of pharmaceutical markets, and the administrative needs of international health organizations — it doesn't just classify the world. It remakes the world in its image.

You're about to experience this firsthand. Game 4 takes a single human experience — a person in distress — and runs it through multiple classification systems simultaneously. You'll see how the same suffering becomes different things under different frameworks, how each framework makes certain responses possible and others invisible, and how the choice of classification is never just a scientific decision. It's a decision about what kind of world you're building.


Game 4: The Category Fallacy

Encounter a single human being in distress through multiple classification systems — Western, Chinese, Ayurvedic, indigenous — and watch the same suffering become different things under different frameworks.


Part IV: Does It Work? Should It Survive?

Section 10 — Does Diagnosis Help?

You've just experienced the category fallacy from the inside. Game 4 took a single human being in distress and ran them through multiple classification systems — Western, Chinese, Ayurvedic, indigenous — and you watched the same suffering become different things under different frameworks. Different causes, different mechanisms, different treatments, different meanings. Each system was internally coherent. None was obviously wrong. And the choice of system wasn't a neutral scientific decision — it determined what the suffering was, what the person should do about it, and what kind of life the classification made available to them.

That experience was designed to make one thing vivid: classification is not discovery. It's architecture. The question is not which architecture is true but which architecture is useful — useful for whom, useful for what, useful at what cost.

Part IV takes that question seriously. If the DSM's categories are constructed rather than found, if they carry the fingerprints of the culture that built them, if they function as infrastructure with all the path dependencies and political entanglements the previous sections have documented — then the system's survival must rest on a pragmatic foundation. It must earn its place. And the most basic form that earning could take is this: does knowing someone's diagnosis help them get better?

This section asks that question, and the answer is more complicated — and more unsettling — than the system's defenders or its critics usually admit.


The clinical utility argument is the DSM's bedrock justification. Strip away the philosophical debates about natural kinds, the sociological critiques about medicalization, the cross-cultural challenges to universality — strip it all away, and what remains is the claim that classification serves patients. You identify the disorder, you match it to the appropriate treatment, the patient improves. Diagnosis is to treatment what a mechanic's diagnosis is to repair: the necessary first step that determines what you do next. Without it, you're guessing.

This logic is so deeply embedded in clinical training, institutional design, and public understanding that questioning it can feel like questioning whether doctors should examine patients before treating them. Of course diagnosis helps. That's what diagnosis is for.

But "of course" is not evidence. And the treatment outcomes literature — decades of it, involving hundreds of thousands of patients across hundreds of studies — tells a story considerably more nuanced than the bedrock justification assumes.


Start where the evidence is strongest. For some conditions, diagnosis clearly and unambiguously guides treatment in ways that improve outcomes.

Bipolar disorder is the paradigm case. Lithium remains the gold standard for preventing manic and depressive episodes, with expert consensus as recently as 2024 confirming its first-line status. And lithium's clinical value depends entirely on getting the diagnostic distinction right. A patient with bipolar disorder who is misdiagnosed with unipolar depression and prescribed an antidepressant without a mood stabilizer may be tipped into mania — the treatment indicated by the wrong diagnosis can actively harm the patient. The reverse is also consequential: lithium is not a first-line treatment for unipolar depression and carries a significant side-effect burden, so prescribing it to someone who doesn't need it exposes them to risks without corresponding benefit. Here, the diagnostic category does precisely what the bedrock justification promises. It separates two populations that look similar in their depressive episodes but require fundamentally different pharmacological management.

Psychotic disorders present a similar picture. Antipsychotic medications are clearly indicated for schizophrenia and related conditions, with meta-analyses confirming their efficacy for positive symptoms. The diagnosis tells you something actionable: this person needs antipsychotic medication, and withholding it in favor of psychotherapy alone for acute psychosis would be clinically irresponsible. Again, the category earns its keep.

Notice what these cases have in common. They're conditions where the DSM categories map most plausibly onto distinct biological mechanisms — where the philosophical arguments about natural kinds come closest to holding, where the phenomenological boundaries are clearest, where the reliability data is strongest. These are the diagnoses that function most like diagnoses in the rest of medicine: identify the disease process, intervene on the mechanism, observe improvement. The diagnostic infrastructure, for these conditions, works the way it's supposed to.

The trouble is that these conditions account for a relatively small fraction of the people who walk into a clinician's office. The most common reasons people seek mental health treatment — depression, anxiety, relational distress, the diffuse misery that doesn't fit neatly into any box — are precisely the conditions where the evidence for diagnostic utility gets thin.


In 2010, a team led by Jay Fournier published a meta-analysis in JAMA that became one of the most cited findings in the treatment literature. They examined whether antidepressant medications outperformed placebo for depression, and they did — but with a critical qualification. The benefit was concentrated almost entirely among the most severely depressed patients. For people with mild to moderate depression — the majority of patients receiving the diagnosis — antidepressants showed minimal advantage over placebo.

The implications are worth pausing on. Major Depressive Disorder, as the DSM defines it, is a single category. You either meet criteria or you don't. And the primary pharmacological treatment for that category is antidepressant medication. But the Fournier meta-analysis showed that the category doesn't predict treatment response in the way the system assumes. The dimensional variable — how severe your depression is — tells you far more about whether you'll benefit from medication than the categorical variable — whether you meet criteria for the diagnosis at all. A person with severe depression benefits substantially from antidepressants. A person with mild depression who meets the same diagnostic criteria may not benefit more than they would from a sugar pill. The diagnosis captures both people in the same category and recommends the same class of treatment, but their expected treatment responses are radically different.

This is a measurement problem dressed as a clinical one. The infrastructure is designed to produce categorical outputs — you have Major Depressive Disorder, or you don't — when the clinically relevant variable is dimensional. It's as though a thermometer could only tell you "you have a fever" or "you don't," without telling you the actual temperature. The binary output loses exactly the information you need to make good treatment decisions.
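The information loss in the thermometer analogy can be made concrete with a toy sketch. The five-of-nine symptom threshold below mirrors the DSM's rule for Major Depressive Disorder; the two patients and their symptom counts are hypothetical, chosen only to illustrate how a binary output collapses very different severities into one category:

```python
# Toy illustration (hypothetical patients, not clinical data) of how a
# categorical threshold discards the dimensional information that the
# Fournier meta-analysis found predictive of treatment response.

def meets_mdd_criteria(symptom_count: int) -> bool:
    """Categorical output, mirroring the DSM's 5-of-9 symptom threshold."""
    return symptom_count >= 5

mild_patient_symptoms = 5    # barely crosses the threshold
severe_patient_symptoms = 9  # meets every criterion

# The binary output is identical for both patients...
assert meets_mdd_criteria(mild_patient_symptoms) == meets_mdd_criteria(severe_patient_symptoms)

# ...but the dimensional difference, the variable that actually predicts
# antidepressant benefit over placebo, is invisible in that output.
severity_gap = severe_patient_symptoms - mild_patient_symptoms
print(severity_gap)  # 4
```

The point is not that clinicians literally count symptoms this way; it is that any system whose output type is boolean cannot carry the severity information that the treatment evidence says matters most.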


If the pharmacological evidence is mixed — strong for some conditions, weak for the most prevalent ones — the psychotherapy evidence is more destabilizing still.

The psychologist Bruce Wampold has spent three decades assembling what is now the most comprehensive meta-analytic case in the treatment literature. His findings, developed across hundreds of studies involving thousands of patients and synthesized in The Great Psychotherapy Debate, decompose therapeutic outcome into its component sources. What predicts whether a person gets better in therapy? Wampold's answer, supported by an enormous body of evidence: not the specific brand of therapy, and not the specific diagnosis.

The numbers are striking. The specific treatment model — whether you receive cognitive-behavioral therapy, psychodynamic therapy, interpersonal therapy, humanistic therapy, or any other established approach — accounts for approximately one percent of outcome variance. One percent. The particular methodology the therapist is trained in, the particular manual they follow, the particular theory of pathology they operate from — all of it, collectively, explains almost nothing about who gets better and who doesn't.

What does explain outcomes? The therapeutic alliance — the quality of the working relationship between therapist and patient — shows a correlation with outcome of approximately .27 across nearly two hundred studies, a medium-sized effect that dwarfs the contribution of any specific treatment ingredient. The therapist themselves — their skill, their warmth, their ability to connect — accounts for five to nine percent of outcome variance in naturalistic settings, far more than any treatment model. And a broad category of "common factors" — elements shared across all therapeutic approaches, including empathy, agreement on goals, the patient's expectations, and the provision of a structured healing context — collectively explains a substantial proportion of therapeutic change.
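To put the alliance figure and the treatment-model figure on a common scale: a correlation converts to proportion of variance explained by squaring it, so r of roughly .27 corresponds to about 7 percent of outcome variance, several times the roughly 1 percent attributed to the specific treatment model. A quick sketch of that arithmetic, using the estimates quoted above as inputs:

```python
# Convert a correlation into variance explained (r squared), so the
# alliance and treatment-model estimates quoted above are comparable.

def variance_explained(r: float) -> float:
    """Proportion of outcome variance explained by a correlation r."""
    return r ** 2

alliance_r = 0.27           # alliance-outcome correlation (Wampold's synthesis)
alliance_var = variance_explained(alliance_r)

treatment_model_var = 0.01  # ~1% of variance, per the estimate above

print(round(alliance_var, 4))                     # 0.0729, about 7% of variance
print(round(alliance_var / treatment_model_var))  # roughly 7x the model's share
```

This is only a unit conversion, not a new result, but it shows why a "medium-sized" correlation dwarfs a one-percent variance share.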

Wampold formalized this in what he calls the contextual model. In his account, therapy works through three pathways: the genuine human connection between therapist and patient, which creates a context where healing is possible; the creation of meaning, in which the patient develops a coherent explanation for their suffering and an expectation of improvement; and the engagement in health-promoting activities — facing fears, restructuring thoughts, activating behavior — that all structured therapies share, regardless of their theoretical orientation. The specific diagnosis matters to this model primarily as a vehicle for meaning-making. It gives the patient an explanation. But the explanation's therapeutic value comes not from its scientific accuracy but from its coherence and its capacity to generate hope. A different explanation, rooted in a different framework, might work just as well — which is precisely what the cross-cultural evidence from the previous section would predict.


The Dodo bird verdict — named after the Dodo in Alice in Wonderland who declares "everybody has won, and all must have prizes" — is the shorthand for this finding in the psychotherapy literature. When you compare established therapies head to head, they tend to produce equivalent outcomes. The name is deliberately provocative, and the finding has been genuinely contentious since Saul Rosenzweig first proposed it in 1936. But nearly ninety years later, the broad pattern holds.

It holds most clearly for depression. When researchers control for allegiance effects — the tendency for studies to favor the therapy the researcher is personally committed to — and for publication bias, CBT's claimed superiority over other active treatments shrinks dramatically. The psychotherapy researcher Pim Cuijpers, in a comprehensive review in the Annual Review of Clinical Psychology, showed that CBT, psychodynamic therapy, interpersonal therapy, behavioral activation, and even non-directive supportive therapy all produce comparable outcomes for depression. The field's most studied disorder responds more or less equally well to every credible way of treating it. If the specific treatment doesn't matter, then the specific diagnosis that was supposed to guide you to the specific treatment doesn't matter either.

The Dodo bird holds for most anxiety disorders, though this remains contested — Cuijpers's own group has found small but nonzero specific treatment effects for depression and anxiety when controlling rigorously for study quality and researcher allegiance. It holds for mixed presentations — patients with multiple co-occurring conditions, who make up the majority of real clinical populations. It holds in the exact diagnostic territory where the DSM is most commonly applied and where the infrastructure of diagnosis-to-treatment matching is most heavily relied upon.

But there are exceptions, and they matter. The Dodo bird breaks down for specific phobias, where exposure-based treatments have clear superiority. It breaks down for obsessive-compulsive disorder, where exposure and response prevention outperforms other approaches. It breaks down for PTSD, where trauma-focused therapies show advantages over non-trauma-focused ones. And it breaks down for the pharmacological treatment of psychosis and bipolar disorder, as already discussed.

The pattern in the exceptions is the most interesting finding of all, and it maps directly onto the philosophical and psychometric arguments the essay has been building. The conditions where specific treatments outperform — where diagnosis genuinely guides intervention — are the conditions where DSM categories are most phenomenologically distinct, most biologically grounded, and most reliably identifiable. These are, in the language of Section 4, the conditions where the categories come closest to functioning as natural kinds — or, as Kenneth Kendler would put it, where the causal mechanisms binding the symptom cluster are tight enough that the category tracks something real rather than imposing a convenient administrative boundary. The conditions where the Dodo bird holds — where any reasonable treatment works about as well as any other — are the conditions where DSM categories are most contested, most comorbid, and most dimensionally distributed. The diagnostic system's clinical utility varies in precise lockstep with the validity of its categories. Where the categories carve something real, diagnosis helps. Where the categories are pragmatic conveniences — useful shorthands for capturing heterogeneous populations — diagnosis adds little to treatment selection that wouldn't be captured by simpler dimensional measures.


The most direct empirical test of this question was conducted by David Barlow and colleagues, published in JAMA Psychiatry in 2017. Barlow's team designed a study that asked, with unusual directness: does matching treatment to diagnosis improve outcomes?

They took 223 patients with principal diagnoses of panic disorder, generalized anxiety disorder, obsessive-compulsive disorder, or social anxiety disorder — four distinct DSM categories — and randomized them to one of three conditions. The first group received a diagnosis-specific protocol: a treatment manual designed specifically for their particular disorder, incorporating the theoretical understanding and clinical techniques developed for that condition. This is what the diagnostic system assumes you should do — identify the disorder, deploy the matched treatment. The second group received the Unified Protocol, a single transdiagnostic intervention that targeted shared underlying processes — emotion dysregulation, neuroticism, avoidance — rather than disorder-specific symptoms. One manual for four different diagnoses. The third group was a waitlist control.

The results were definitive. The Unified Protocol produced symptom reductions statistically equivalent to the diagnosis-specific protocols. Not just at the end of treatment — at six-month follow-up, at twelve-month follow-up, and at thirty-six-month follow-up. Three years later, there was no clinically or statistically significant difference between getting the treatment matched to your diagnosis and getting a treatment that ignored your diagnosis entirely.

There is a sentence worth writing plainly, because its implications are enormous and the field has been slow to absorb them: for anxiety disorders, you don't need the specific diagnosis to select the right psychological treatment. One protocol, targeting processes that cut across diagnostic boundaries, works as well as four different manuals designed for four different categories. The diagnostic classification adds no treatment-selection value in this domain. Zero. The entire architecture of "identify the specific disorder → deploy the matched treatment" — the infrastructure that justifies the diagnostic system's existence for these conditions — performs no better than an approach that bypasses diagnosis altogether and goes straight to underlying mechanisms.


If common factors matter more than specific diagnoses for most psychotherapy outcomes, and if severity matters more than category for most pharmacological outcomes, then what should the clinical enterprise actually organize itself around?

The most developed answer to this question comes from Stefan Hofmann and Steven Hayes, who proposed what they call process-based therapy in a 2019 paper that reads less like a clinical innovation and more like a paradigm challenge. Their core argument inverts the diagnostic system's logic. Instead of asking "What is the diagnosis? What is the matched treatment?" — the question the DSM infrastructure was built to answer — they propose asking: "What core biopsychosocial processes need to change in this person, given their goals, in their situation, and how can those processes be most efficiently changed?"

The shift from categories to processes is not cosmetic. It reorganizes the entire clinical encounter. Under the categorical model, the clinician's first task is classification: assign the patient to a diagnostic group. Treatment follows from group membership. Under the process model, the clinician's first task is functional analysis: identify the specific psychological processes — rumination, avoidance, experiential disconnection, cognitive rigidity, interpersonal skill deficits — that are maintaining this person's suffering. Treatment follows from the processes, not the category.

The distinction matters because processes don't respect diagnostic boundaries. Rumination appears in depression, generalized anxiety, PTSD, and obsessive-compulsive disorder. Avoidance is a maintaining factor in nearly every anxiety disorder, in depression, in substance use, in chronic pain. Cognitive rigidity appears everywhere. These are the transdiagnostic processes that Harvey and colleagues identified as early as 2004 — the mechanisms that cut across the DSM's categorical divisions and that help explain why massive comorbidity rates aren't a bug in the system but a feature of how human suffering actually works. If the same processes underlie multiple "disorders," then treating the processes should work regardless of the diagnostic label — which is exactly what the Barlow trial showed.

Process-based therapy is still young. The Process-Based Assessment Tool, developed by Ciarrochi and colleagues, is being tested. Idionomic approaches — using intensive longitudinal data to model individual patients' dynamic process networks — are technically promising but practically distant from everyday clinical use. Digital tools for collecting the data these approaches require are under development but not widely implemented. This is not, yet, a replacement infrastructure. It's a blueprint for one.

But the blueprint reveals something about the current infrastructure that the current infrastructure cannot see about itself. The DSM organizes clinical thinking around categories because that's what the DSM is — a categorical system. Clinicians trained on the DSM learn to think categorically: this patient has depression, this patient has anxiety, this patient has both. The treatment literature organized around DSM categories tests treatments for depression, treatments for anxiety, treatments for comorbid depression and anxiety. The insurance system reimburses by diagnostic code. The research funding goes to studies organized by diagnostic category. The entire institutional apparatus assumes that categorical diagnosis is the natural starting point for clinical reasoning.

And yet the evidence suggests that the clinically relevant variables — severity, specific psychological processes, therapeutic alliance quality, therapist skill — are mostly dimensional, mostly transdiagnostic, and mostly invisible to the categorical infrastructure. The system optimizes for a variable (diagnostic category) that is weakly predictive of the outcome that matters most (treatment response), while ignoring variables (severity, process, alliance) that are strongly predictive. It's as though a city had organized its entire transportation infrastructure around postal codes — building separate road systems for each zip code — when the variable that actually determines traffic flow is distance.


The clinical utility picture, stated as clearly as possible, looks like this.

For a small number of conditions — psychotic disorders, bipolar disorder, specific phobias, OCD, PTSD — diagnostic classification meaningfully guides treatment in ways that improve outcomes. These are the conditions where the DSM categories are most likely to reflect genuine distinctions in underlying mechanisms, where the reliability evidence is strongest, and where the philosophical arguments about natural kinds are most plausible. For these conditions, the diagnostic infrastructure works. Not perfectly — the categories are still cruder than the underlying reality — but well enough that getting the diagnosis right makes a material difference in what happens to the patient.

For the most prevalent conditions — mild to moderate depression, generalized anxiety, adjustment difficulties, the diffuse suffering that fills most therapists' caseloads — diagnostic classification adds little or no value to treatment selection. The Dodo bird verdict holds. Severity matters more than category. Process matters more than label. The therapeutic relationship matters more than the therapeutic technique, which in turn matters more than the diagnostic framework that was supposed to determine the technique. For these conditions, the diagnostic infrastructure is not harmful — it provides meaning, it unlocks access to services, it creates a shared language between clinician and patient — but its clinical utility lies in those social and administrative functions, not in the treatment-matching function that is supposed to be its primary justification.

And here is the structural insight that connects this section to everything the essay has built: the system treats diagnostic utility as uniform when it is actually wildly variable. The DSM contains conditions where diagnosis is clinically essential and conditions where it is clinically inert, and it presents them all in the same format, with the same epistemic authority, as though the relationship between diagnosis and treatment were the same across the manual. There is no metadata, no signal to the clinician, that says: for this condition, the diagnosis is driving the treatment decision; for this condition, it isn't. The infrastructure's uniform architecture obscures the non-uniform reality underneath.


There is an emerging field — precision psychiatry — that represents the most ambitious attempt to solve the treatment-matching problem computationally. Machine learning algorithms trained on large datasets try to predict which individual patients will respond to which treatments, using symptom profiles, severity scores, personality variables, genetic markers, and neuroimaging data as inputs. The promise is real: if categorical diagnosis is too crude for treatment matching, perhaps algorithms can find the patterns that categories miss.

But precision psychiatry's very existence is an implicit concession. If DSM categories worked for treatment selection — if the diagnostic infrastructure delivered on its bedrock promise — there would be no need for machine learning algorithms to figure out who benefits from what. The field is trying to solve computationally what the diagnostic system was supposed to solve categorically. And the irony deepens: the algorithms that perform best tend to use individual-level features — specific symptom profiles, dimensional severity scores, personality characteristics — rather than diagnostic categories as their inputs. The categories the system is built on turn out to be among the least useful predictors in the very models designed to rescue the system's clinical utility.

A 2024 review in Psychological Medicine identified the field's persistent challenges with bracing honesty: studies still rely on retrospective data and unrealistic outcome definitions, placebo effects are rarely modeled, fairness and equity considerations are understudied, and prospective validation against current clinical practice is essentially absent. The promise of precision psychiatry remains a promise. The infrastructure it would require — the longitudinal data collection, the validated biomarkers, the computational tools accessible to working clinicians — doesn't exist at scale and won't for years. In the meantime, the old infrastructure remains in place, doing what infrastructure does: persisting not because it's optimal but because the switching costs are enormous and the replacement isn't ready.


I want to end this section by naming what's hardest about the clinical utility evidence, because the easy version — diagnosis is useless, throw out the DSM — is as wrong as the comfortable version it replaces.

The difficult truth is that diagnostic classification serves multiple functions simultaneously, and its utility for treatment matching is only one of them. When a person in distress receives a diagnosis, several things happen at once. They gain a name for their suffering — and as Section 3 documented, the act of naming can be therapeutic in itself, providing coherence, reducing shame, creating a narrative framework for an experience that felt chaotic and isolating. They gain access to services — insurance coverage, disability accommodations, support groups, an entire institutional infrastructure that requires a diagnostic key to enter. They gain membership in a community of people who share their diagnosis and who have built identities, advocacy organizations, and mutual support networks around it. And they gain, sometimes, a treatment recommendation that is genuinely matched to their condition and that wouldn't have been available without the diagnostic distinction.

The problem is that these functions are bundled together in a single system. You can't get the access-granting function without the categorizing function. You can't get the meaning-making function without the reification risk. And the treatment-matching function — the one that provides the system's scientific legitimacy — varies enormously across conditions while the other functions remain constant. The patient with mild depression gets the same kind of diagnosis, the same kind of billing code, the same kind of treatment recommendation, as the patient with bipolar disorder — but for the first patient, the treatment-matching value is near zero, while for the second, it's clinically essential.

This is the infrastructure trap in its clinical form. The system persists not because every part of it works but because the parts that work and the parts that don't are wired into the same infrastructure, and you can't remove the non-functional components without disrupting the functional ones. Telling a clinician that diagnosis doesn't predict psychotherapy outcomes for depression doesn't change the fact that the patient needs a diagnosis to get insurance to cover the psychotherapy. Telling a researcher that dimensional severity predicts treatment response better than categorical diagnosis doesn't change the fact that research funding, journal publication, and clinical training are all organized around categories. The evidence says one thing. The infrastructure requires another. And the infrastructure, as always, wins.

What this section has shown is that the DSM's primary clinical justification — diagnosis guides treatment — holds robustly for perhaps twenty percent of the conditions it covers and weakly or not at all for the rest. This doesn't mean the system is useless. It means the system's utility lies somewhere other than where the system claims it lies. The DSM works — when it works — as social infrastructure: a shared language, a gatekeeping mechanism, a meaning-making technology, an administrative necessity. Its scientific utility — the part that distinguishes it from any other institutional sorting system — is real but far more limited than the system's architecture implies.

The next section asks what happens when this system, with its variable utility and its categorical confidence, is applied to the most vulnerable population of all: children whose identities are still forming, whose trajectories are still open, and whose relationship to a diagnostic label will shape decades of development. If the stakes of the clinical utility question feel abstract for adults who can evaluate their own diagnoses with some critical distance, they become concrete and urgent when the person being classified is seven years old.

Section 11 — The Developmental Lens

The previous section asked whether diagnosis helps. The evidence was more equivocal than most clinicians assume — strong for some conditions, weak for many others, and complicated everywhere by the finding that common therapeutic factors may matter more than diagnostic specificity. But that entire discussion shared an unstated premise: it was about adults. Adults who present to clinicians with established patterns of suffering. Adults whose brains have finished developing. Adults who can, at least in principle, participate in the classification process as informed agents — agreeing or disagreeing, accepting or rejecting, metabolizing the label through an already-formed identity.

Now consider a seven-year-old.

A seven-year-old has no established patterns. Her brain will continue developing for another fifteen years. She cannot meaningfully consent to being classified, cannot evaluate the evidence for or against a diagnosis, cannot weigh the institutional consequences of the label she's about to receive. And she has, in front of her, decades during which that label will shape how her parents understand her, how her teachers respond to her, what medications she takes, what stories she tells about herself, and what she believes she's capable of becoming.

Everything this essay has analyzed — the boundary problem, the looping effects, the institutional capture, the reification of categories — runs on fast-forward when the person being classified is a child. The stakes are higher, the subject is more vulnerable, and the time horizon over which the classification does its work is vastly longer. If psychiatric classification is infrastructure, then classifying children is building infrastructure through a developing neighborhood — and the construction itself changes what the neighborhood becomes.


The sharpest case study in the entire essay comes from child psychiatry, and it begins with a single reinterpretation.

In the mid-1990s, the child psychiatrist Joseph Biederman and his colleagues at Massachusetts General Hospital published a series of papers arguing that the psychiatric establishment was systematically under-recognizing bipolar disorder in children. The DSM's criteria for bipolar disorder required discrete episodes of elevated mood — mania, the defining feature — alternating with depression. But children, Biederman argued, didn't present that way. Their mania was chronic rather than episodic, and it looked like irritability rather than euphoria. What clinicians were seeing as severe ADHD, oppositional defiance, or generalized behavioral dysregulation was, in Biederman's view, undiagnosed bipolar disorder.

This was not a minor reinterpretation. It was a wholesale redefinition of what mania meant when applied to children. And it had consequences that cascaded through every layer of the infrastructure.

Carmen Moreno and her colleagues documented those consequences in a landmark 2007 study. They analyzed national outpatient data and found that the diagnosis of bipolar disorder in youth had increased approximately forty-fold between 1994 and 2003. Not four-fold. Forty. In less than a decade, a diagnosis that had been rare in children became one of the most rapidly growing diagnostic categories in American child psychiatry. The children receiving this diagnosis were prescribed antipsychotic medications and mood stabilizers — serious drugs with significant metabolic side effects, including weight gain, diabetes risk, and movement disorders — because the classification told clinicians they were treating bipolar disorder, and bipolar disorder requires mood stabilizers.

Pause on what happened here. The scientific evidence base for childhood bipolar disorder was thin. What existed was a conceptual argument — that if you redefined mania to include chronic irritability, then many more children met criteria. The redefinition was made by a small number of influential clinicians. It propagated through the system not through replicated research but through the normal channels of medical infrastructure: training programs, clinical guidelines, continuing education, pharmaceutical marketing, and the institutional logic of a classification system that rewards specificity. Once the category existed in practice — once clinicians had been trained to see it, once pharmaceutical representatives had materials explaining it, once parents had heard the term and recognized their children in it — it became self-sustaining. The infrastructure took over.

Then the correction came. The neuroscientist Ellen Leibenluft, working at the National Institute of Mental Health, conducted the longitudinal research that Biederman's conceptual argument had lacked. She followed chronically irritable children over time and found something that should have been anticipated but wasn't: they did not grow up to be bipolar adults. They developed depression and anxiety. The chronic irritability that had been reclassified as pediatric mania was not, in fact, early-onset bipolar disorder. It was something else — something the existing classification didn't have a name for, which was part of why it had been shoehorned into a category that didn't fit.

Leibenluft's data led directly to the creation of Disruptive Mood Dysregulation Disorder — DMDD — in the DSM-5, published in 2013. The new category was designed explicitly to absorb the children who had been misclassified as bipolar. It was a patch, applied to correct an error that the classification system had amplified.

But here's what the infrastructure metaphor predicts, and what happened: the patch introduced new bugs. Gabrielle Carlson's subsequent assessment of DMDD found that the new category had low reliability — clinicians couldn't agree on who had it — and high comorbidity with other diagnoses, particularly ADHD and oppositional defiant disorder. Its validity was uncertain, its boundaries were unclear, and it wasn't obvious that it identified a distinct condition rather than a severity marker for problems already captured by existing categories. The fix for one classificatory error had created a new classificatory problem. In software engineering, this is a familiar pattern. In child psychiatry, the costs are measured in children's developmental trajectories.

The pediatric bipolar story is not an anomaly. It's an illustration of how the classification infrastructure operates when applied to developing organisms. The same dynamics — redefinition, diagnostic expansion, institutional uptake, eventual correction, imperfect patching — have played out, with variations, across multiple childhood diagnoses. But to see the deeper structural problem, we need to look at the case that's been running longest and remains the most contested.


ADHD is the paradigmatic case of childhood classification because it makes visible every tension this essay has examined, and it makes them visible simultaneously.

Start with the boundary problem. ADHD sits on a continuum with normal childhood behavior. Attention exists on a spectrum. Activity level exists on a spectrum. Impulsivity exists on a spectrum. Somewhere on each of these spectra, a clinician draws a line and says: above this, normal; below this, disorder. The bioethicist Ilina Singh has argued — in what remains the most nuanced treatment of the ADHD controversy — that the right position is neither "ADHD is a myth" nor "ADHD is a brain disease." ADHD is a real phenomenon. Some children genuinely struggle with attention and impulse control in ways that cause significant suffering and impairment. But the boundaries of the category are socially negotiated. Where you draw the line determines who has ADHD, and where you draw the line is influenced by cultural expectations, institutional incentives, and practical needs that have nothing to do with the neuroscience.

The evidence for this claim is extensive and discomfiting. The systematic review led by Martin Whitely confirmed what's now called the birthday effect: the youngest children in a school grade are significantly more likely to receive an ADHD diagnosis than the oldest. Being developmentally immature relative to your classmates — being a young five-year-old in a room of old five-year-olds — is being misread as pathology. The classification system cannot distinguish developmental immaturity from disorder when it's applied in age-normed educational contexts, because the assessment happens in the classroom, where the comparison group is the grade, not the birth cohort.

Stephen Hinshaw and Richard Scheffler documented a broader pattern. ADHD diagnosis rates in the United States vary enormously by geography, tracking educational accountability policies and special education funding structures more closely than they track any plausible variation in the underlying condition. States that adopted high-stakes testing regimes saw increases in ADHD diagnosis. States with more generous special education funding saw more children classified. The diagnostic rates correlate with institutional incentives — schools that benefit from classifying children classify more children — in ways that are difficult to reconcile with the premise that diagnosis is simply recognizing a pre-existing condition.

This is the classification-as-key phenomenon from Section 8, expressed developmentally. An ADHD diagnosis unlocks school accommodations, extended test time, individualized education plans, and in many cases stimulant medication. Without the diagnosis, none of these resources are accessible. The sociologist Adam Rafalovich traced the feedback loop: teachers identify problem behavior, parents seek explanations, clinicians apply categories, and the educational system provides incentives for diagnosis independent of clinical need. The classification enters the school as a social fact and restructures the child's world around it.

But here's where the developmental lens reveals something that the adult-focused literature misses. A child classified with ADHD at age seven doesn't just receive accommodations and medication for a few years. The classification becomes part of her identity during the period when identity is being formed. Juho Honkasilta and colleagues studied how adolescents with ADHD narrate their own behavior and found three patterns: some adopt the diagnostic framework and pathologize their own actions, explaining everything through the lens of their condition. Some reject the diagnosis and condemn themselves morally — if it's not a disorder, then the failures must be their fault. Some use the diagnosis to liberate themselves from moral blame — it's not me, it's my ADHD. These are three fundamentally different identity positions, all produced by the interaction between a developing person and a classificatory label. The label doesn't land on a finished self. It participates in building the self.

This is Hacking's looping effect — the phenomenon Section 5 explored — operating at developmental speed. An adult diagnosed with depression can assimilate the label into an existing self-concept. A child diagnosed with ADHD assimilates the label during the period when self-concept is being constructed. The category doesn't just describe the child. It becomes one of the materials out of which the child's identity is built.

Peter Conrad and Deborah Potter traced the logical endpoint of this process. ADHD, originally a childhood behavior disorder with the expectation that children would grow out of it, has expanded into a lifelong condition — a permanent identity marker that follows people from childhood into adulthood. This expansion was driven not by the discovery that ADHD persists but by pharmaceutical marketing, adult self-diagnosis, and the institutional utility of the category across the lifespan. And then Terrie Moffitt and her colleagues, working with the Dunedin longitudinal cohort, produced a finding that destabilizes the whole framework: most adults who meet criteria for ADHD did not have ADHD as children, and most children with ADHD did not meet criteria as adults. Childhood ADHD and adult ADHD may not be the same condition at all. They may be different things sharing a label — a classificatory artifact produced by the assumption that a childhood diagnosis names a stable entity that persists across the lifespan.

The classification system assumes continuity that the longitudinal data doesn't support.


This insight extends far beyond ADHD. The developmental psychopathologist William Copeland and colleagues followed children from childhood through early adulthood and found that most who received a psychiatric diagnosis in childhood did not retain the same diagnosis as adults. They either remitted entirely or transitioned to a different diagnostic category. The classification system treats a childhood diagnosis as identifying a stable condition — the child has depression, has anxiety, has ADHD — but the developmental reality is one of flux and transition. Labeling a child with a diagnosis implies a permanence that the developmental trajectory doesn't warrant.

Julia Kim-Cohen and colleagues approached from the other direction — tracing adult disorders back to childhood — and found the complementary result. Most adults with mental disorders had received childhood diagnoses, but often different childhood diagnoses. The pathway from childhood classification to adult disorder runs through diagnostic transitions, not diagnostic stability. A child classified with anxiety may become an adult classified with depression. A child classified with conduct disorder may become an adult classified with antisocial personality. The classification system takes snapshots, but development is a movie.

This is the core insight of developmental psychopathology as a field — articulated most clearly by Michael Rutter and L. Alan Sroufe — and it represents a fundamental challenge to categorical classification. Developmental processes are dynamic, context-dependent, and probabilistic. The same early risk factor can lead to many different outcomes depending on the intervening environment — what developmental psychopathologists call multifinality (recall from Section 6: one cause, many possible results). And the same outcome can be reached through many different developmental pathways — equifinality (many causes, same result). Categories freeze what is fluid, decontextualize what is contextual, and dichotomize what is continuous. The classification system, designed for adult presentations that are presumed to be relatively stable, struggles with the fundamental nature of development itself.

The Dunedin cohort study — the same longitudinal project that contributed the p-factor research discussed in Section 7 — produced perhaps the most sobering finding. Avshalom Caspi and colleagues, following participants across four decades, found that by midlife the vast majority of people had met criteria for at least one psychiatric disorder, and those who met criteria for one disorder usually developed others over time. If nearly everyone qualifies at some point, and if diagnoses shift and transform across the lifespan, then the classificatory system may be pathologizing the human condition rather than identifying discrete disorders. This doesn't mean suffering isn't real. It means the system that sorts suffering into categories is telling a story about discrete diseases that the longitudinal evidence doesn't support.


And the problem runs in both directions along the lifespan. The DSM's categories were built, overwhelmingly, on studies of younger and middle-aged adults. When those categories are applied to elderly patients, the fit deteriorates in predictable ways. Dan Blazer's review of late-life depression documented what geriatric psychiatrists have long observed: depression in older adults presents differently — more somatic complaints, more cognitive impairment, more entanglement with physical illness and grief. The DSM's criteria, developed on younger populations, may systematically misclassify late-life depression, missing it when it's present and diagnosing it when what's present is grief, loneliness, or the reasonable demoralization of chronic illness.

Dilip Jeste and his colleagues identified the same pattern for late-onset psychosis. Psychotic symptoms that emerge in older adulthood have different phenomenology, different course, and different treatment response than the schizophrenia the DSM was built to classify — but they receive the same diagnostic label, because the classification has no mechanism for distinguishing early-onset from late-onset presentations in a clinically meaningful way. The infrastructure is insensitive to developmental timing. It was designed for one stage of the lifespan and applied, with diminishing fit, across all of them.


The developmental lens does not invalidate psychiatric classification. Some children are genuinely helped by a diagnosis — it names their struggle, connects them to resources, and reduces the moral blame that otherwise falls on them and their families. The point is not that childhood classification is always wrong. The point is that it is always consequential in ways that adult classification is not, because it operates on an organism that is still becoming itself, within institutions — families, schools, peer groups — that will reorganize around the label. And the infrastructure was not designed with this in mind.

The pediatric bipolar catastrophe was not caused by bad clinicians. It was caused by a classification system that had no built-in safeguards against the amplification of diagnostic error when applied to children — no mechanism for flagging that a redefinition of mania was being applied to a population for which it had never been validated, no institutional circuit breaker that could have slowed the cascade before hundreds of thousands of children were prescribed antipsychotics for a condition they didn't have. The birthday effect in ADHD diagnosis is not caused by lazy teachers. It's caused by a classification system that cannot distinguish developmental variation from pathology when embedded in an educational system that uses diagnostic categories as resource-allocation tools.

These are infrastructure failures. They are failures of the system, not the people operating it. And they carry a particular moral weight because the people most affected — children — are the ones least able to advocate for themselves within the system, least able to evaluate the classification they've been given, and most transformed by it.

The neurodiversity movement has begun to offer an alternative framework — one that reinterprets conditions like ADHD and autism not as disorders to be treated but as variations to be accommodated, shifting the classificatory logic from deficit to difference. Louise Eccleston and colleagues found that adults with ADHD strategically navigate between the medical model and the neurodiversity framework, using whichever serves them better in a given context. This is sophisticated. It is also an indictment of the system: when the people being classified have to maintain two competing interpretive frameworks and switch between them depending on institutional context, the classification is not doing its job.

What the developmental lens reveals, ultimately, is a design flaw so fundamental that it's easy to mistake for a feature. The DSM classifies conditions as if they were properties of individuals at a moment in time. Development insists that they are trajectories — probabilistic, context-dependent, shaped by the very act of classification — unfolding within systems that respond to the label as much as to the person. The classification captures a snapshot and treats it as a portrait. But people — especially young people — are not still.

You've now seen the system from nearly every angle this essay can offer: its history, its philosophy, its biology, its measurements, its social entanglements, its cultural parochialism, its therapeutic track record, and its developmental consequences. The next experience will bring it home to the most personal level. Game 5 puts you on the receiving end. You'll be classified — and you'll feel, in your own decision-making, how the label changes what you can do, what you believe about yourself, and what futures seem possible. The infrastructure won't be abstract anymore. It will be yours.

🎮

Game 5: Sorted

Experience classification from the receiving end. Be diagnosed — and feel how the label reshapes what you can do, what you believe about yourself, and what futures seem possible.


Section 12 — Classification as Infrastructure: A Synthesis

Part V: Toward the Next Architecture

You've just been sorted.

Game 5 put you on the receiving end of a classification system, and if it did what it was supposed to do, you felt something that the previous eleven sections have been circling analytically: the strange, double-edged quality of being named. The diagnosis opened doors — resources, recognition, a community of others who share your label. And it closed them — narrowing the story you could tell about yourself, shaping how others treated you, making certain futures more visible and others less so. You experienced, in compressed form, what millions of people experience over years: the ambivalence of being seen through a system that is simultaneously a tool of care and a technology of constraint.

Now the essay does what it has been building toward since the first paragraph. It takes the metaphor it opened with — classification as infrastructure — and assembles it fully. Not as a clever analogy, but as a precise analytical claim.


In 1999, the information scientist Susan Leigh Star published an essay called "The Ethnography of Infrastructure" in which she argued that classification systems, like roads and sewage lines and electrical grids, are infrastructure in a technical sense that has specific, identifiable properties. Star and her collaborator Geoffrey Bowker had spent years studying what happens when you take classification seriously — not as a philosophical problem about whether the categories are right, but as an empirical question about what the categories do. Their method was distinctive: rather than asking whether classification systems were correct, they asked how classification systems shaped the worlds they appeared merely to sort. They examined the International Classification of Diseases, South African racial classification under apartheid, the Nursing Interventions Classification. Each case taught them something different. Together, the cases yielded a framework: eight properties that all infrastructure shares, from water mains to library catalogs to the system that determines whether your suffering qualifies for insurance coverage.

I've been calling the DSM infrastructure since this essay's opening sentence. It's time to make good on the claim. What follows are Star's eight properties of infrastructure, applied systematically to psychiatric classification, drawing on everything the previous eleven sections and five games have built. If the metaphor holds — if psychiatric classification genuinely exhibits all eight properties — then the implications are specific and uncomfortable. Because the most important thing about infrastructure isn't whether it's right. It's that it persists.


Embeddedness. Infrastructure is sunk into other structures, social arrangements, and technologies. It doesn't stand alone. It's woven into the fabric of everything that uses it.

The DSM doesn't exist as a freestanding scientific document. It exists inside insurance billing systems, where its categories are mapped to CPT codes that determine reimbursement. It exists inside legal proceedings, where its categories define the boundaries of competency, diminished capacity, disability, and involuntary commitment. It exists inside pharmaceutical regulation, where its categories determine the conditions that drugs are approved to treat, which in turn determines how clinical trials are designed, which in turn shapes what kinds of suffering the research enterprise can even see. It exists inside electronic health record systems, where its categories structure the dropdown menus and checkbox forms through which clinicians document their work. It exists inside school systems, where categories like ADHD and autism spectrum disorder unlock Individualized Education Programs and classroom accommodations. It exists inside the military, where diagnostic categories determine fitness for service and eligibility for VA benefits. It exists inside immigration law, where certain diagnoses can render a person inadmissible.

Section 8 documented this embedding in detail — the way the classification functions as gatekeeper, resource allocator, and social sorting mechanism. But the infrastructure concept adds something that the sociological analysis alone does not: the observation that embedding is what makes the system nearly impossible to change. You can't modify the classification without simultaneously modifying every system it's embedded in. Change the criteria for major depressive disorder and you've changed what insurance covers, what drugs are indicated, what legal precedents apply, what research populations mean, what clinicians were trained to recognize, and what patients have organized their self-understanding around. The DSM is not a book. It's a load-bearing wall.


Transparency. Infrastructure is invisible in routine use. You look through it at whatever it delivers, and the system itself disappears.

This was the essay's opening move — the faucet, the water, the more than two million miles of pipe nobody thinks about. When a psychiatrist diagnoses someone with generalized anxiety disorder, neither the clinician nor the patient typically pauses to consider the committee that defined the criteria, the field trials that tested the thresholds, the insurance codes that flow from the label, the treatment guidelines that were written around the category, the pharmaceutical marketing campaigns that shaped public understanding of what anxiety is, or the century of philosophical debate about whether the category names something real. They just use it. The diagnosis is a window; they look through it at the patient's suffering.

Section 3 was an attempt to crack this transparency — to describe what the diagnostic encounter actually feels like from both sides and to reveal the gap between the manual's tidy categories and the room's messy reality. The ethnographic research on clinical decision-making showed that experienced clinicians often diagnose intuitively, matching patients to prototypes they've internalized rather than walking through criterion checklists. The manual becomes transparent in a second, deeper sense: clinicians have absorbed it so completely that they no longer experience themselves as applying a system. They experience themselves as seeing what's there. This is infrastructure at its most effective and most dangerous — when the tool becomes invisible, its assumptions become invisible too.


Reach or scope. Infrastructure extends beyond a single event or one-site practice.

The DSM's reach is global, and Section 9 traced the scope of that reach with the care the subject deserves. Through the ICD concordance, through the global mental health movement, through pharmaceutical marketing, through the sheer institutional weight of American psychiatry, the DSM's categories have been exported to virtually every country on earth. The same system that was designed by American committees working from American clinical populations now organizes mental health research in Lagos, clinical practice in Shanghai, and public health surveillance in São Paulo. When the CCMD-3 was retired and Chinese psychiatry adopted the ICD-11, that was a moment of convergence — one more national system pulled into the orbit of the Western classification, one more set of culturally specific categories replaced by a global standard.

But reach isn't only geographic. The DSM reaches across domains of social life that its designers never anticipated. It reaches into child custody disputes, where a parent's diagnosis can determine custody outcomes. It reaches into employment law, where diagnostic categories define what counts as a disability requiring accommodation. It reaches into popular culture, where diagnostic terms like "narcissist," "OCD," "bipolar," and "ADHD" have escaped the clinical context entirely and become folk categories for everyday social sorting. The reach of the classification has outrun the reach of the expertise behind it.


Learned as membership. Users learn the classification system as part of being socialized into a community. Becoming a member means learning to see through the infrastructure.

Medical students don't learn the DSM the way they learn anatomy — as a body of facts to memorize. They learn it the way children learn language — as the medium through which a particular community makes the world legible. A psychiatric resident learns to see a specific configuration of sadness, fatigue, sleep disturbance, and concentration difficulty and recognize it as Major Depressive Disorder the way a sommelier learns to taste a wine and name the grape. The categories become perceptual habits. The system colonizes professional vision.

But membership isn't limited to professionals. Section 5 documented how patients learn to see themselves through diagnostic categories — how a diagnosis reorganizes self-understanding, provides a narrative framework, connects the person to a community of others with the same label. Section 11 examined what happens when this membership learning begins in childhood, when a seven-year-old starts to understand herself as "someone with ADHD" and that understanding becomes the lens through which she interprets her own behavior, her school performance, her social relationships, and her future possibilities. The looping effects Hacking described are, in part, effects of membership — you learn the system, and the system learns you back.

And this cuts both ways. Learned-as-membership means that replacing the classification would require unlearning — not just updating a reference manual but retraining the perceptual habits of hundreds of thousands of clinicians and revising the self-understanding of millions of patients. The infrastructure lives inside people's heads, and you can't upgrade that with a software patch.


Links with conventions of practice. Infrastructure shapes and is shaped by the practices it organizes. It both codifies existing conventions and constrains future ones.

Treatment guidelines are written around diagnostic categories. Clinical trials are designed around diagnostic categories. Training programs teach around diagnostic categories. Insurance authorization procedures are structured around diagnostic categories. The practices of mental health care have co-evolved with the classification system until the two are nearly inseparable — like a river and its banks, each shaping the other.

Section 10 showed this link at its most consequential. Treatment guidelines say: if Major Depressive Disorder, then SSRIs and/or CBT. If Bipolar I, then mood stabilizers. If Schizophrenia, then antipsychotics. The common factors research complicates this picture considerably — the evidence that therapeutic alliance, expectancy, and other non-specific factors often matter more than the specific intervention matched to the specific diagnosis. But the practice conventions remain organized around the diagnostic categories regardless, because the conventions need categories to organize around. A therapist working with a patient in distress doesn't have the institutional option of saying "I'm going to treat you as a whole person in context" — she has to enter a diagnosis in the EHR, which generates a treatment plan template organized around that diagnosis, which gets reviewed by insurance for medical necessity based on whether the proposed intervention matches the documented condition. The conventions of practice don't just use the classification. They enforce it.

And the link runs in both directions. When a new treatment emerges — as happened with ketamine for treatment-resistant depression — the classification system shapes how the treatment is conceptualized, tested, and deployed. The treatment is studied in patients who meet criteria for a specific DSM category. It's approved by regulators for that category. It's marketed to clinicians as a treatment for that category. The classification determines which patients get access, which research questions get asked, and which forms of suffering the treatment is even imagined as relevant to. The infrastructure doesn't just organize existing practice. It determines what practices become thinkable.


Embodiment of standards. Classification systems plug into other standards, and their reach is shaped by the reach of the standards they connect to.

The DSM plugs into the ICD, which plugs into the WHO's global health reporting framework, which plugs into national public health surveillance systems, which plug into health policy and funding decisions. It plugs into the FDA's drug approval process, which requires demonstration of efficacy for specific DSM-defined conditions. It plugs into the insurance industry's utilization review standards, which require DSM diagnoses to authorize care. It plugs into the APA's clinical practice guidelines, into the NIMH's research funding priorities, into the legal profession's standards for expert testimony.

Each of these connections is itself a standard with its own institutional weight, its own inertia, its own stakeholders. The DSM doesn't just embody one standard — it embodies a web of interlocking standards, each reinforcing the others. Section 6 showed what happens when a rival standard — RDoC — tries to offer an alternative: even with the full institutional backing of the NIMH, RDoC couldn't dislodge the DSM because it couldn't plug into the other standards. RDoC has no billing codes. It can't authorize treatment. It can't structure a clinical trial that the FDA will accept. It can't organize a disability claim. It's a standard that doesn't connect to the other standards, which makes it, in infrastructure terms, an orphan — architecturally elegant but practically uninstalled.


Built on an installed base. Classification systems inherit the strengths and limitations of whatever preceded them. New infrastructure doesn't start from scratch; it grows out of the old.

Section 2 traced this inheritance in detail. DSM-III didn't emerge from a scientific vacuum. It was built on — and against — the psychoanalytic framework of DSM-I and DSM-II. It inherited certain categories (schizophrenia, the affective disorders) while rejecting the etiological assumptions that had organized them. DSM-IV refined DSM-III's categories but was structurally constrained by the research literature that had accumulated around those categories — you can't radically revise "Major Depressive Disorder" when twenty years of clinical trials, epidemiological studies, and treatment guidelines have been built around that specific construct. DSM-5 attempted more ambitious revisions — dimensional approaches to personality disorders, the bereavement exclusion removal, a reorganization of the chapter structure — but was pulled back by the weight of the installed base. The personality disorder revisions were so controversial they were relegated to an appendix. The installed base won.

This is what engineers call path dependency, and it's the feature of infrastructure that most directly constrains the future. Each version of the DSM narrows the space of possible next versions, because each version generates new research, new clinical practices, new institutional arrangements, and new patient identities that the next version must either accommodate or painfully disrupt. The system doesn't just carry its history. It's trapped in it.


Becomes visible upon breakdown. When infrastructure works, it disappears. It becomes visible only when something goes wrong.

The DSM becomes visible when a patient's suffering doesn't fit any available category — when the clinician reaches for a diagnosis and finds only "Other Specified" or "Unspecified," those residual bins that Section 7 showed are among the most commonly used diagnoses in clinical practice. It becomes visible when two clinicians evaluate the same patient and arrive at different diagnoses, exposing the gap between the system's promise of reliability and the irreducible ambiguity of human suffering. It becomes visible when an insurance claim is denied because the diagnosis doesn't meet medical necessity criteria, and the patient who was receiving helpful treatment is cut off because the infrastructure couldn't process their particular form of distress. It becomes visible when a research finding can't be replicated because the diagnostic criteria changed between DSM editions, and ten years of accumulated data become incommensurable with the next ten years.

It becomes visible across cultures. Section 9 documented the breakdown that occurs when the DSM is applied to people whose suffering is organized by entirely different systems — when shenjing shuairuo gets recoded as major depression, when the relational and spiritual dimensions of distress that African healing systems center are invisible to a framework that diagnoses individuals. The DSM's transparency — the ease with which Western-trained clinicians look through it — depends on a shared cultural infrastructure that makes the categories feel natural. Remove that cultural infrastructure, and the classification becomes suddenly, painfully visible as a particular culture's particular way of cutting up the space of human suffering.

It becomes visible developmentally. Section 11 showed what happens when diagnostic categories designed for adult presentations are applied to children whose brains, identities, and social worlds are still under construction — the pediatric bipolar controversy, the ADHD overdiagnosis debate, the questions about what it means to give a seven-year-old a label that was validated on thirty-year-olds. Children are where the infrastructure breaks most consequentially, because children live the longest with whatever the infrastructure does to them.

These breakdowns aren't bugs. They're the moments when the infrastructure reveals itself as infrastructure — as a human-made system with human limitations, serving some people well and others poorly, visible only to those it fails.


So the claim holds. Psychiatric classification exhibits every property that Star and Bowker identified as characteristic of infrastructure. It is embedded, transparent, far-reaching, learned as membership, linked to conventions of practice, an embodiment of standards, built on an installed base, and visible mainly when it breaks. This is more than an analogy — not because every social system embedded in institutions would qualify, but because the specific features that matter for understanding the DSM are precisely the features Star and Bowker identified: switching costs, installed base, path dependency, visibility upon breakdown. These are the features that explain not just what the DSM is but why it persists.

One additional concept from Bowker and Star's toolkit deserves explicit mention here, because it does essential work: the boundary object. A boundary object is a shared artifact that different communities use for different purposes while maintaining enough structural identity to coordinate action among them. The DSM is a paradigmatic boundary object. Clinicians use it for treatment decisions. Researchers use it for subject selection. Insurers use it for coverage determination. Lawyers use it for competency evaluation. Pharmaceutical companies use it for market definition. Patient advocacy groups use it for legitimacy and resource access. The manual persists not because it serves any one of these communities optimally — it doesn't — but because it is adequate enough for coordination across all of them. This means any replacement system would need to function as a boundary object too, and that is a design requirement the field has largely failed to name.

And descriptions have consequences.


The most important consequence is this: if the DSM is infrastructure, then the right question to ask about it is not the question that dominates most of the critical literature — Is it valid? — but the question that anyone who has ever tried to replace a running system knows to ask first: What is the cost of switching?

I've spent the professional portion of my career implementing and migrating classification systems. Not psychiatric ones — government and enterprise systems, the kind where you're replacing the way an entire organization categorizes and processes its core data. I've watched implementations fail. I've watched migrations stall. I've watched organizations spend years and millions of dollars building a technically superior replacement system that nobody adopts, because the old system is embedded in every workflow, every training manual, every person's muscle memory. The new system is better in the abstract. The old system is installed in the concrete. And installed wins. Installed almost always wins.

The pattern is consistent enough to be almost boring: the replacement system is designed by people who understand the domain's technical problems but underestimate the domain's social infrastructure. They build something that addresses the known flaws of the existing system. It's more valid, more flexible, more theoretically grounded. And then they try to deploy it, and they discover that the existing system isn't just a technical artifact. It's a web of dependencies — human dependencies, institutional dependencies, financial dependencies, identity dependencies — and you can't swap out the foundation without shaking everything built on top of it.

This is what has happened, repeatedly, with attempts to reform or replace the DSM. The RDoC initiative, launched in 2010 with the full institutional authority of the NIMH, represented the most serious attempt to build a replacement from scratch. Section 6 told that story: the ambitious dimensional framework, the research domain constructs, the promise of a biologically grounded alternative. Fifteen years later, RDoC has produced valuable neuroscience research but has not displaced a single DSM category from clinical practice, because it was never designed to plug into the infrastructure that clinical practice runs on. It has no billing codes. It can't authorize treatment. It has no clinical workflow. It's the technically superior system that nobody installed.

The HiTOP framework, documented in Section 7, represents a different strategy — working within the existing paradigm to reorganize its structure rather than replacing it wholesale. HiTOP's dimensional model of psychopathology has more empirical support than the DSM's categorical structure. Its spectra and subfactors capture the actual patterns in epidemiological data more accurately than discrete diagnostic categories. But HiTOP faces the same migration problem from a different angle: it needs to be translatable into the categories that the rest of the infrastructure expects. Insurance needs codes. Courts need diagnoses. Patients need names. A dimensional score on an internalizing spectrum is scientifically superior to a binary diagnosis of Major Depressive Disorder, but it can't yet do what the binary diagnosis does in the real world, which is unlock treatment.


But I want to be honest about where the metaphor strains, because pretending it's perfect would undermine the intellectual standard this essay has tried to maintain.

Infrastructure — physical infrastructure, the roads-and-pipes kind — is inert. It doesn't change the traffic that runs through it. A highway doesn't alter the cars that drive on it. A sewer system doesn't transform the water that flows through it. But psychiatric classification does change what it classifies. That was the argument of Section 5: Hacking's looping effects, the way diagnostic categories create new kinds of people, new illness presentations, new self-understandings that feed back into the categories and destabilize them. The DSM is infrastructure that is alive in a way that physical infrastructure is not. Its categories interact with the consciousness of the people they sort, and this interaction means the system is never stable in the way a road network is stable. The road doesn't care what you think of it. The diagnostic category cares very much — or rather, it is shaped by what you think of it, whether it cares or not.

This is a genuine limitation of the metaphor, and it matters. It means that the standard infrastructure playbook — maintain, upgrade, eventually replace — doesn't fully apply. You can't do a straightforward system migration when the system and the things it classifies are locked in a feedback loop. Replacing the DSM wouldn't be like replacing an operating system, where the files remain the same and only the interface changes. It would be more like replacing a language — the thing you're replacing is also the thing people use to understand themselves, and the act of replacement changes the selves who are doing the understanding.

The metaphor also underspecifies the ethical dimension. When a bridge is badly designed, the harm is clear and the responsibility is traceable. When a classification system sorts someone into a category that forecloses their possibilities, the harm is diffused across institutions, distributed over time, and often invisible to the people who maintain the system. Infrastructure ethics — the ethics of systems that shape lives at scale while remaining invisible to their users — is an underdeveloped field, and psychiatric classification is perhaps its most consequential case.

And the metaphor, for all its explanatory power, can become a kind of fatalism. If the DSM persists because infrastructure persists, if switching costs make replacement practically impossible, if the installed base constrains every possible future — then what? Does the infrastructure frame simply rationalize the status quo? Does it tell us the system can't be changed and we should stop trying?

It doesn't. But it does tell us something about how change happens in systems like these — and it's not the way most reformers imagine.


Infrastructure doesn't get replaced by superior alternatives. It gets replaced by alternatives that manage the migration. The interstate highway system didn't replace the railroad by being theoretically better at moving things. It replaced the railroad by building on-ramps and off-ramps that connected to existing roads, by funding construction through a gas tax that was already being collected, by riding the wave of suburbanization that was already underway. It succeeded not because it was optimal but because it figured out how to be installable — how to grow alongside the existing system, absorb its traffic incrementally, and eventually become the default without ever requiring a moment of total switchover.

This is what a post-DSM classification system would need to do. Not replace the DSM in a single revolutionary act — that approach has been tried and has failed, repeatedly — but grow alongside it, demonstrate value in specific use cases, build connectors to the existing institutional infrastructure, and gradually shift the weight of practice from one system to the other. This is not an inspiring vision. It doesn't have the elegance of starting from scratch, the purity of building the scientifically correct system and letting truth prevail. It has the inelegance of reality, the messiness of a system migration conducted while the system is running, the frustration of maintaining backward compatibility with something you know is flawed.

But it's how infrastructure actually changes. And psychiatric classification is infrastructure.

You're about to experience this directly. Game 6 gives you a classification system, lets you build institutional structures on top of it, and then asks you to replace the underlying system without breaking everything above it. You'll discover what every systems implementer discovers: the technical problem is the easy part. The dependency web is the hard part. The switching costs are the part that defeats you.

The question isn't whether the DSM should be replaced. Almost everyone agrees it should. The question is whether anyone can afford the migration. That's what the next game is about — and it's what the final section of this essay, a design brief for the system that doesn't yet exist, will try to honestly confront.

🎮

Game 6: The Migration Problem

Build institutional structures on top of a classification system, then try to replace the underlying system without breaking everything above it. Discover what every systems implementer knows: the switching costs are the part that defeats you.


Section 13 — A Design Brief for the Impossible

Part V: Toward the Next Architecture

You've just tried to migrate a running classification system while institutions depended on it staying exactly where it was. Game 6 gave you the dependency web — the billing codes wired to the old categories, the legal standards built on the old definitions, the research literature organized around the old populations, the patients whose identities had grown up around the old names — and then asked you to replace the foundation without collapsing the building. If the game worked, you experienced something that everyone who has ever replaced production infrastructure already knows: the technical design is the easy part. The switching costs are the part that defeats you. And the people who depend on the current system will fight the migration not because they love the system but because they've organized their lives around it, and you're asking them to reorganize.

A game ends. The DSM replacement problem does not.

This section refuses to do what conclusions usually do. It will not summarize twelve sections of argument. It will not gesture toward "future research" or "the need for further interdisciplinary dialogue." It will not pretend that careful analysis has produced an answer to the question the essay has been asking. It hasn't. What it has produced is something more useful: a sufficiently honest articulation of the problem that anyone attempting to solve it would at least know what they're up against.

In my professional life — designing and implementing classification systems for government agencies, migrating organizations from legacy infrastructure to new platforms — I've learned that the hardest projects are never the ones with difficult requirements. They're the ones where the requirements contradict each other and nobody has been honest about it yet. You can engineer around difficulty. You cannot engineer around contradiction that no one will name. So this section takes the form of a design brief: the document that precedes a build. It specifies what the system must do. It names the constraints. It identifies the tradeoffs that cannot be optimized away. And it hands the problem to whoever comes next with the clearest possible articulation of what "solving it" would actually require.


I. The Users and Their Irreconcilable Needs

The first discipline of a design brief is to name the user classes and specify what each one actually needs from the system — not what we wish they needed, not what would make the design elegant, but what they genuinely require to do their work and live their lives.

The word I want to insist on is irreconcilable. Not "in tension." Not "sometimes competing." Irreconcilable. These user classes need fundamentally different things from the same system, and no amount of clever architecture eliminates the conflict. Every failed attempt at DSM reform has failed in part by pretending otherwise.

Start with clinicians, because they're the users the system was ostensibly built for. Section 3 showed what they actually do with the DSM — the pattern recognition that precedes the checklist, the dual consciousness of practitioners who depend on a system they don't entirely believe in, the institutional compression that turns diagnosis from a clinical exploration into a bureaucratic processing step. What clinicians need is speed, memorability, and clinical utility. Categories that map onto treatment decisions — not perfectly, but well enough to guide action at 4:30 on a Friday with a full waiting room. A shared professional vocabulary for the referral letter, the case conference, the insurance form. Enough granularity to be meaningful and enough simplicity to be usable. Categories stable enough to build expertise around but flexible enough to accommodate the person in front of them who doesn't fit. What they do not need, and what the DSM has increasingly given them, is a system optimized for other users' purposes that they're required to use as though it were designed for theirs.

Researchers need something different. They need validity — categories that carve nature at something approximating its joints, or at least that group together people whose underlying processes are meaningfully similar. They need categories that predict treatment response, illness course, biological correlates. They need precision and reproducibility. They need categories stable enough to accumulate a literature around — you cannot run a ten-year longitudinal study if the definition of your target condition changes at year three — but revisable when the evidence demands it. And they need the system to be wrong in useful ways, which is a point worth dwelling on: a flawed category that generates productive research questions is better, for research purposes, than no category at all. The philosopher of science would recognize this as an instrumentalist criterion. The value of the category lies not in its truth but in its fertility.

Patients need recognition — the experience of being seen and understood, not merely sorted. Section 3 traced what the diagnostic encounter feels like from their side: the relief of naming, the foreclosure of labeling, the torque of living inside a category that was built for institutional purposes rather than experiential accuracy. What patients need from the system is categories that open doors — to treatment, accommodations, benefits, community, self-understanding — without closing others. They need to be able to use the classification without being used by it. And here is what makes the design problem genuinely painful: some patients need a name for their suffering, a word that makes the chaos legible, that says this is a real thing and other people have it too. Other patients need to be freed from a name that was imposed on them, a label that has foreclosed possibilities and narrowed identity to a diagnosis. These are opposite needs, held by people within the same diagnostic category. No system satisfies both.

Insurers and administrators need binary decisions. Covered or not. Eligible or not. Documented or not. Their systems process millions of claims and cannot accommodate nuance at scale. They need reliability above all else — not from indifference to validity, but because reliability is what makes high-volume administrative processing possible. Legal systems need something similar but starker: bright lines. Competent or incompetent. Responsible or diminished. Disabled or not. Definitions stable enough to build case law around, precise enough to survive cross-examination. What legal systems need, in practice, is the appearance of scientific certainty — which is precisely what honest classification, after everything this essay has documented, cannot provide.

Policymakers and public health systems need epidemiological trackability. Prevalence rates that hold steady across time so you can determine whether depression is increasing. Categories stable across geography so you can compare rates between countries and allocate resources. They need the classification to function as a surveillance instrument, which requires exactly the kind of reification that Section 4 warned against — treating categories as fixed objects that can be counted, rather than as fluid constructs whose boundaries shift with every committee revision.

Here is the design constraint these users collectively produce, stated as plainly as I can manage: no single classification system can optimally serve all of them. The system clinicians need is simpler than the system researchers need. The system insurers need is more rigid than the system patients need. The system legal proceedings require is more certain than honest science can deliver. The system epidemiology demands is more stable than scientific progress would recommend.

Every existing psychiatric classification — the DSM, the ICD, the now-absorbed CCMD — represents a compromise among these users. But the compromise has never been explicitly negotiated or transparently documented. The DSM does not say: "We optimized for reliability over validity because insurers and administrators needed it." It does not say: "We chose categorical boundaries over dimensional descriptions because legal and administrative systems require binary outputs." It presents its design compromises as scientific conclusions, which obscures the choices and makes them impossible to evaluate or revise. An honest design brief would do the opposite: it would specify whose needs take priority under what conditions, or it would propose an architecture that serves different users differently from the same underlying framework.


II. The Requirements

Every section of this essay has been generating requirements for an adequate classification system, whether or not it announced itself that way. Here they are, translated into design language. Ten requirements, each one earned by a different piece of the argument.

The system must demonstrate clinical utility — must improve outcomes relative to not classifying at all. Section 10 showed this bar is higher than it sounds. For conditions where diagnosis drives treatment selection — bipolar disorder, psychotic disorders — the classification earns its keep. For the most prevalent conditions, the Dodo bird verdict holds: common therapeutic factors predict outcome better than diagnostic specificity. The requirement, honestly stated, is not that every category must guide treatment but that the system must be transparent about which categories do and which don't.

The system must achieve adequate reliability. Not maximal — adequate. The DSM-III revolution established inter-rater agreement as the foundational requirement, and Game 3 showed what happens when you optimize for it: validity collapses. The question is not "how reliable can we make the categories?" but "how reliable do they need to be for each user class?" — which is a harder question because it requires specifying "enough" rather than "more."

The system must pursue validity, which means specifying validity with respect to what. Section 4 showed that "corresponding to something real" fragments into different targets that don't converge. A category might correspond to a biological mechanism, a stable pattern of suffering, a meaningful predictor of course or treatment response. Major depressive disorder has moderate predictive validity, weak biological validity, and debatable construct validity — it may group together several distinct conditions, as the 1,030 unique symptom profiles in the STAR*D dataset suggest. The system must be honest about which kind of validity each category has and which it lacks.

The system must accommodate dimensionality. The taxometric evidence from Section 7 is clear: most psychopathology is continuous, not discrete. Hard categorical boundaries are scientifically indefensible for most conditions. But many institutional uses require binary decisions. The design question is whether a dimensional foundation can support categorical outputs — the way blood pressure is continuous but "stage 1 hypertension" is a categorical output with an explicit, convention-based threshold — without the categories being mistaken for the reality underneath.

The system must achieve cultural validity. Section 9 showed that the same suffering is constituted differently across cultural contexts, that Western psychiatric categories carry the ontological commitments of the culture that produced them, and that their global export is not a neutral scientific act. The starkest version of the design question: is a single global system possible, or does adequacy require multiple culturally grounded systems with translation protocols between them?

The system must be developmentally sensitive. Section 11 documented what happens when it isn't — the pediatric bipolar catastrophe, the birthday effect in ADHD diagnosis, the fundamental mismatch between categorical snapshots and developmental trajectories. Classification means something different when applied to a seven-year-old whose brain will develop for another fifteen years than when applied to a thirty-seven-year-old with established patterns of suffering.

The system must resist reification — must include structural mechanisms that prevent its categories from being mistaken for natural kinds when they are practical kinds. This is Zachar's concern from Section 4, Hyman's diagnosis from Section 6, and the central warning of the infrastructure analysis: categories that become embedded in institutional life get treated as features of nature rather than tools built by human hands. The question is whether any structural feature can prevent this when the system functions as infrastructure — or whether reification is an emergent property of embeddedness itself.

The system must resist capture by commercial, political, and institutional interests. Section 8 documented the mechanisms: pharmaceutical companies shaping categories to match drug mechanisms, insurance requirements driving diagnostic inflation, the structural conflicts of interest on DSM panels. The design question is about governance: what structures could insulate classification decisions from the interests they regulate?

The system must be legible to the people it classifies and must preserve their agency in the classification process. Currently, no psychiatric classification includes a patient-facing layer. The DSM is written for clinicians. The ICD is written for coders. Nobody wrote a version for the person sitting across the desk, wondering whether the name they've been given is a key or a cage. This absence is not an oversight. It's a design failure that reflects the power asymmetry Section 3 described from both sides of the diagnostic encounter.

And the system must be designed for evolution. Not as a static taxonomy overhauled every two decades — each overhaul a system migration with enormous switching costs, as Section 2 traced from DSM-III through DSM-5 — but as a living system with built-in mechanisms for incorporating new evidence, retiring obsolete categories, and managing the cost of change from the start.

Ten requirements. Each one reasonable on its own terms. Taken collectively, they describe a system that may not be buildable.


III. The Genuine Contradictions

Some of these requirements are in tension. Others are in outright contradiction. The distinction matters — because tensions can be engineered around, while contradictions require honest tradeoffs. A tension is a hard problem. A contradiction is a choice you have to make and then own.

The reliability-validity tradeoff is a tension, not a contradiction. Under the DSM's categorical structure, these demands pull apart — Game 3 made that visceral. But they may not be permanently opposed. Under a dimensional system with explicitly defined categorical outputs, reliability and validity can potentially coexist: you measure the dimensions reliably and derive the categories from them through documented thresholds. The HiTOP consortium is an active attempt at this reconciliation. It hasn't succeeded yet, but the architecture is coherent. Similarly, the gap between clinical utility and scientific validity is a tension that honest labeling could manage. A classification can be scientifically imprecise and clinically useful — the "depression" umbrella groups heterogeneous conditions but still initiates treatment that helps many people. The danger lies only in presenting clinically pragmatic categories as though they were scientific findings, which is what the DSM does. And the dimensional-versus-categorical divide is, at bottom, an engineering problem: other fields solve it routinely. Blood pressure is continuous; stage 1 hypertension is a categorical output; everyone involved understands the threshold is a convention. Psychiatry could do exactly this if it were willing to say so out loud.
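The blood-pressure pattern can be written down in a few lines: continuous measurements in, categorical labels out, with the thresholds visible as explicit conventions rather than claims about natural boundaries. The cutoffs below are the published 2017 ACC/AHA ones; the function itself is just an illustration of the architecture, not a clinical tool:

```python
# Continuous measurement -> categorical output via documented thresholds.
# Cutoffs follow the 2017 ACC/AHA blood pressure categories; the point
# is architectural: the category is derived, and the convention is visible.

def bp_category(systolic: int, diastolic: int) -> str:
    if systolic >= 140 or diastolic >= 90:
        return "stage 2 hypertension"
    if systolic >= 130 or diastolic >= 80:
        return "stage 1 hypertension"
    if systolic >= 120:
        return "elevated"
    return "normal"

print(bp_category(118, 76))   # normal
print(bp_category(134, 82))   # stage 1 hypertension
```

Nothing in this function pretends the boundary at 130 is a fact of nature; it is a committee decision, written down where anyone can read it. That transparency, not the arithmetic, is what the dimensional-to-categorical architecture would have to import.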

But there are genuine contradictions — places where maximizing one requirement means degrading another, where the tradeoff cannot be engineered away but only navigated.

Cultural validity contradicts global comparability. A system sensitive enough to respect local meaning-making — to accommodate shenjing shuairuo and susto and hikikomori on their own terms, as Section 9 argued they deserve — will not produce comparable data across cultures. A system standardized enough for global epidemiology will flatten local meaning. These are not two points on a single axis that can be balanced. They are two goods that cannot be simultaneously maximized. The brief must say so, and then ask: which use cases require comparability, which require cultural fidelity, and can the system's architecture serve both through different interfaces to the same framework? This is the question Section 9 raised with the Unicode analogy — a shared encoding standard that accommodates radical diversity rather than eliminating it. Whether it's achievable for human suffering as well as for writing systems is genuinely unknown.

Resistance to reification contradicts institutional usability. The more a system embeds caveats, dimensionality, uncertainty, and epistemological honesty, the less usable it becomes for insurance companies, courts, and administrative processes that require clean categories and confident boundaries. Making the classification honest makes it institutionally dysfunctional. Making it institutionally functional requires a kind of strategic dishonesty about its own certainty — the very dishonesty that Section 4 identified as the root of reification. This is not a problem to be solved. It is a condition to be managed. And managing it means being explicit, within the system's own architecture, about where it is being strategically confident despite genuine uncertainty, and why.

Patient agency contradicts clinical authority. If you take seriously the argument that the person being classified has expertise about their own experience — and Section 3 made that case from the patient's side — then genuine shared decision-making means accepting that the patient may disagree with their diagnosis, may have insights the clinician lacks, may experience the same symptoms through a framework the manual doesn't recognize. But clinical training and pattern recognition also detect things patients cannot self-report. The manic patient who feels better than ever. The person with anosognosia who cannot recognize their own cognitive decline. There is no clean hierarchy of epistemic authority here, and the system must accommodate both forms of knowledge without collapsing into either "the doctor is always right" or "self-diagnosis is sufficient."

Stability contradicts evolvability. Researchers need categories stable enough for ten-year longitudinal studies. Clinicians need them stable enough to build a career's worth of expertise. But science moves, and locking in categories for institutional convenience produces the calcified infrastructure this entire essay has been analyzing — categories that accumulate decades of technical debt while the evidence underneath them shifts. Every version of this system will face the question: when is the cost of maintaining a category the evidence no longer supports greater than the disruption of changing it? There is no formula. It is a judgment call, and the system's governance must be designed to make it well rather than to avoid it.

These contradictions are not failures of the current system that a better system would resolve. They are structural features of the problem. Any adequate classification must navigate them — not by pretending they don't exist, but by being transparent about which tradeoffs it has chosen and why.


IV. Architectural Patterns Worth Considering

I want to be careful here. After twelve sections demonstrating why this problem is so hard, proposing solutions risks a naivety the essay hasn't earned. What I can offer instead — what a design brief should offer — is structural approaches that have worked in analogous domains, presented as directions worth exploring rather than answers.

The most promising direction is a layered architecture: a dimensional foundation at the base — informed by RDoC's ambitions, chastened by the limitations Section 6 documented — with multiple interface layers serving different users. Clinicians would interact with clinical profiles optimized for the treatment decisions they actually make. Researchers would access dimensional scores at whatever granularity their studies require. Insurers and administrators would receive categorical codes — binary outputs derived from the dimensional foundation through explicit, documented thresholds that the system identifies as conventions rather than discoveries. Patients would encounter a dedicated layer written for them: accessible language explaining what the classification means and doesn't mean, what it predicts and doesn't predict, what doors it opens and what rights it carries.

The precedent is how operating systems work. The same hardware and kernel present radically different interfaces to the casual user, the application developer, and the systems engineer. None of the interfaces is the "real" one. Each is a view onto the same underlying system, optimized for a different use case. This wouldn't eliminate the contradictions. But it would make them manageable by allowing different user classes to interact with the classification at the level of abstraction appropriate to their needs, rather than forcing everyone to use a single interface that badly serves all of them.
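A toy version of the layered pattern shows how little machinery the idea requires. Everything below is hypothetical: the dimension names, the threshold value, and the view functions are placeholders, not a proposed nosology:

```python
# One underlying dimensional profile; several read-only "interfaces,"
# each exposing only what its user class needs. All names and values
# are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Profile:
    scores: dict  # dimension name -> severity in [0.0, 1.0]

THRESHOLD = 0.6  # a documented convention, not a discovery

def clinician_view(p: Profile) -> dict:
    # Full dimensional detail, sorted by severity, for treatment planning.
    return dict(sorted(p.scores.items(), key=lambda kv: -kv[1]))

def insurer_view(p: Profile) -> list:
    # Categorical output only: dimensions crossing the documented threshold.
    return [name for name, s in p.scores.items() if s >= THRESHOLD]

def patient_view(p: Profile) -> str:
    # Plain-language layer: what the classification does and doesn't say.
    flagged = insurer_view(p)
    if not flagged:
        return "No dimension crosses the conventional threshold."
    return ("Your profile crosses the conventional threshold on: "
            + ", ".join(flagged))
```

None of the three views is the "real" diagnosis; each is a projection of the same underlying record, which is exactly the operating-system precedent the analogy names.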

A second direction is modular versioning. Rather than the monolithic revisions that have defined the DSM's history — DSM-III in 1980, DSM-IV in 1994, DSM-5 in 2013, each one a massive system migration with enormous switching costs — individual categories could be versioned independently as evidence accumulates. Major Depressive Disorder v3.2. Autism Spectrum Disorder v2.1. Each version would carry a changelog documenting what changed, the evidence that motivated the revision, and a migration guide for users of the previous version. This is semantic versioning applied to nosology, and the precedent is how modern software ecosystems manage change: small, frequent, documented updates rather than infrequent overhauls. In biological taxonomy, something similar already operates — individual species are reclassified continuously without requiring the Linnaean system to be rebuilt from scratch.
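A sketch of what semantic versioning might look like applied to a single category. The names, version numbers, and changelog entries are all invented for illustration:

```python
# Semantic versioning applied to one category: each category carries
# its own version history instead of waiting for the next monolithic
# manual revision. Names and entries are hypothetical.

from dataclasses import dataclass, field

@dataclass
class CategoryVersion:
    name: str
    major: int                 # breaking change: criteria redefined
    minor: int                 # compatible change: clarification, new evidence
    changelog: list = field(default_factory=list)

    def revise(self, note: str, breaking: bool = False) -> None:
        """Bump the version and record why, as a migration aid for users."""
        if breaking:
            self.major += 1
            self.minor = 0
        else:
            self.minor += 1
        self.changelog.append(f"v{self.major}.{self.minor}: {note}")

mdd = CategoryVersion("Major Depressive Disorder", major=3, minor=1)
mdd.revise("Clarified duration criterion wording")    # compatible: v3.2
mdd.revise("Split into two subtypes", breaking=True)  # breaking: v4.0
```

The distinction between major and minor bumps is the whole point: a researcher can know at a glance whether a revision breaks comparability with the populations defined under the previous version.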

A third direction: explicit confidence ratings built into the system's architecture. Each category would carry metadata indicating what kind of validity it has (predictive, biological, phenomenological), how much consensus exists, and how well-validated the category is relative to others. Some categories would be flagged as "clinically useful, biologically unvalidated" — an honest description of most mood and anxiety disorders. Others might carry "strong biological validity, limited clinical utility." The analogy is to financial disclosures: the numbers exist alongside explicit statements about confidence levels, methodology, and limitations. Nobody reads a financial statement and mistakes an estimate for an audited figure, because the document is designed to prevent exactly that confusion. The DSM is designed in a way that actively encourages the equivalent confusion — presenting all categories with the same epistemic authority regardless of how well-validated they are.
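Such metadata could be machine-readable rather than buried in introductory prose. A sketch, with every validity rating invented for illustration rather than asserted as an actual assessment:

```python
# Epistemic metadata attached to a category: what kind of validity it
# claims and how strong the evidence is. All ratings below are
# illustrative placeholders, not clinical assessments.

from enum import Enum

class Validity(Enum):
    PREDICTIVE = "predictive"
    BIOLOGICAL = "biological"
    PHENOMENOLOGICAL = "phenomenological"

CATEGORY_METADATA = {
    "major depressive disorder": {
        "validity": {Validity.PREDICTIVE: "moderate",
                     Validity.BIOLOGICAL: "weak",
                     Validity.PHENOMENOLOGICAL: "moderate"},
        "flag": "clinically useful, biologically unvalidated",
    },
}

def disclosure(name: str) -> str:
    """Render the epistemic disclosure alongside the category name."""
    meta = CATEGORY_METADATA[name]
    parts = [f"{k.value}: {v}" for k, v in meta["validity"].items()]
    return f"{name} [{meta['flag']}] ({'; '.join(parts)})"
```

The category never appears without its disclosure, which is the financial-statement discipline: the estimate and its confidence level travel together.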

And a fourth: pluralistic cultural modules rather than a single global taxonomy. A core framework defining dimensional constructs at a level of abstraction high enough to accommodate cultural variation — "disturbances in mood regulation" rather than "major depressive disorder" — with culturally specific modules that specify how those constructs manifest locally, what local categories map onto them, and where the mapping breaks down. This accepts that complete universality is impossible while maintaining enough common structure for cross-cultural research. The Unicode analogy from Section 9 bears repeating one more time: not one writing system for the world, but a shared encoding standard that accommodates thousands of them.
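A minimal sketch of the core-plus-modules idea. The fit annotations are loose paraphrases of the cross-cultural literature, stated here as assumptions for illustration rather than clinical claims:

```python
# A core construct plus culturally specific modules that map local
# categories onto it, recording where the mapping breaks down.
# All mappings and fit notes are illustrative assumptions.

CORE = "disturbance in mood and energy regulation"

CULTURAL_MODULES = {
    "zh-CN": {"local_category": "shenjing shuairuo",
              "maps_to": CORE,
              "fit": "partial: somatic emphasis exceeds the core construct"},
    "ja-JP": {"local_category": "hikikomori",
              "maps_to": CORE,
              "fit": "poor: primarily social withdrawal, not mood"},
}

def crosswalk(locale: str) -> str:
    """Show a local category, its core mapping, and the known mismatch."""
    m = CULTURAL_MODULES[locale]
    return f"{m['local_category']} -> {m['maps_to']} ({m['fit']})"
```

The critical field is `fit`: the system records not just the mapping but the documented ways in which the mapping fails, so comparability claims carry their own caveats.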

There is a fifth direction: phenomenological grounding — the argument, developed most rigorously by Ratcliffe, Parnas, and Sass, that classification should begin not from symptom checklists or biological markers but from careful descriptions of what it is actually like to experience different forms of suffering. The DSM skipped this descriptive step, moving straight from clinical observation to operational criteria. A classification system that began from phenomenological foundations might carve the landscape of suffering more accurately, even if the resulting categories were harder to operationalize.

None of these patterns is new. Variations of each have been proposed in the literature, debated at conferences, and piloted in research contexts.

The question has never been whether better architectures are imaginable. The question has always been whether they are installable — whether they can be plugged into the institutional infrastructure that clinical practice, research, insurance, law, and patient identity actually run on. Which brings us to the problem the entire essay has been building toward.


V. The Migration Problem, Revisited

Game 6 gave you the experience. Now I want to name the problem in full, drawing on what I know professionally about replacing running systems.

Suppose a classification meeting every requirement in this brief were designed tomorrow — layered, modular, versioned, culturally pluralistic, dimensionally grounded, epistemically honest, with a patient-facing layer and explicit confidence ratings. A superior system by every criterion this essay has identified. Implementing it would require simultaneous overhaul of insurance billing codes across every payer in every country that uses DSM or ICD categories. Retraining of hundreds of thousands of clinicians worldwide. Revision of legal standards for competency, commitment, disability, and criminal responsibility in every jurisdiction. Rebuilding of electronic health record systems whose data models are organized around current categories. Reanalysis or abandonment of decades of research built on DSM-defined populations. Renegotiation of drug approval frameworks with every regulatory agency. Revision of educational accommodation systems. And — this is the one the institutional analysis always misses — managing the disruption to millions of people whose self-understanding, treatment plans, insurance coverage, legal standing, and community belonging are organized around the names they currently carry.

I list these not to induce despair but to induce honesty. In my career implementing classification systems for government agencies, I've watched organizations attempt what the field calls a forklift replacement — ripping out the old system and dropping in the new one in a single operation. It almost never works. The dependencies run too deep, the users are too numerous, the edge cases are too unpredictable, and the cost of failure — when the system processes disability claims, or authorizes treatment, or determines legal standing — falls on the people least able to absorb it.

What works is incremental migration. You build the new system alongside the old one. You run them in parallel. You migrate users and institutions one at a time, starting with the ones most willing and least disrupted by the change. You maintain backward compatibility — the new system speaks the old system's language when it needs to. You build translation layers between the old categories and the new, so that nothing breaks during transition even if it works differently under the hood. You accept that the migration will take years, probably decades, and you design the new system with that timeline in mind from the beginning.
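The translation layer is the piece that maps most directly onto code. A sketch with hypothetical identifiers; the legacy codes below are formatted like ICD-10 codes, but the pairings are invented:

```python
# A translation layer for incremental migration: the new system emits
# the old system's codes for consumers that have not yet migrated, so
# nothing downstream breaks during the transition. All identifiers
# and pairings are hypothetical.

# Mapping from new modular identifiers to legacy billing codes.
NEW_TO_LEGACY = {
    "mood/depressive/v3.2": "F32.1",   # invented pairing
    "mood/bipolar-i/v2.0": "F31.9",    # invented pairing
}

def emit_claim(new_id: str, migrated_payers: set, payer: str) -> str:
    """Speak the new language to migrated payers, the old to everyone else."""
    if payer in migrated_payers:
        return new_id
    return NEW_TO_LEGACY[new_id]  # backward-compatible fallback
```

The old and new systems run in parallel behind this function, and each payer migrates on its own schedule; the mapping table is the contract that keeps claims flowing in the meantime.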

The honest assessment: the switching costs for psychiatric classification may exceed the benefits of any plausible improvement. Which means the most realistic path forward is probably not replacement but evolution — gradual, modular, well-documented changes to the existing infrastructure, each one designed to move the system incrementally toward the requirements this brief has outlined without requiring a wholesale overhaul that would break everything built on top of the current system. This is less satisfying than a revolution. It is also more honest about how infrastructure actually changes in the world. Section 12 made the case, and Game 6 made you feel it: the DSM will not be replaced by a superior system unveiled at a conference. If it changes at all, it will change the way cities replace their water pipes — one section at a time, over decades, while the water keeps flowing.

But here the migration metaphor encounters the looping problem the essay identified in Section 5. You can swap a database without changing the data. You cannot swap a psychiatric classification without changing the people it classifies, because their self-understanding is organized around the existing categories. The metaphor illuminates the institutional problem — the switching costs, the dependency web, the installed base — while underspecifying the human one. The looping effects don't disappear because the model doesn't account for them, and any actual migration plan would need to reckon with the fact that the people inside the system are not inert data being ported from one format to another.


VI. What This Brief Is Actually Asking For

Let me reframe what you've just read, because the design brief is easy to misunderstand.

It is not asking for a better taxonomy. It is asking for a better relationship between classification and the humans it touches — the people who classify and the people who are classified.

The fundamental problem this essay has identified is not that the DSM's categories are wrong, though some of them are. It is not that the science is insufficient, though in many areas it is. The fundamental problem is architectural. The system treats classification as a finished product rather than a living process. It treats categories as descriptions of nature rather than tools shaped by human decisions for human purposes. It treats the people it classifies as objects of sorting rather than participants in meaning-making. And it presents its design compromises — between reliability and validity, between clinical utility and scientific precision, between institutional functionality and epistemological honesty — as though they were scientific conclusions rather than choices that could have been made differently and that could still be revised.

An adequate system would not merely have better categories. It would have a better relationship to its own limitations. It would be transparent about what it knows and what it doesn't. It would mark where its boundaries are conventions and where they reflect something found rather than decided. It would name who benefits from each categorical choice and who is harmed by it. It would be designed not only for the clinician's workflow or the researcher's methodology or the insurer's efficiency but also for the person sitting across the desk, trying to understand their own suffering and wondering whether the name they've been given is a key or a cage.

Nobody has built this system. The constraints identified in this brief may be genuinely irreconcilable — not in theory, where everything is possible, but in practice, where infrastructure lives. The contradictions may not admit of resolution, only of navigation: honest tradeoffs chosen transparently rather than obscured by the language of scientific objectivity.

But the first step toward building anything adequate is knowing clearly what you're trying to build and what's stopping you. That is what this brief is for. That is what this essay has been for.


The six games that have run through this essay were not illustrations of the argument. They were the argument, in a different medium. You drew boundaries on continuous data and watched the populations change. You classified entities that changed in response to being classified, and felt your own categories dissolve beneath you. You tuned a diagnostic system for reliability and watched validity collapse, or tuned for validity and watched agreement disappear. You encountered the same human suffering organized through another culture's categories and felt your own framework become visible as a framework rather than a transparent window onto reality. You were sorted — and felt the label reshape what was possible for you, what stories you could tell, what futures you could see. And you tried to migrate a system while the world depended on it, and discovered why the DSM persists even though almost everyone who thinks carefully about it agrees it should be different.

You now know something about psychiatric classification that no amount of reading alone could have given you. Not facts — you could have gotten those from a textbook. What you have is the felt understanding of why the problem is shaped the way it is: the embodied knowledge of constraints that can only be understood by inhabiting them. The kind of knowledge that changes not what you think but how you think about it.


Coda

This piece exists because I believe the hardest problems are not the ones that need more data or more funding or more smart people in a room. They're the ones that need more honest articulation. The way you frame a problem shapes what solutions become thinkable. Frame psychiatric classification as a scientific problem, and you reach for better science. Frame it as a political problem, and you reach for better governance. Frame it as an infrastructure problem — which is what I've tried to do here, drawing on the one domain where I can claim direct professional knowledge — and you see something different: not a breakthrough waiting to happen, but a sustained, multi-decade, multi-stakeholder effort to redesign a system while it's running, informed by the expertise of everyone it serves and everyone it sorts.

I don't have the answer. What I have is this articulation — and the conviction that getting the question right matters at least as much as getting the answer right, because we've been building answers to the wrong question for a long time.

The question is yours now. What you build with it is up to you.

About This Project

How This Was Made

This essay began with a question that felt too simple: What does psychiatric classification actually do? Not whether the DSM is scientifically valid — that debate has been running for decades. Not whether mental illness is "real" — a question so poorly framed that attempting to answer it mostly generates heat. Instead: what happens when a committee's decisions about where to draw diagnostic lines become the infrastructure that an entire society uses to organize suffering?

The Research

The literature review spans approximately 250 works across ten fields: the history of psychiatry, philosophy of science, biological psychiatry, psychometrics, medical sociology, cross-cultural psychiatry, phenomenology and lived experience, treatment outcomes, developmental psychology, and information science. These bibliographies were built systematically, discipline by discipline, then cross-referenced to identify works that appeared across multiple domains. The value proposition was never disciplinary depth — any specialist knows their own field better than I do. It was disciplinary translation: the synthesis that becomes possible when you read across fields that have been conducting parallel conversations without hearing each other.

The Architecture

The essay's structure — 13 analytical sections in a five-part arc — was designed before any prose was written. Part I establishes the infrastructure framework. Part II asks what diagnoses are. Part III examines the system as social instrument, then removes the Western frame entirely. Part IV evaluates whether it works and closes with a design brief for what a better system would require.

The six interactive experiences are positioned at argumentative pivot points — not as illustrations of the preceding analysis, but as arguments in their own right. Each asks you to do something the prose can only describe. The form enacts the thesis: the games make invisible architecture visible.

The Stance

In my professional life, I design and implement classification systems for government agencies. I've watched committees negotiate category boundaries and seen provisional decisions harden into production infrastructure that thousands of downstream processes depend on. The infrastructure metaphor isn't borrowed from information science for theoretical polish — it comes from watching classification happen at institutional scale, and recognizing the same dynamics in psychiatric nosology.

This piece is not a critique of psychiatry from outside. It's an infrastructure analysis from someone who builds this kind of thing, aimed at making visible the architectural decisions that usually remain invisible.

Technical Note

This essay is a single HTML document. The interactive games are self-contained components loaded lazily as you scroll. No external dependencies. No tracking. No analytics. Position persistence and dark mode preferences are stored locally in your browser and never leave your device. The reading experience was designed to disappear — infrastructure that does its job when you forget it's there.

The Invisible Architecture was developed between late 2025 and early 2026.