Every system that acts under uncertainty depends on a classification layer — a set of categories that determines what the system can represent and therefore what it can do. We argue that classification layers function as infrastructure in the technical sense developed by Bowker and Star (1999): embedded in other structures, transparent when working, constitutive rather than descriptive, and resistant to change once installed. It follows that the most consequential design decisions in such systems are taxonomic, not algorithmic. We develop this thesis through three case studies spanning radically different domains: psychiatric nosology (the DSM), LLM inference execution planning (LLM-QP), and LLM agent cognition (SAGEN). Across all three cases, the same structural pattern holds: the taxonomy defines the space of possibility, optimization searches only within that space, and the taxonomy becomes invisible precisely when it is most consequential.
Introduction — The Taxonomic Blind Spot
Across AI, medicine, and systems design, enormous effort goes into optimizing algorithms, architectures, training data, and governance. The categories those systems operate over receive comparatively little deliberate attention — yet the classification layer is where the behavioral ceiling gets set.
Central claim. In any system where an agent must act under uncertainty, the classification layer functions as infrastructure. It is embedded, transparent, constitutive, and resistant to change. The most consequential design decisions are taxonomic, not algorithmic.
The paper previews the three case studies through the question each one answers:
- Can a classification system become so embedded that it persists for decades despite known scientific inadequacy? Psychiatric classification shows that it can.
- Does taxonomic commitment constrain optimization even in purely computational systems with no human institutions? LLM-QP’s plan lattice shows that it does.
- Can a system be deliberately designed to resist the infrastructural hardening that makes classification invisible? SAGEN’s adapter pattern attempts this.
Scope. The paper argues for taxonomic awareness — not taxonomic nihilism (all categories are arbitrary) or taxonomic perfectionism (one right taxonomy exists). It treats the classification layer as a first-class design decision with structural consequences that persist long after the decision is made.
Classification as Infrastructure
Bowker and Star’s Sorting Things Out (1999) identified eight properties that characterize infrastructure. These are not a checklist; they form a mechanism.
| # | Property | Description |
|---|---|---|
| 1 | Embeddedness | Sunk into other structures, social arrangements, and technologies |
| 2 | Transparency | Invisible in routine use — users look through the system |
| 3 | Reach or scope | Extends beyond a single event or one-site practice |
| 4 | Learned as membership | Acquired as part of socialization into a community of practice |
| 5 | Links with practice | Shapes and is shaped by the practices it organizes |
| 6 | Embodiment of standards | Plugs into other standards and is determined by their reach |
| 7 | Built on installed base | Inherits strengths and limitations of predecessors (path dependency) |
| 8 | Visible upon breakdown | Disappears when working; surfaces only when something fails |
Why these properties interact. Properties 1–2 (embeddedness + transparency) explain invisibility: because the taxonomy is sunk into everything and invisible in use, it ceases to feel like a design decision. Properties 7–8 (installed base + breakdown visibility) explain lock-in: because the taxonomy inherits from its predecessors and is only noticed when it fails, revision is both constrained by path dependency and triggered only by crisis. The combination — invisibility plus lock-in — is the mechanism that produces the infrastructure paradox.
Key Concepts from Bowker and Star
- Boundary object. A shared artifact used differently by different communities while maintaining enough structural identity to coordinate across them.
- Torque. The biographical tension experienced by people whose lives don’t fit the available categories.
- Infrastructural inversion. The methodological move of foregrounding what is normally in the background.
Taxonomic Commitment
Definition. A taxonomic commitment is the set of categories a system operates with — the distinctions it can draw, the groupings it can represent, the boundaries it enforces. Every system that classifies makes a taxonomic commitment, whether or not it acknowledges doing so.
The key property. A taxonomic commitment defines the space of representable states, and therefore the space of achievable behaviors. An optimizer can only search within the space its taxonomy defines. A clinician can only diagnose conditions their manual contains. An agent can only reason about distinctions its architecture represents. Only taxonomic revision can expand this space.
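The constraint is mechanical, and a toy sketch makes it visible. All names and scores below are illustrative, not drawn from any of the systems discussed: a minimal optimizer can only return the best option among those its category set enumerates, so an option outside the taxonomy is unreachable no matter how good the scoring function is.

```python
def best_within(taxonomy, score):
    """Return the highest-scoring option the taxonomy can represent."""
    return max(taxonomy, key=score)

# Suppose the true optimum is "dimensional", but the installed taxonomy
# enumerates only categorical options: the optimizer cannot find it.
score = {"cat_a": 0.4, "cat_b": 0.7, "dimensional": 0.95}.get
installed = ["cat_a", "cat_b"]            # the taxonomic commitment
print(best_within(installed, score))      # "cat_b" -- the ceiling, not the optimum

# Only taxonomic revision expands the space:
revised = installed + ["dimensional"]
print(best_within(revised, score))        # "dimensional"
```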
Philosophical Grounding
- Dupré’s promiscuous realism. Multiple equally legitimate ways to classify the same reality exist, each optimized for different purposes.
- Zachar’s practical kinds. Categories are tools, not mirrors. Validity is measured by how well they serve their intended purposes.
- Hacking’s interactive kinds. In domains involving human subjects, the categories and the classified co-constitute each other. The taxonomy is not just a lens on reality but an intervention in it.
Three Predictions
- Constitutiveness. The taxonomy defines the representable space, not merely describes it.
- Lock-in. The taxonomy becomes harder to revise the more successfully it is installed.
- Invisibility through transparency. The taxonomy becomes harder to see the more fluently practitioners use it.
Psychiatric Classification — The DSM
The most fully developed case. All eight infrastructure properties instantiated in an existing system with decades of history and global reach. Demonstrates all three predictions at maximum intensity — including looping effects unavailable in the computational cases.
The System
The DSM’s categories determine what insurance covers, how research is designed, which drugs are developed, how courts evaluate competency, how schools provide accommodations, and how patients understand their own suffering. It instantiates all eight infrastructure properties: embedded in insurance billing, legal standards, pharmaceutical regulation, EHRs, educational accommodation systems, and military benefits. Transparent to trained clinicians who have internalized its categories as perceptual habits.
The Taxonomic Commitment
The DSM’s core commitments were design decisions, not discoveries:
- Categorical rather than dimensional
- Symptom-based rather than etiological
- Atheoretical with respect to causation
- Individual rather than relational as the unit of analysis
Spitzer’s DSM-III optimized for reliability (inter-rater agreement) over validity, and for institutional utility over phenomenological accuracy. Once installed, these choices became invisible — the water clinicians swim in. Consequences include categories that group together people with wildly different experiences (1,030 unique symptom profiles under one depression diagnosis) and arbitrary thresholds (five of nine symptoms for two weeks).
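The heterogeneity is easy to quantify. A polythetic rule of "any five of nine symptoms" already admits many distinct qualifying presentations, even before compound symptoms are disaggregated — a back-of-envelope count:

```python
from math import comb

# Polythetic diagnosis: at least 5 of 9 criterion symptoms must be present.
# Count the distinct symptom subsets that satisfy the rule.
qualifying = sum(comb(9, k) for k in range(5, 10))
print(qualifying)  # 256
```

The 256 subsets are a structural lower bound; the 1,030 observed profiles cited above indicate even finer-grained heterogeneity in practice.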
The Switching Cost Problem
Multiple technically superior alternatives exist: RDoC (dimensional, biologically grounded), HiTOP (hierarchical dimensional), network theory (no latent-variable assumption), and process-based therapy (transdiagnostic mechanisms). None has displaced the DSM — not because the DSM is better, but because it is installed.
Replacement requires simultaneous overhaul of insurance billing systems, retraining of hundreds of thousands of clinicians, revision of legal standards across all jurisdictions, rebuilding of EHRs, reanalysis of decades of research, and managing disruption to millions of people whose identity is organized around current categories. The installed base wins.
What This Case Uniquely Reveals
Looping. DSM categories do not merely describe mental disorders; they partially constitute them. The classified become aware of the classification, change their behavior, and thereby change the phenomenon. This is Hacking’s interactive kinds operating at institutional scale.
Boundary object persistence. The DSM endures not because any single community finds it optimal but because it is adequate enough for coordination across all communities — clinicians, insurers, researchers, lawyers, pharmaceutical companies, patients — while being optimal for none.
Inference Execution Planning — LLM-QP
A compressed, computational instance of the same structural pattern. No looping, lower switching costs, but identical constitutive property. Demonstrates that taxonomic commitment is structural, not sociological.
The System
Constrained LLM decoding requires checking every token against a validity set. LLM-QP formalizes the observation that multiple execution strategies are semantically equivalent — they produce identical token sequences — but differ in runtime cost. Five physical plans implement the single logical operation DecodeStep(query, constraint_state): dense projection head, sparse adjacency scoring, amortized score update, amortized update with rerank, and full recomputation.
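The structure can be sketched as a planner that selects the cheapest physical plan for a logical DecodeStep. The plan names follow the paper; the cost functions, signatures, and numbers are illustrative assumptions, not LLM-QP’s actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class PhysicalPlan:
    name: str
    cost: Callable[[int], float]  # illustrative: cost as a function of valid-set size

# The plan lattice IS the taxonomic commitment: the planner can only
# choose among the plans enumerated here. Cost constants are made up.
LATTICE = [
    PhysicalPlan("dense_projection", lambda n: 1000.0),         # full-vocab head
    PhysicalPlan("sparse_adjacency", lambda n: 2.0 * n),        # score only valid tokens
    PhysicalPlan("amortized_update", lambda n: 0.5 * n + 50.0), # reuse prior scores
    PhysicalPlan("amortized_rerank", lambda n: 0.5 * n + 120.0),# amortize, then rerank
    PhysicalPlan("full_recompute",   lambda n: 5.0 * n + 10.0), # recompute from scratch
]

def plan_decode_step(valid_set_size: int) -> PhysicalPlan:
    """Cost-based selection -- searches only within the lattice."""
    return min(LATTICE, key=lambda p: p.cost(valid_set_size))

print(plan_decode_step(10).name)    # sparse_adjacency: small valid set
print(plan_decode_step(5000).name)  # dense_projection: large valid set
```

The point of the sketch: `plan_decode_step` can be arbitrarily clever about cost, but it can never return a plan absent from `LATTICE`.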
The Taxonomic Commitment
The plan lattice — the set of five physical implementations the planner considers — is LLM-QP’s taxonomic commitment. A planner whose lattice lacks an amortized operator cannot discover amortized savings, regardless of cost-model sophistication or bandit algorithm. This mirrors database query optimizers: the enumerated physical operators define the search space, and the cost model merely ranks the options the lattice provides.
Infrastructure Properties
- Embeddedness. The lattice is embedded in the MLIR/StableHLO compiler pipeline.
- Transparency. When working well, invisible — the system just runs fast.
- Built on installed base. Strategies constrained by existing kernel implementations and hardware capabilities.
- Links with practice. The bandit achieves sub-linear regret relative to oracle plan selection — but convergence to the best plan within the lattice, not the best plan conceivable.
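The "links with practice" point can be illustrated with a minimal epsilon-greedy bandit over the lattice. The bandit scheme and reward numbers here are illustrative assumptions — the paper claims only sub-linear regret, not this particular algorithm. Note what the sketch shows: the bandit converges to the best *enumerated* plan; a cheaper plan outside the lattice is structurally unreachable.

```python
import random

random.seed(0)

PLANS = ["dense", "sparse", "amortized", "amortized_rerank", "recompute"]
# Hidden true mean rewards (e.g. negative normalized latency); made-up numbers.
TRUE_REWARD = {"dense": 0.2, "sparse": 0.6, "amortized": 0.8,
               "amortized_rerank": 0.7, "recompute": 0.3}

counts = {p: 0 for p in PLANS}
means = {p: 0.0 for p in PLANS}

def select(eps=0.1):
    if random.random() < eps:
        return random.choice(PLANS)            # explore -- within the lattice
    return max(PLANS, key=lambda p: means[p])  # exploit best-so-far

for _ in range(5000):
    p = select()
    r = TRUE_REWARD[p] + random.gauss(0, 0.1)  # noisy observed reward
    counts[p] += 1
    means[p] += (r - means[p]) / counts[p]     # incremental mean update

print(max(PLANS, key=lambda p: means[p]))  # converges to "amortized"
```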
What This Case Uniquely Reveals
LLM-QP isolates the constitutive property in a domain stripped of human institutions, identity, and looping. The plan lattice is a deliberate engineering artifact, yet it exhibits the same constitutive property as the DSM. The case also contributes an anti-reification device: the formal plan equivalence proof, which makes the lattice’s contingency visible — the plans are choices among equivalent alternatives, not the unique correct implementation.
Agent Cognitive Architecture — SAGEN
Bridges the gap between the computational and institutional cases. Constitutive like LLM-QP; shapes perception and action like the DSM; but includes a deliberate anti-reification mechanism.
The System
SAGEN provides LLM agents with structured, persistent situational awareness through six cognitive modules on a shared blackboard: Goal Graph, Trajectory, World Model, Self Model, Attention Priorities, and Interaction Protocol. Coordination occurs through an Observe–Update–Inject loop.
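A minimal sketch of the coordination loop. The module names follow SAGEN; the blackboard representation and method signatures are illustrative assumptions, not the actual interface.

```python
# One pass: each module updates its slice of the shared blackboard,
# then contributes ("injects") context for the agent's next LLM call.
class Module:
    name = "base"
    def update(self, observation, blackboard):
        blackboard[self.name] = observation            # write module state
    def inject(self, blackboard):
        return f"{self.name}: {blackboard.get(self.name)}"

class GoalGraph(Module):   name = "goal_graph"
class Trajectory(Module):  name = "trajectory"
class WorldModel(Module):  name = "world_model"
class SelfModel(Module):   name = "self_model"
class Attention(Module):   name = "attention"
class Interaction(Module): name = "interaction_protocol"

MODULES = [GoalGraph(), Trajectory(), WorldModel(),
           SelfModel(), Attention(), Interaction()]

def observe_update_inject(observation, blackboard):
    """One iteration of the Observe-Update-Inject loop."""
    for m in MODULES:
        m.update(observation, blackboard)               # Update
    return [m.inject(blackboard) for m in MODULES]      # Inject

bb = {}
context = observe_update_inject("user pivoted to a new topic", bb)
print(len(context))  # 6 -- one context fragment per module
```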
The Taxonomic Commitment
Module-level. The six-module decomposition determines what the agent can represent and act on. An agent lacking a Trajectory module cannot distinguish a topic pivot from a topic abandonment. An agent without typed Attention cannot allocate urgency differentially.
Finer-grain. The Trajectory’s seven transition types (progress, reversal, pivot, discovery, external event, failure, branch) define recognizable episodic patterns. The Attention module’s four categories (threat, opportunity, anomaly, transition) determine representable salience. Domain adapter scan patterns are the agent’s perceptual categories.
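The finer-grained vocabularies above can be written as closed enums, which makes the taxonomic commitment explicit: an episode that fits none of these types is simply not representable. The type names follow the paper; casting them as Python enums is an illustrative choice.

```python
from enum import Enum

class Transition(Enum):      # the Trajectory module's episodic vocabulary
    PROGRESS = "progress"
    REVERSAL = "reversal"
    PIVOT = "pivot"
    DISCOVERY = "discovery"
    EXTERNAL_EVENT = "external_event"
    FAILURE = "failure"
    BRANCH = "branch"

class Salience(Enum):        # the Attention module's representable salience
    THREAT = "threat"
    OPPORTUNITY = "opportunity"
    ANOMALY = "anomaly"
    TRANSITION = "transition"

print(len(Transition), len(Salience))  # 7 4
```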
The Adapter as Anti-Reification Mechanism
The domain adapter pattern is a deliberate architectural response to the reification problem — requiring a developer to explicitly choose entity types, relationship types, and scan patterns for each domain. This keeps categories visible as design decisions rather than allowing them to calcify into invisible assumptions.
The adapter pattern does not solve the reification problem. The six modules themselves are not adapter-replaceable — they are the installed base. The adapter resists reification at the domain-content level while the module-level taxonomy remains vulnerable. This illustrates that anti-reification is a gradient, not a binary.
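The layering can be sketched as a contract: the adapter is the shallow, swappable layer that holds the domain's perceptual categories, while the module set stays fixed. The attribute names and the example domain below are illustrative assumptions, not SAGEN's actual interface.

```python
from typing import Protocol

class DomainAdapter(Protocol):
    """The shallow, replaceable taxonomic layer."""
    entity_types: list[str]
    relationship_types: list[str]
    def scan_patterns(self) -> list[str]: ...

class DevOpsAdapter:
    """One possible adapter: its categories are explicit, inspectable choices."""
    entity_types = ["service", "deployment", "incident"]
    relationship_types = ["depends_on", "caused_by"]
    def scan_patterns(self) -> list[str]:
        return ["error-rate spike", "failed rollout", "dependency drift"]

# Swapping the adapter swaps the agent's perceptual categories;
# the six modules (the deep layer) are untouched by the swap.
adapter: DomainAdapter = DevOpsAdapter()
print(adapter.scan_patterns())
```

Forcing every deployment through this explicit declaration is what keeps the shallow layer visible as a decision; nothing analogous guards the deep layer.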
What This Case Uniquely Reveals
SAGEN demonstrates that taxonomic commitment determines the behavioral ceiling of a cognitive architecture. Of 20 information dimensions evaluated, 16 were captured by none of the flat-memory baselines. These require explicit architectural support — no improvement in the underlying LLM produces them.
Uniquely, SAGEN demonstrates layered taxonomic commitment — the six modules are a deep commitment (hard to revise), while adapter content is a shallow commitment (designed to be revised per domain). The DSM conflates both layers into one monolithic artifact, which is part of why revision is so costly.
Cross-Case Analysis — The Structural Pattern
| Dimension | DSM | LLM-QP Plan Lattice | SAGEN Modules |
|---|---|---|---|
| Defines representable space | Diagnosable conditions | Selectable execution strategies | Performable cognitive operations |
| Constrains optimization | Treatment limited to recognized categories | Cost selection limited to enumerated plans | Agent reasoning limited to represented distinctions |
| Transparent when working | Clinicians see through categories | Planner selects transparently | Agent reasons through modules |
| Resists revision | Institutional switching costs | Compiler/kernel dependencies | Architectural assumptions |
| Anti-reification mechanism | None | Formal plan equivalence proofs | Domain adapter pattern |
| Unit of revision | Entire manual (15–20 yr cycle) | Individual plan (add new kernel + pass) | Adapter (shallow) or module (deep) |
Key Differences
Looping. DSM categories loop — they change the people they classify. LLM-QP’s lattice does not loop. SAGEN occupies a middle position — its categories shape agent perception but the agent doesn’t reflexively modify its own categories.
Switching costs. DSM: societal (legal, financial, institutional, identity). LLM-QP: technical (compiler passes, kernels). SAGEN: architectural (module redesign, adapter revision). The mechanism differs; the structural effect is the same.
The Hardness Spectrum
| Level | System | Looping | Switching Costs | Reification Risk |
|---|---|---|---|---|
| Soft | LLM-QP | None | Technical and bounded | Low |
| Medium | SAGEN | None (Hacking sense) | Architectural | Moderate |
| Hard | DSM | Yes — institutional scale | Societal | Maximal (largely realized) |
The spectrum is not a ranking of quality. Hard infrastructure is not worse than soft; it is harder to change. The design implication: systems should be built as soft as the domain permits.
Testing the Three Predictions
- Constitutiveness. Confirmed in all three cases.
- Lock-in. Confirmed with varying intensity — maximal for DSM, minimal for LLM-QP.
- Invisibility through transparency. Confirmed with a gradient — SAGEN’s adapter pattern deliberately forces visibility.
The Infrastructure Paradox
The paradox. Classification works best when invisible but becomes most dangerous when invisible. The four-phase cycle:
- Design. Categories chosen as practical tools — provisional, purpose-specific, explicitly acknowledged as decisions.
- Installation. Categories embedded in practice. Learned, linked to conventions, connected to standards.
- Transparency. Categories become invisible. Users look through them. They cease to feel like choices.
- Reification. Categories treated as discoveries rather than decisions. Provisional conventions hardened into natural kinds.
The cycle operates at different speeds across the hardness spectrum. Computational taxonomies (LLM-QP) cycle fast and reify weakly. Cognitive architectures (SAGEN) cycle at medium speed. Institutional taxonomies (DSM) cycle slowly and reify completely.
The structural insight. The cycle is not a failure of vigilance. It is a consequence of infrastructure’s defining property: to function, it must be transparent; to be transparent, it must become invisible; to become invisible, it must cease to feel like a choice. The only defense is architectural — mechanisms that structurally resist transparency’s slide into reification.
Implications for System Design
Taxonomic Commitment as First-Class Design Decision
The categories a system operates with should be documented, evaluated, and revisited with the same rigor applied to algorithmic choices: explicit enumeration of what the taxonomy can and cannot represent, versioned category definitions, and periodic review.
Anti-Reification Mechanisms
- Modular adapter patterns (SAGEN) — encapsulate domain-specific categories in replaceable modules
- Formal equivalence proofs (LLM-QP) — demonstrate multiple plans produce identical outputs
- Explicit confidence metadata — annotate categories with epistemic status (provisional, validated, contested, convenience-only)
- Versioned categories with changelogs and sunset dates on provisional categories
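The last two mechanisms can be combined in a single record: every category carries a version, an epistemic status, and, when provisional, a sunset date. The schema below is an illustrative sketch, not a standard; field names are invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):          # epistemic status, per the bullet above
    PROVISIONAL = "provisional"
    VALIDATED = "validated"
    CONTESTED = "contested"
    CONVENIENCE = "convenience-only"

@dataclass
class Category:
    name: str
    version: str
    status: Status
    sunset: Optional[str] = None  # review-by date for provisional categories
    rationale: str = ""           # why this boundary was drawn

topic_pivot = Category(
    name="topic_pivot",
    version="1.2.0",
    status=Status.PROVISIONAL,
    sunset="2026-06-30",
    rationale="Distinguishes pivot from abandonment; boundary still contested.",
)
print(topic_pivot.status.value)  # "provisional"
```

The point is not the schema but the discipline: a category without a version and a status is a category drifting toward reification.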
Layered Taxonomic Architecture
Systems should distinguish between deep categories that define fundamental representational capacity (hard to change, chosen with commensurate care) and shallow categories that specialize for a domain (designed to be replaceable). The DSM’s monolithic structure — where deep ontological commitments are fused with shallow clinical content — is an anti-pattern that maximizes revision cost at every level.
Taxonomic Debt
By analogy with technical debt: taxonomic debt accumulates when classification decisions are made expediently and left unexamined as the system scales. Symptoms include categories that no longer match operational reality, distinctions that practitioners routinely work around, and switching costs that grow faster than the system’s value proposition. Like technical debt, the remedy is not to avoid classification decisions but to make them deliberately, document them, and budget for revision.
Taxonomic Evaluation Criteria
- Coverage. What can the taxonomy represent?
- Blind spots. What can it not represent, and what are the consequences?
- Switching costs. How embedded is it? What would revision disrupt?
- Reification risk. How likely are provisional categories to be mistaken for natural kinds?
- Hardness. Where on the soft–medium–hard spectrum, and is that the right position?
Conclusion — The Categories Are Not Scaffolding
In any system where an agent must act under uncertainty, the classification layer functions as infrastructure. It is the most consequential and least examined design decision. It determines the space of possibility. Optimization improves performance within that space. Only taxonomic revision can expand it.
Three case studies demonstrate this across domains sharing almost nothing except the structural pattern: a taxonomy that defines the representable space, becomes invisible when working, and resists revision once installed. The three predictions — constitutiveness, lock-in, invisibility through transparency — hold in all three cases, modulated by a hardness spectrum that tracks the domain’s coupling to human institutions, identity, and reflexive awareness.
The practical upshot is not that better taxonomies will solve hard problems. It is that failing to recognize taxonomies as taxonomies — failing to see them as design decisions with structural consequences, treating them as neutral descriptions rather than constitutive commitments — produces systems trapped inside spaces they cannot see the edges of.
The categories are not scaffolding. They are load-bearing walls. The first step toward building better systems is seeing the walls for what they are.