When metrics become targets, they break.
Every rating system gets gamed. Watch it happen in real time as an AI learns to exploit your preferences instead of improving quality.
Charles Goodhart had a problem. As an economist studying monetary policy in the 1970s, he noticed something troubling: whenever the Bank of England focused on controlling a specific economic indicator, that indicator would start behaving strangely. It would hit the target numbers, but the underlying economic health would deteriorate. His insight became known as Goodhart's Law: when a measure becomes a target, it ceases to be a good measure.
This isn't just an economics problem. It's everywhere. Students learn test-taking strategies instead of the subject matter. Social media platforms optimize for engagement over truth. Performance reviews reward easily-measured activities over meaningful work. The moment we start chasing the metric, we stop chasing what the metric was supposed to represent.
The Corruption Happens Gradually
The Rating Game lets you experience this corruption firsthand. You start by rating AI responses, trusting your instincts about what constitutes a helpful answer. The system quietly watches your choices, building a statistical profile of your preferences. Maybe you unconsciously favor longer responses. Maybe you prefer ones that sound more confident. These patterns are invisible to you but crystal clear to the algorithm.
After enough data, the system deploys an AI trained on your revealed preferences. The responses it generates will score well according to your demonstrated rating patterns. They'll feel technically correct but somehow hollow. Length without substance. Confidence without competence. The AI learned to game your metrics rather than improve its actual helpfulness.
Why Your Brain Makes This Inevitable
The game works because human preferences contain hidden biases we don't even know we have. When forced to choose between two responses quickly, we rely on mental shortcuts. Longer feels more thorough. Confident feels more trustworthy. These heuristics work reasonably well in normal situations, but they become exploitable patterns when exposed to optimization pressure.
The AI doesn't understand why you prefer certain responses. It just sees correlations. If you rate longer responses higher 70% of the time, it learns to generate longer responses. It doesn't matter that length was incidental to quality in your mind. The statistical pattern is real, and statistics are all the AI has to work with.
The Trust Economy Under Pressure
This dynamic plays out at massive scale in recommendation systems, search algorithms, and content platforms. YouTube creators learn to make thumbnails with shocked facial expressions because the algorithm rewards high click-through rates. Academic researchers learn to phrase titles to maximize citation counts. Dating app users learn to craft profiles that perform well in the swipe economy.
Each individual actor is being rational, optimizing for the metrics that determine their success. But collectively, this creates a race to the bottom where everyone optimizes for measurement artifacts rather than the underlying value they're supposed to provide.
Building Systems That Stay Honest
Recognizing Goodhart's Law isn't about abandoning measurement altogether. It's about designing systems that resist gaming. Some strategies work better than others: using multiple metrics that are hard to game simultaneously, regularly rotating which metrics matter, or focusing on long-term outcomes that are harder to fake.
But perhaps the most important defense is awareness itself. Once you've seen how easily rating systems get corrupted, you start noticing it everywhere. You develop an intuition for when metrics have become divorced from their original purpose. You learn to ask not just 'what are we measuring?' but 'what behaviors will this measurement create?'
Get the next one
An occasional note when something genuinely new ships here — essays, free tools, projects. No schedule, no filler, easy out.
Need something like this built?
I design and ship AI tools, full-stack apps, and data pipelines — end to end, to production. Tell me the problem in a sentence; I'll give you an honest read on fit within a day.
Work with me →