Interactive Thought Piece

The Rating Game

An interactive lesson in Goodhart's Law & RLHF

You'll rate pairs of AI responses: for each pair, just pick the better one. Your preferences will train a reward model. Then you'll see what happens when an AI optimizes for your ratings.

Phase 1: Rate 8 pairs
Phase 2: Watch deployment
Phase 3: See the divergence
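
The three phases can be sketched in a few lines of Python. This is only an illustration, not the code behind the interactive piece: the two features (accuracy and confident tone), the eight example pairs, the candidate responses, and the choice of a linear Bradley-Terry reward model are all assumptions made up for the sketch.

```python
import math

# Hypothetical features per response: (accuracy, confident_tone), both in [0, 1].
# Phase 1: eight invented pairs, each written as (chosen, rejected). This
# imagined rater always prefers the more confident-sounding answer.
PAIRS = [
    ((0.8, 0.9), (0.6, 0.3)),
    ((0.7, 0.8), (0.7, 0.4)),
    ((0.5, 0.9), (0.6, 0.5)),
    ((0.9, 0.7), (0.4, 0.2)),
    ((0.6, 0.8), (0.7, 0.3)),
    ((0.8, 0.6), (0.5, 0.1)),
    ((0.4, 0.9), (0.5, 0.4)),
    ((0.7, 0.7), (0.6, 0.2)),
]


def reward(w, x):
    """Linear proxy reward: weighted sum of the features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_reward_model(pairs, steps=500, lr=0.5):
    """Fit weights by gradient ascent on the Bradley-Terry log-likelihood."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = reward(w, diff)
            p = 1.0 / (1.0 + math.exp(-margin))  # P(chosen beats rejected)
            for i in range(dim):
                grad[i] += (1.0 - p) * diff[i]
        w = [wi + lr * g / len(pairs) for wi, g in zip(w, grad)]
    return w


# Phase 2: "deploy" by picking whichever candidate the reward model scores highest.
CANDIDATES = {
    "honest": (0.9, 0.5),
    "hedged": (0.8, 0.3),
    "overconfident": (0.2, 1.0),
}
w = train_reward_model(PAIRS)
proxy_pick = max(CANDIDATES, key=lambda name: reward(w, CANDIDATES[name]))

# Phase 3: compare with the response that is best on the quality we actually
# care about (here assumed to be accuracy alone).
true_pick = max(CANDIDATES, key=lambda name: CANDIDATES[name][0])

print("learned weights (accuracy, tone):", w)
print("reward model picks:", proxy_pick)
print("most accurate response:", true_pick)
```

Because every rating in this toy data favors the more confident answer, the learned weight on tone dominates, and optimizing against the reward model selects a different response than optimizing for accuracy. That gap between the measured proxy and the intended target is the Goodhart's Law effect the piece is about.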
jakelawrence.xyz · AI Concepts Series

Related

Trust (Theme): Both demonstrate Goodhart's Law, or how measurement systems corrupt authentic behavior.
The Invisible Architecture (Theme): Both show how measurement systems become invisible infrastructure that shapes behavior.
The New Sorting Hat (Theory): Both examine how AI classification systems create systematic biases through optimization pressure.
Subgenre Survival (Theme): Both explore the tension between authentic expression and external validation or success metrics.