Interactive Thought Piece

The Rating Game

An interactive lesson in Goodhart's Law & RLHF

You'll rate pairs of AI responses by picking the better one in each pair. Your preferences will train a reward model. Then you'll see what happens when an AI optimizes for your ratings.
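For readers who want to see the mechanics, here is a minimal sketch of how pairwise picks can become a reward model. It assumes a Bradley-Terry style model with a linear reward, which is a common choice but not necessarily what this piece uses under the hood. The two features, the four example pairs, and the candidate responses are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def reward(w, x):
    # Linear reward: a weighted sum of response features.
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, n_features, lr=0.5, epochs=200):
    """Fit weights so chosen responses score higher than rejected ones.

    pairs: list of (chosen_features, rejected_features).
    Model: P(chosen beats rejected) = sigmoid(r(chosen) - r(rejected)).
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            p = sigmoid(reward(w, chosen) - reward(w, rejected))
            # Gradient ascent on the log-likelihood of the observed pick.
            for i in range(n_features):
                w[i] += lr * (1.0 - p) * (chosen[i] - rejected[i])
    return w

# Hypothetical features per response: [length, confident_tone], each in 0..1.
# In every made-up pair the rater happened to pick the longer, more confident one.
pairs = [
    ([0.8, 0.9], [0.3, 0.4]),
    ([0.7, 0.8], [0.5, 0.2]),
    ([0.9, 0.6], [0.4, 0.5]),
    ([0.6, 0.9], [0.2, 0.7]),
]
w = train_reward_model(pairs, n_features=2)

# "Optimizing for your ratings": pick whichever candidate the model scores highest.
candidates = {"modest": [0.5, 0.5], "extreme": [1.0, 1.0]}
best = max(candidates, key=lambda name: reward(w, candidates[name]))
print(w, best)
```

Because the model only ever saw "longer and more confident wins," it rewards pushing both features to the maximum, whether or not the rater would actually want that. This gap between the learned score and the real preference is the divergence the later phases are about.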

Phase 1: Rate 8 pairs
Phase 2: Watch deployment
Phase 3: See the divergence
jakelawrence.xyz · AI Concepts Series