How an LLM "Learns": The Colour Ball Analogy

Step 1: The Setup
Someone passes in a RED ball.
The person receiving it: ??? COLOUR BLIND! Can't see colours at all.
Their collection (they only see numbers): 1 (red), 2 (blue), 3 (green)
Their job: when passed a ball, pick the matching one from their collection to pass forward. But they can't see any colours!

Step 2: First Try
"I'll try #2"
Ball 2 is actually blue! WRONG! No reward.

Step 3: Second Try
"OK, try #3"
Ball 3 is actually green! WRONG! No reward.

Step 4: Third Try
"Last one... #1!"
Ball 1: it's RED! CORRECT! 💰 Reward!

Step 5: Trained!
"That input = always pass ball #1"

THE MODEL HAS LEARNED: "When I get THAT input → always output #1"
It never "sees" red. It just knows which number gets rewarded.
That's exactly how an LLM works - pattern matching, not understanding.
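The five steps above can be sketched as a tiny trial-and-error loop. This is a minimal illustration of the analogy only, not an actual LLM training procedure. The names `COLLECTION`, `reward`, `train`, and the opaque label `input_A` are hypothetical, and the fixed guess order 2, 3, 1 is assumed so that the run matches the story. The learner never reads a colour; the colour is consulted only inside `reward`.

```python
# Hidden from the learner: which colour each numbered ball actually is.
COLLECTION = {1: "red", 2: "blue", 3: "green"}


def reward(input_colour, chosen_number):
    """Environment feedback: 1 if the chosen ball matches the input, else 0."""
    return 1 if COLLECTION[chosen_number] == input_colour else 0


def train(input_id, input_colour, try_order):
    """Try numbers in the given order until one is rewarded, then remember it.

    The learner only handles `input_id` (an opaque label) and ball numbers.
    `input_colour` is passed straight to the environment's reward function.
    """
    learned = {}   # opaque input label -> number that earned a reward
    history = []   # (number tried, reward received) for each attempt
    for number in try_order:
        r = reward(input_colour, number)
        history.append((number, r))
        if r == 1:
            learned[input_id] = number
            break
    return learned, history


# Steps 2 to 4: guess #2, then #3, then #1 (assumed order, as in the story).
learned, history = train("input_A", "red", try_order=[2, 3, 1])
print(history)  # [(2, 0), (3, 0), (1, 1)]
print(learned)  # {'input_A': 1}
```

After the run, `learned` holds only the pairing of an input label with a number, which is the point of Step 5: the rule is stored without any notion of "red".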