How an LLM "Learns"
The Colour Ball Analogy
Step 1: The Setup
Someone passes in a RED ball.
The person receiving it is COLOUR BLIND ("???"): they can't see colours at all.
Their collection (they only see numbers): #1 (red), #2 (blue), #3 (green)
Their job: when passed a ball, pick the matching one from their collection to pass forward.
But they can't see any colours!
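The setup can be written down as a minimal sketch. All names here (`HIDDEN_COLOURS`, `visible_collection`, `reward`) are hypothetical and chosen only for illustration; the assumption is that the learner sees just the numbers, while the colours stay with whoever hands out the reward.

```python
# Hypothetical names; this is an illustration of the analogy, not a real training setup.
# The colours exist, but the learner never reads this mapping directly.
HIDDEN_COLOURS = {1: "red", 2: "blue", 3: "green"}

def visible_collection():
    """What the colour-blind learner can see: just the numbers."""
    return sorted(HIDDEN_COLOURS)

def reward(incoming_colour, chosen_number):
    """Feedback from outside: 1 if the chosen ball matches, otherwise 0."""
    return 1 if HIDDEN_COLOURS[chosen_number] == incoming_colour else 0

print(visible_collection())  # [1, 2, 3]
print(reward("red", 2))      # 0
print(reward("red", 1))      # 1
```

The only information that ever reaches the learner is the number it chose and whether a reward came back.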
Step 2: First Try
"I'll try #2"
Ball #2 is actually blue! WRONG! No reward.
Step 3: Second Try
"OK, try #3"
Ball #3 is actually green! WRONG! No reward.
Step 4: Third Try
"Last one... #1!"
Ball #1: it's RED! CORRECT! 💰 Reward!
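The three tries above can be sketched as a simple trial-and-error loop. The function `learn_by_trial` and the guess order `[2, 3, 1]` are assumptions made to mirror the steps in this walkthrough; a real training process does not literally guess one option at a time.

```python
# Hypothetical sketch of Steps 2-4: guess, get feedback, stop at the first reward.
HIDDEN_COLOURS = {1: "red", 2: "blue", 3: "green"}  # never shown to the learner

def reward(incoming_colour, chosen_number):
    """Feedback from outside: 1 if the chosen ball matches, otherwise 0."""
    return 1 if HIDDEN_COLOURS[chosen_number] == incoming_colour else 0

def learn_by_trial(incoming_colour, guess_order):
    """Try balls in the given order; return (attempt log, rewarded number or None)."""
    attempts = []
    for number in guess_order:
        got = reward(incoming_colour, number)
        attempts.append((number, got))
        if got:
            return attempts, number
    return attempts, None

attempts, learned = learn_by_trial("red", guess_order=[2, 3, 1])
print(attempts)  # [(2, 0), (3, 0), (1, 1)]
print(learned)   # 1
```

The log contains only numbers and rewards; the word "red" is used by the feedback function, never by the learner's own choice.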
Step 5: Trained!
"That input = always pass ball #1"
THE MODEL HAS LEARNED: "When I get THAT input → always output #1"
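The learned rule can be pictured as a lookup from input to rewarded output. This is a deliberately simplified sketch: `policy`, `record_reward`, `act`, and the labels `"input_A"` and `"input_B"` are hypothetical, and a real LLM adjusts many numeric weights rather than filling in a table.

```python
# Hypothetical sketch: "input_A" stands for "THAT input"; it carries no colour label.
policy = {}

def record_reward(policy, input_id, number, got_reward):
    """Remember a choice only if it was rewarded."""
    if got_reward:
        policy[input_id] = number

def act(policy, input_id):
    """Return the rewarded number for this input, or None if nothing was learned."""
    return policy.get(input_id)

record_reward(policy, "input_A", 2, False)  # Step 2: no reward
record_reward(policy, "input_A", 3, False)  # Step 3: no reward
record_reward(policy, "input_A", 1, True)   # Step 4: reward

print(act(policy, "input_A"))  # 1
print(act(policy, "input_B"))  # None
```

Nothing in `policy` mentions a colour: it stores only which number was rewarded for which input.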
It never "sees" red. It just knows which number gets rewarded.
That's the basic idea behind how an LLM learns: pattern matching reinforced by feedback, not understanding.