Explaining Information Entropy to a 5 year old

Suppose there are 8 numbers. I choose one, but you don’t know which number it is. Your task is to guess which number. You can ask yes or no questions. What is the minimum number of questions you would need to ask to guess the number?

1
2
3
4
5
6
7
8

All have an equal chance of occurring (P(any number) = 1/8).

What would be your approach for asking questions?

Start at the mid-way point and asking if the number is less than 5.  Suppose the answer is yes. Now you know the number must be

1
2
3
4

Do the same thing and ask if the number is less than 3. Suppose the answer is yes again. Now you know the number must be

1
2

Now you ask whether the number is 1, and if the answer is no, then you know the number is 2. You have guessed the number in 3 questions.

3 is the minimum number of bits you would need to use to encode the probability distribution of these 8 numbers. This algorithm is at the core of how we are all able to send emails, pictures, and videos so quickly now. Before the father of information theory Claude Shannon had invented this algorithm, it was unthinkable to send a picture — even a voice message, across the globe. There was simply no efficient way to send it. Nobody knew how many bytes it would take to encode the message, or even how to encode the message.

Next question: What if the numbers did not have an equal chance of occurring?