In a previous post, we talked about how the decision-making process is the act of rescuing one potentiality from a field of uncertainty and delivering it permanently into the certainty of your past. Think about that for a moment. When you are faced with a decision, there is an array of possible outcomes based on your choice. The moment you choose, you eliminate many of those possible outcomes and narrow the array of what remains possible.
In this post, we’ll explore how Claude Shannon thought about communication as the reduction of uncertainty. Remember, in the 1930s and 40s Shannon was wrestling with the challenge of overcoming noise in the communication process. Rather than fighting noise by shouting louder, he hypothesized that a systematic understanding of the information being communicated could defeat it.
When a listener receives a message one piece at a time, that person can use an understanding of the information being communicated to infer what piece of information may come next. With each subsequent piece of information, the message becomes clearer and more understandable.
Of course, the sequence of the words matters to the meaning of the message. There’s a scene in the movie Blame It on Rio where the character Matthew recounts his story: “One time a company I worked for transferred me to an island in the Pacific. Fantastic place. I invited my girl to visit me. I sent her a postcard every day with a single word on each card. I wrote, ‘Found a virgin paradise. It’s yours. Matthew.’ Naturally, they were delivered in the wrong order. The message she got was, ‘Found a virgin. It’s paradise. Yours, Matthew.’ I never heard from her again.”
When information is communicated in sequential packages, instead of seeing the sequential addition of new information, Shannon saw the sequential removal of ambiguity. The message evolves from highly uncertain to highly certain; each piece of information replaces some uncertainty with some clarity. It’s like watching the pixels of an image fill in over time.
This is a classic example of lateral thinking. Lateral thinking leads to simple, unconventional ideas that seem obvious only after they are thought of. It stands in contrast to vertical thinking, which is the more conventional, deductive thought process. Instead of thinking about what an object is, for instance, the lateral thinker thinks about all the things the object is not.
Vertical thinking suggested the best way to overcome communication noise was to boost the signal. Lateral thinking suggested the best way to overcome noise was to reduce the signal’s uncertainty.
Shannon appropriated the term “entropy” from physics to mean the uncertainty in a message. If a message is highly uncertain, or unclear, it carries a high amount of entropy. As the message becomes better understood, the entropy decreases. A perfectly communicated message, one that achieves the effect the communicator intended, has no remaining entropy.
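Shannon eventually made this idea precise: his entropy formula, H = -Σ p·log2(p), measures uncertainty in bits. Here’s a minimal sketch in Python (the coin-flip probabilities are toy numbers of my own, just to show the formula at work):

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: the average uncertainty of a distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin flip is maximally uncertain: exactly 1 bit of entropy.
print(entropy([0.5, 0.5]))   # 1.0

# A heavily biased coin is far more predictable, so less uncertain.
print(entropy([0.9, 0.1]))   # ~0.47 bits
```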
But how can you reduce the uncertainty of the message you’re transmitting? Shannon was intensely interested in patterns, and he believed the recognition of patterns was a key element to reducing uncertainty.
When he went to work at Bell Labs, he was originally assigned to cryptography, the science of encrypting and decrypting information. World War II was in full swing, and cryptology could serve as a powerful weapon or a debilitating weakness. Whichever side could successfully intercept and decrypt its enemy’s communications could gain a strategic advantage. Knowing the battle plan before the battle is fought is an incredible edge, if you can get it.
Shannon began to look at the predictability of information as a gateway to understanding it. Since childhood, Shannon had loved Morse code, which relies on dots, dashes, and spaces. The character following a dot has three possibilities: another dot, a dash, or a space. Likewise, the character following a dash has three possibilities: another dash, a dot, or a space. But the character following a space has only two possibilities: a dot or a dash. This may seem like a small realization, but it is a clue to a broader logic.
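To see why, suppose (purely for illustration) that each possible successor is equally likely. Then the uncertainty about the next character, measured in bits, is smaller after a space than after a dot or a dash:

```python
import math

# Which characters can follow each Morse character, per the observation above:
successors = {
    "dot":   ["dot", "dash", "space"],
    "dash":  ["dot", "dash", "space"],
    "space": ["dot", "dash"],   # a space is never followed by another space
}

# Assuming each successor is equally likely, the uncertainty about the
# next character is log2 of the number of possibilities.
for char, options in successors.items():
    print(f"after a {char}: {math.log2(len(options)):.2f} bits of uncertainty")

# after a dot:   1.58 bits
# after a dash:  1.58 bits
# after a space: 1.00 bits
```

Hearing a space buys the listener about half a bit of certainty about what comes next.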
In the English language, as in other languages, some letters are used more frequently than others. Vowels are especially common: nearly every word contains at least one, and there are only five of them (six if you include the letter “y”). Because vowels are everywhere, revealing them collapses a great deal of a word’s uncertainty at once. That’s why you have to pay for a vowel in the game Wheel of Fortune, whereas the other letters are free.
The letter “e” is the most common of all, accounting for more than 11% of the letters in typical English text. So, if you were to open a book to a random page and place your pencil on a random letter of a random word, you would have a better than one-in-ten chance of landing on an “e.”
Samuel Morse knew this when he created Morse code. That is why he assigned the simplest character to this letter: a single dot.
The letter “a” is the next most frequently used letter in English, according to the Google machine, followed by “r,” “i,” “o,” and “t.” There’s nearly a 50% chance that a letter chosen at random from an English word will be one of the six most frequently used letters. This is useful information when trying to take encrypted messages and reconstruct their meaning.
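If you want to check these rankings against a text of your own, counting letter frequencies takes only a few lines. Here’s a rough sketch in Python (the sample string, borrowed from Matthew’s postcards, is just a stand-in for a real corpus, which would put “e” firmly on top):

```python
from collections import Counter

def letter_frequencies(text):
    """Relative frequency of each letter, ignoring case and punctuation."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    return {letter: n / len(letters) for letter, n in counts.most_common()}

sample = "found a virgin paradise its yours matthew"
for letter, freq in list(letter_frequencies(sample).items())[:5]:
    print(f"{letter}: {freq:.1%}")
```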
The letter “h” is not all that common as far as letters go, but it becomes far more likely immediately after an “s” (as in the word “she”) or a “t” (as in the word “them”). The letter “q” is almost always followed by the letter “u.” So, in Shannon’s logic, finding a “u” after you have already found a “q” doesn’t add much new information to the word you’re trying to decrypt. In fact, you could just assume that the letter following a “q” would be a “u,” and you’d be right far more often than wrong.
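That “q is almost always followed by u” rule is just a conditional frequency, and it can be computed the same way. Here’s a toy sketch, using a scrap of text made up for illustration:

```python
from collections import Counter

def followers(text, letter):
    """How often each letter follows `letter` in the text."""
    stripped = "".join(c for c in text.lower() if c.isalpha())
    counts = Counter(b for a, b in zip(stripped, stripped[1:]) if a == letter)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.most_common()}

scrap = "the queen quietly quit the quiz and sought the quaint shop"
print(followers(scrap, "q"))   # {'u': 1.0} -- every "q" is followed by "u"
print(followers(scrap, "t"))   # "h" leads the pack, just as described above
```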
Looking at the big picture, it’s clear that you could make good, educated guesses about decrypting a message if you just knew the frequency of letters in a language and the patterns that those letters form.
Just as with letters, words themselves appear in varying frequencies. In English, the word “the” is by far the most common word of all. It is so common, in fact, that I just used it three times in the previous sentence without the sentence sounding unusual! “In English, THE word THE is by far THE most common word.” The next most common words are “be,” “to,” “of,” and “and.”
This is a treasure trove for codebreakers like Shannon. If you frequently see the same three-symbol group, you may guess that the group represents the word “the.” That suggests the third symbol stands for the letter “e,” which you can test by checking whether that symbol appears with a high frequency elsewhere, perhaps around 11% of the time. If it does, that is further evidence that you have correctly identified the symbol for “e.” You can then replace that symbol with “e” everywhere you see it, and do the same for the symbols for “t” and “h” that precede it so frequently in the batch of code you’re trying to crack.
Now you’re off to the races. Continue with the same logic, and your exercise begins to look like one giant crossword puzzle. Eventually, you’ll be able to read the whole message. Congratulations: you cracked the code, and you may well be on your way to winning the war.
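Here’s what those opening moves can look like in code. This is a toy sketch, not a real codebreaking tool: the “intercepted” message below is an ordinary sentence with every letter shifted forward by one, and the function makes exactly the two guesses described above.

```python
from collections import Counter

def first_guesses(ciphertext):
    """The opening moves of frequency analysis: assume the most common
    three-letter word is "the", then sanity-check its final symbol
    against the overall frequency we'd expect for "e"."""
    words = ciphertext.lower().split()
    letters = [c for c in ciphertext.lower() if c.isalpha()]

    # Guess: the most common three-letter word stands for "the".
    three_letter = Counter(w for w in words if len(w) == 3)
    the_word = three_letter.most_common(1)[0][0]

    # If so, its last symbol is "e" and should be frequent overall.
    counts = Counter(letters)
    e_rate = counts[the_word[2]] / len(letters)

    mapping = {the_word[0]: "t", the_word[1]: "h", the_word[2]: "e"}
    return mapping, e_rate

# "the spy met the agent at the dock", with every letter shifted by one:
cipher = "uif tqz nfu uif bhfou bu uif epdl"
mapping, e_rate = first_guesses(cipher)
print(mapping)                                            # {'u': 't', 'i': 'h', 'f': 'e'}
print(f"suspected 'e' appears {e_rate:.0%} of the time")  # 19% on this short message
```

On a message this short the percentages run high, but on a longer intercept the suspected “e” would settle down toward the 11% mark.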
Using patterns to reduce uncertainty was one of Shannon’s great breakthroughs in his theories about communication. And patterns can be found in any form of information. This insight led Shannon to his generalized theory of information, applicable to messages of every kind.
Stay tuned!