A Statistical Analysis of Sniffing in The Wheel of Time
Why do all of Robert Jordan's female characters "sniff" so much?
Recently, I watched the first season of The Wheel of Time, based on the book series by Robert Jordan, and it was good! Since I love reading fantasy novels and I liked the show so much, I decided to give the books a read. There’s only one problem: this series is massive. It has 15 books with 800 pages each on average, totaling 4.4 million words.
I’m seven books deep in the series, and when you read ~5,500 pages of a single author’s writing, you begin to notice quirks, especially stock phrases. Characters in The Wheel of Time tend to “tug their braids,” “smooth their skirts,” or “sniff” in disapproval at a rate that noticeably exceeds a “normal” rate of braid-tugging or sniffing.
A selection of examples:
Elayne gave a sniff of disbelief. “Most people think I get off easier than the others because I am Daughter-Heir of Andor. The truth is that if anything, I catch it harder than the rest because I’m Daughter-Heir.”[1]
Rianna glared, but Liandrin only sniffed. “Do not be a complete fool, wilder. You are wanted alive. Dead bait will catch nothing.”
“I think Masema is going crazy,” Perrin said. Min sniffed. “With him, how can you tell?”
The glow winked out of existence, and Elaida sniffed audibly. “You have learned bad habits, wherever these two took you.”
“As well I went after you,” Lan said, and the Aes Sedai sniffed loudly.
The bosomy woman did not reply, but she sniffed.
Bosomy? Really?
Previous Literature
Others have noticed this as well. A few people (like me) with access to Python and too much time on their hands did a statistical analysis of braid-tugging, skirt-smoothing, sniffing, and arm-folding-under-breasts in The Wheel of Time. Here’s another analysis specifically focusing on braid tugging.
Strangely, the first analysis actually concludes that Robert Jordan doesn’t overuse these stock phrases. By comparing The Wheel of Time to another series famous for stock phrases (Joe Abercrombie’s The First Law trilogy), they conclude:
As you can see Glokta (a character in The First Law) licks his gums more frequently than the WoT memes combined (braid-tugging, sniffing, skirt-smoothing, etc.)… So where does this fixation on these particular memes stem from?
This statement felt wrong to me. While reading, it was extremely noticeable that the (female) characters were constantly sniffing at other people. Let’s take a closer look.
How Much Sniffing is Normal?
I think the issue with the above analysis is that their control group is wrong. In order for a language quirk to be noticeable, it must greatly exceed whatever the baseline is for “normal” language. Of course, normal language varies heavily by context, so the first thing to do is get a reasonable baseline.
What is the frequency of the word “sniff” in the (written) English language?
For this, we can use the English Corpora, a collection of about 200 billion words written in English. Using the Google Books n-grams subset, with 155 billion words, we find 138,619 instances of “sniff,” 218,515 instances of “sniffed,” and 33,401 instances of “sniffs.” That’s ~390,000 sniffs in 155 billion words, or a frequency of 1 in 400,000.
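As a quick check of the arithmetic, the baseline can be recomputed from the counts quoted above. This is only a sketch: the variable names are my own, and the corpus size is the rounded 155 billion figure from the text.

```python
# Counts quoted in the text for the Google Books n-grams subset
counts = {"sniff": 138_619, "sniffed": 218_515, "sniffs": 33_401}
corpus_words = 155_000_000_000  # rounded corpus size, as quoted

total_sniffs = sum(counts.values())
frequency = total_sniffs / corpus_words
words_per_sniff = corpus_words / total_sniffs

print(f"total sniffs:   {total_sniffs:,}")
print(f"frequency:      {frequency:.2e}")
print(f"one sniff per {words_per_sniff:,.0f} words")
```

The result lands just under 1 in 400,000, which matches the rounded figure in the text.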
But “sniff” isn’t the only word that can be used to express disapproval. Let’s do the same analysis for “scoff” and “sneer,” and add the results to get a generous upper-bound frequency of disapproving sniffing.
I estimate that the baseline frequency of disdain by sneering, scoffing, or sniffing in literature is roughly the sum of these nine elements, 7.43 * 10^-6, or 1 in 135,000 words.
Sniffing in the Wheel of Time
The Wheel of Time has a total of 4,410,036 words, according to Google. With a few lines of Python and some regular expressions, we can parse the full text of The Wheel of Time…
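import re

# Open the book and read the full text
bookpath = 'Books/CompleteWheelOfTime.txt'
with open(bookpath, 'r') as file:
    content = file.read()

# Match each period-terminated sentence that contains "sniff"
# (this also catches "sniffs", "sniffed", "sniffing", and "sniffer")
pattern = r'[^.!?]*sniff[^.!?]*\.'
matches = re.findall(pattern, content)

# Print every matching sentence, then the total count
for sentence in matches:
    print('============================')
    print(sentence.replace('\n', ''))
print(len(matches))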
I find that the word “sniff” (including sniffs, sniffed, etc.) appears 467 times. That corresponds to a frequency of 1.06*10^-4, or 1 in 9,443 words. That’s fourteen times higher than the baseline frequency of 1 in 135,000 words, and you’re definitely going to notice that if you read it.
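The comparison with the baseline is simple division, sketched here with the numbers quoted in the text. The variable names are my own.

```python
wot_words = 4_410_036              # total word count, as quoted
wot_sniffs = 467                   # sentences matched by the regex search
baseline_words_per_sniff = 135_000 # baseline estimate from the previous section

wot_frequency = wot_sniffs / wot_words
wot_words_per_sniff = wot_words / wot_sniffs
ratio_to_baseline = baseline_words_per_sniff / wot_words_per_sniff

print(f"frequency: {wot_frequency:.2e}")
print(f"one sniff per {wot_words_per_sniff:,.0f} words")
print(f"{ratio_to_baseline:.1f}x the baseline")
```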
This is roughly the same frequency with which you might expect a very common English verb to appear, such as “sit” or “eat.”
“Sit” has a frequency of 1 in 8,130. “Eat” has a frequency of 1 in 15,360. In The Wheel of Time, “sniff” has a frequency of 1 in 9,443. This is very high, but could be artificially inflated above the baseline by various errors in our estimate.
In his Reddit thread, Nadinya addresses this type of naive counting analysis, calling it “bad statistics.” Nadinya does a slightly more careful analysis, using a conditional vicinity search near matching words to elucidate some context.
In my opinion, sophisticated statistics are not necessary here, because the Wheel of Time sniff frequency is fourteen times the baseline sniff frequency. Any error sources we find are extremely unlikely to alter either The Wheel of Time or baseline sniff frequency by more than a factor of ten, and so the conclusion that Robert Jordan loves sniffing way more than normal is already very robust with almost no sophistication in our analysis.
This is the power of order-of-magnitude estimates,[2] a useful tool in the physicist’s toolkit. Reasonable assumptions and basic math can get you within an order of magnitude of the correct answer for a really wide range of problems. Unless you’re an astrophysicist. Like the old joke says:
Engineers work to a couple of decimal places
Physicists work to an order of magnitude
Astrophysicists work to an order of magnitude in the exponent
Sniffing Out Error Sources
That being said, let’s take a look at possible sources of bias in our data sets. The blind comparison of The Wheel of Time sniff count against a Google Books database isn’t entirely fair.
First, The Wheel of Time is a work of fiction, primarily about characters interacting with each other and the world. Their personal emotions and characteristics are highlighted by the author, which affects the frequency with which descriptive verbs are used. We need a better control group.
Second, we’re conflating the two uses of “sniff.”
Hopper trotted by his side, sniffing the air. As sharp as Perrin’s nose was, the wolf’s was sharper.
Here’s a case where a wolf is literally sniffing for scents — that doesn’t register as “excess sniffing” in the same way as Nynaeve constantly sniffing in disdain does.
How can we differentiate between these two cases?
Sadly, natural language processing is just not at the point where I can computationally distinguish between scent-sniffing and disdain-sniffing. Fortunately, there are only 467 cases to look at, so we can just do this manually.
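A crude keyword filter will not replace the manual pass, but it could pre-sort the sentences so the obvious cases go faster. The sketch below is purely illustrative: the cue-word lists and the `triage_sniff` helper are my own guesses, not something used in this analysis, and anything it labels would still need a human check.

```python
import re

# HYPOTHETICAL cue words, chosen by eye from the examples in this post.
SCENT_CUES = {"air", "nose", "scent", "smell", "wind", "wolf", "trail"}
DISDAIN_CUES = {"disbelief", "disdain", "contempt", "loudly", "audibly"}

def triage_sniff(sentence: str) -> str:
    """Guess which kind of sniff a sentence contains, or give up."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    scent_hits = len(words & SCENT_CUES)
    disdain_hits = len(words & DISDAIN_CUES)
    if scent_hits > disdain_hits:
        return "likely-scent"
    if disdain_hits > scent_hits:
        return "likely-disdain"
    return "unsure"  # no cue words, or a tie: leave it for manual review

for s in ["Hopper trotted by his side, sniffing the air.",
          "Elayne gave a sniff of disbelief.",
          "Min sniffed."]:
    print(triage_sniff(s), "|", s)
```

Bare sentences like “Min sniffed.” carry no cue words at all, which is exactly why the manual pass is still needed.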
A Better Control Group
The Wheel of Time series presents us with an uncommonly good control group for our analysis. Robert Jordan died while finishing his series — the last three books in the series (The Gathering Storm, Towers of Midnight, and A Memory of Light) were written by Brandon Sanderson working from Robert Jordan’s notes. This is an almost perfect control, as we can measure sniff frequency as a function of author, while holding characters and story constant.
Brandon Sanderson’s Wheel of Time books comprise 978,460 words, with 75 total sniffs, for a sniff frequency of 1 in 13,000 words. Robert Jordan’s Wheel of Time books comprise 3.4 million words, with 392 sniffs, for a sniff frequency of 1 in 8,700.
Robert Jordan sniffs about one and a half times as often as Brandon Sanderson while telling the same story with the same characters.
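The per-author split can be recomputed directly from the word and sniff counts quoted in this post. In this sketch, Jordan’s totals are taken as the series totals minus Sanderson’s; the variable names are my own.

```python
series_words, series_sniffs = 4_410_036, 467      # whole series, as quoted
sanderson_words, sanderson_sniffs = 978_460, 75   # last three books, as quoted

# Jordan's share is whatever remains of the series totals
jordan_words = series_words - sanderson_words
jordan_sniffs = series_sniffs - sanderson_sniffs

sanderson_words_per_sniff = sanderson_words / sanderson_sniffs
jordan_words_per_sniff = jordan_words / jordan_sniffs
jordan_vs_sanderson = sanderson_words_per_sniff / jordan_words_per_sniff

print(f"Sanderson: one sniff per {sanderson_words_per_sniff:,.0f} words")
print(f"Jordan:    one sniff per {jordan_words_per_sniff:,.0f} words")
print(f"Jordan sniffs {jordan_vs_sanderson:.2f}x as often")
```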
Other control groups also make sense. For instance, we could compare Brandon Sanderson’s other books to his Wheel of Time books, to get a measure of how often Wheel of Time characters sniff when the author is held constant. Perhaps this analysis can be saved for future work, as I don’t have the entire corpus of Brandon Sanderson on hand at the moment.
Manual Classification of Sniffing
It took me a bit of time, but I manually classified all instances of sniffing in The Wheel of Time into two categories: sniff-disdain and sniff-other (mostly sniff-scent).
I find 312 instances of sniff-disdain and 151 instances of sniff-other. As you might be able to tell, I didn’t do this all that carefully, evidenced by the fact that the total sniffs in the above table does not match the computationally counted sniffs (467). I dropped four sniffs somewhere! This is mostly okay — there’s always uncertainty in measurement, and I’ll gladly take 1% error to avoid going through that process again.
The true incidence of disdainful sniffing was significantly lower than the naive word-counting estimate predicted, by roughly 33%. One of the main confounding factors was the character Hurin, who is known as “the sniffer.” He alone accounted for roughly 80 instances of the word “sniff.”
Another interesting fact I found: only five disdainful sniffs in total were given by male characters. Nearly all of the sniffing is done by women.
In total, Robert Jordan’s (overwhelmingly female) characters sniff in disapproval 312 times out of 4.4 million words, giving a sniffing frequency of roughly 1 in 14,000. This is still an order of magnitude above the baseline sniffing frequency (sniffrequency?) of 1 in 135,000, and even exceeds the frequency of a common English verb like “eat.” Combined with the evidence from the Brandon Sanderson control group, we can say with high confidence that Robert Jordan’s female characters are extremely and consistently disappointed with the people nearest to them.
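The numbers from the manual classification can be tied together in a few lines. This sketch uses only counts quoted in the text; the variable names are my own.

```python
regex_total = 467            # sniffs found by the regex search
disdain, other = 312, 151    # manual classification
wot_words = 4_410_036
baseline_words_per_sniff = 135_000

dropped = regex_total - (disdain + other)        # sniffs lost during manual sorting
drop_rate = dropped / regex_total
disdain_share = disdain / regex_total            # fraction of sniffs that are disdainful
disdain_words_per_sniff = wot_words / disdain
ratio_to_baseline = baseline_words_per_sniff / disdain_words_per_sniff

print(f"dropped sniffs: {dropped} ({drop_rate:.1%})")
print(f"disdain share:  {disdain_share:.0%}")
print(f"one disdainful sniff per {disdain_words_per_sniff:,.0f} words")
print(f"{ratio_to_baseline:.1f}x the baseline")
```

The disdain share of about two thirds is where the “roughly 33% lower” correction comes from.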
Sniffs Stratified by Book
I’ve compiled the list of sniffs per book to analyze the sniffing frequency in a more granular fashion.
The enormous spike in The Great Hunt is caused by the presence of Hurin the sniffer, who accounts for ~80 sniffs. To assess the correlation between the presence of the female leads and the amount of sniffing, we can plot the sniff count in each book against the number of words written from the perspectives of Egwene, Nynaeve, and Elayne (the three female leads and sniffers-in-chief). I’m pulling this information from the excellent statistical breakdown of character points-of-view on the Wheel of Time Wiki.[3]
With the exception of The Great Hunt, the correlation is quite good indeed. We can quantify this correlation with the Pearson correlation coefficient r, which ranges between -1 (perfect negative correlation) and 1 (perfect positive correlation). Including The Great Hunt, we find r = 0.35, corresponding to a p-value of p = 0.2, indicating relatively weak correlation. However, if we exclude Hurin’s many sniffs in The Great Hunt from the dataset, we find r = 0.83, with a p-value of p = 0.0001, an extremely strong correlation. This is a great example of how careful data analysis can reveal strong correlations buried by single outlier points.
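To show how a single outlier can bury a correlation, here is a minimal pure-Python sketch of the Pearson coefficient. The data are MADE-UP toy numbers, not the real per-book counts, and `pearson_r` is my own helper, so the values it prints will not match the r values reported above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# TOY DATA (invented for illustration): thousands of words from the
# female leads' points of view, and sniffs per book. The last entry plays
# the role of an outlier like The Great Hunt: few POV words, many sniffs.
pov_words = [10, 20, 30, 40, 50, 60, 15]
sniffs = [5, 9, 16, 20, 24, 31, 90]

r_all = pearson_r(pov_words, sniffs)
r_trimmed = pearson_r(pov_words[:-1], sniffs[:-1])

print(f"r with outlier:    {r_all:.2f}")
print(f"r without outlier: {r_trimmed:.2f}")
```

With real data, a library routine such as SciPy’s `scipy.stats.pearsonr` would also return the p-value. The hand-rolled version is used here only to keep the sketch dependency-free.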
Collecting all of our data together draws a very clear picture. Robert Jordan’s female characters sniff a lot — the sniffing frequency is well above anything you expect to see in normal language, and is thus somewhat jarring to read.
We shouldn’t be too hard on Robert Jordan, though. Indeed, let he who is without sniff cast the first stone! I am horribly guilty of over-sniffing. In this article, roughly 1 word in 25 is “sniff,” making me nearly 400 times worse than Robert Jordan.
[1] The Dragon Reborn, by Robert Jordan.
[2] One of my favorite textbooks of all time: Order-of-Magnitude Physics: Understanding the World with Dimensional Analysis, Educated Guesswork, and White Lies, by Goldreich, Mahajan, and Phinney. http://www.inference.org.uk/sanjoy/oom/book-a4.pdf
[3] Wheel of Time Wiki Statistical Analysis Page. https://wot.fandom.com/wiki/Statistical_analysis