Google Books Ngram Viewer is Insanely Addictive
December 17, 2010
Google has been scanning and digitizing books for over five years as part of their massive, controversial Google Books project.
Now researchers from Google and Harvard University have created a tool that tracks the use of words in print over hundreds of years. It’s called the Ngram Viewer, and it’s really freaking cool.
You can search in a variety of languages, but the tool is most useful in English, where you can further narrow the results to American or British English when drilling down on cultural topics.
An article in the Boston Globe describes the tool:
Google is publicly launching the tool, Google Books Ngram Viewer, to allow scholars or the simply curious to ask questions, such as when references to “The Great War,’’ which peaked between 1915 and 1941, were replaced by “World War I.’’ The tool allows people to look up words or phrases that range from one to five words, and see their occurrences over time — the frequency that a word is mentioned in a given year divided by the total number of words written that year.
“This is really the largest data release in the history of the humanities — a fantastic wealth of data,’’ said Jean-Baptiste Michel, a postdoctoral researcher in the program for evolutionary dynamics at Harvard. “In our paper we present our initial investigation — we explore this new terrain, we dig a little bit. It is a very cool feeling to have, but what people will be able to do will far exceed everything we have done.’’
In this analysis, the researchers used the data set to look at changes in grammar and English, finding that about half the words that appear in books are “dark matter’’ that do not appear in dictionaries — words that may be compound constructions or proper nouns, or just are undocumented, like “aridification’’ or “slenthem.’’ English, they found, is growing by about 8,500 words a year.
They have also looked at collective memory — and forgetting. Authors are letting the past go more quickly. The year “1880’’ had dropped to half its maximum frequency of references 32 years later, in books written in 1912. But it took only a decade for “1973’’ to decline to half its prominence.
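For the curious, the two measures described above are simple enough to sketch. Below is a minimal Python illustration, assuming yearly counts are already in hand. It computes a phrase's relative frequency (its count in a year divided by the total words printed that year) and the number of years it takes to fall to half of its peak. The function names and the counts are made up for illustration. This is not Google's code, and it does not use real Ngram data.

```python
# A rough sketch of the two measures described in the Globe article.
# The counts below are made-up toy numbers, NOT real Ngram data, and the
# function names are hypothetical; this is not Google's implementation.

def relative_frequency(phrase_counts, total_words):
    """Occurrences of a phrase in each year divided by all words printed that year."""
    return {
        year: phrase_counts.get(year, 0) / total_words[year]
        for year in total_words
    }

def years_to_half_peak(freqs):
    """Years from the peak until the first later sampled year at or below half
    the peak frequency; returns None if the phrase never falls that far."""
    peak_year = max(freqs, key=freqs.get)
    half = freqs[peak_year] / 2
    for year in sorted(freqs):
        if year > peak_year and freqs[year] <= half:
            return year - peak_year
    return None

# Toy data for a made-up phrase, sampled once per decade.
total_words = {1880: 1_000_000, 1890: 1_200_000, 1900: 1_500_000, 1910: 2_000_000}
phrase_counts = {1880: 400, 1890: 360, 1900: 330, 1910: 300}

freqs = relative_frequency(phrase_counts, total_words)
print(years_to_half_peak(freqs))  # → 30
```

Notice that the toy phrase's raw count only slips from 400 to 300, yet its relative frequency falls by well over half, because the total number of words printed keeps growing. That's why the viewer normalizes by each year's total. With decade-spaced toy data, the half-life can only be measured to the nearest sample, so treat the result as approximate.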
I spent some time over the last 24 hours messing around on the site, looking up topics of interest.
For each of the following items, click through on the titles or images to see the full-size graph. The full graphs are over 900 pixels wide and won’t fit nicely within our template, so the images you see here are smaller, cropped versions. I highly recommend checking out the full versions, which open in a new tab.
Security vs. Freedom. (American English) This chart tracks the fascinating shifts in interest in each topic since the 18th century. The sharp change at the right side of the chart is easily attributed to the post-9/11 frenzy, but what’s interesting is that the growing emphasis on security at the expense of freedom was already well underway by then.
Genres of Popular Music. (American English) The rise of hip hop, and its accompanying culture, is simply staggering. As the full chart shows, mentions of hip hop in print have risen to the level of rock and roll and country music. It did occur to me that nobody calls modern rock “rock and roll” anymore, so to get an idea of how a rock subgenre would fare in this analysis, I’ve included punk as well.
God vs. War. God had a long way to fall from his standing in the early part of this analysis, but he staged a pretty good rally in the 19th century. War became a much larger preoccupation in the World War eras, but recently, God has become a topic of greater attention. “I’m NOT dead!” – God
Values. What’s of greatest interest to English writers: religion, science, politics, or education? The answer, it turns out, is different things at different times. Education tracked pretty closely with science for a long time, but around 1900 it became a topic of interest all its own, ultimately overwhelming the others. When this subject is examined using only American sources, politics is consistently last; using all English sources, it stages a temporary rally over religion.
Rights. One thing I find really cool about these analyses is the ability to pinpoint when a certain concept took hold in people’s minds. At one point, suffrage (implicitly, for those other than white, landowning males) was the overriding concern, and scarcely anyone had even considered the ideas of feminism and human rights. Nowadays, human rights is mentioned far more often than the others. It’s also interesting to track the rise of privacy concerns, a topic once considered almost irrelevant in our culture.
Sports. (American English) I focused on American sources to eliminate the football/soccer mix-up. This one is pretty cool, if you ask me. Granted, this tracks mentions of each sport in print, as opposed to attendance or earnings or what have you, but I think it’s a very interesting look at each game’s hold on the zeitgeist. Baseball fans will be disappointed when they click through to the full chart and see that the game’s 1990s rally over football did not last. And as you can see, soccer is now a far more popular topic for American writers than hockey, while golf maintains a surprisingly strong hold on our imaginations.
One cool thing I found is that you can track not just words, but names as well.
Karl Marx vs. Abraham Lincoln. While they’re not directly related, this is an interesting set of famous names to compare. Honest Abe had a pretty healthy head start, but for a period of about sixty years, interest in the Great Emancipator dwindled while the ideological underpinnings of the Cold War attracted increasing attention. You can see the trend sharply reverse itself around 1980, when Communism began to crumble and interest in Lincoln (coincidentally?) began to grow once again.
Famous Names in Science. I chose notable names from relatively recent times, so we can see each of them burst onto the scene. The other challenge was finding scientists famous enough to register at all: even someone as eminent as Niels Bohr would be just a little line down at the bottom of this chart.
Icons. It’s interesting to note that Michael Jackson was basically a niche figure for the first ten years of his fame, while Elvis and Warhol rose much faster. But once Michael hit his stride, mentions of him in print climbed rapidly. Warhol was bigger than Elvis for fifteen minutes there. And while the King has slipped a little from his peak, he still remains the celebrity icon by which all others are judged.
Auteurs. I don’t know about you, but I was stunned to see the size of Woody Allen’s impact in this data. He even gives Orson Welles a run for his money. Ingmar Bergman was #2 on the English-speaking world’s auteur countdown for a while there, but lately he has faded a bit. And I guess it’s a good thing for serious cinephiles that Spielberg isn’t quite the presence on this chart that one might expect from looking at box-office receipts. All in all, this is a pretty interesting chart.
American Generals of WWI/WWII/Korea. It took Eisenhower becoming President to even begin to threaten MacArthur’s preeminence in this discussion. Granted, the dude waged war so many times that he’s all over the history books, but I was still surprised to see how thoroughly he outranked the heroes of the European theater in World War II. Another thing I found interesting was that Omar Bradley was more notable than Patton for a time, but as the decades have gone on, he has been eclipsed by his more colorful comrade.
Squillionaires. This is mostly cool for the rapid, late ascendancy of Bill Gates into the front rank of the uber-wealthy, but in the excerpt above, you can see how Morgan became, for a time, the most notable of the super-rich 19th-century barons. I also included Jay Gould, the notorious “robber baron” railroad developer who has become far less well known than the others, which is a bit of a shame, because he was a grade-A dick who left behind some entertaining tales.
Global Bad Guys. How pissed would Stalin be if he found out that he never achieved the status of Most Remarked-Upon Villain in the eyes of the English-speaking world? (I went with “Chairman Mao” because the search engine has some trouble with hyphenated names, and I think the results look pretty legit.)
Communications Devices. I could go on with this all day, but this seems like a good place to wrap things up for now. It’s interesting how an invention like the telegraph, even when obsolete, can still hang on for a while as an item in the history books. Meanwhile, the Internet, which has absorbed all the communications functions that came before it, has quickly skyrocketed to the top of the charts.
Want to see more search results? I highly recommend that you check out the Google Books Ngram Viewer, but I must warn you: make sure you have plenty of free time before you go down this rabbit hole!
Finally: this is a great little Easter Egg.