Google has been scanning and digitizing books for over five years as part of its massive, controversial Google Books project.
Now researchers from Google and Harvard University have created a tool that tracks the use of words in print over hundreds of years. It’s called the Ngram Viewer, and it’s really freaking cool.
You can search in a variety of languages, but it’s most useful in English, and you can narrow that further to American English or British English when drilling down on cultural topics.
An article in the Boston Globe describes the tool:
Google is publicly launching the tool, Google Books Ngram Viewer, to allow scholars or the simply curious to ask questions, such as when references to “The Great War,” which peaked between 1915 and 1941, were replaced by “World War I.” The tool allows people to look up words or phrases that range from one to five words, and see their occurrences over time — the frequency that a word is mentioned in a given year divided by the total number of words written that year.
“This is really the largest data release in the history of the humanities — a fantastic wealth of data,” said Jean-Baptiste Michel, a postdoctoral researcher in the program for evolutionary dynamics at Harvard. “In our paper we present our initial investigation — we explore this new terrain, we dig a little bit. It is a very cool feeling to have, but what people will be able to do will far exceed everything we have done.”
In this analysis, the researchers used the data set to look at changes in grammar and English, finding that about half the words that appear in books are “dark matter” that do not appear in dictionaries — words that may be compound constructions or proper nouns, or just are undocumented, like “aridification” or “slenthem.” English, they found, is growing by about 8,500 words a year.
They have also looked at collective memory — and forgetting. Authors are letting the past go more quickly. The year “1880” had dropped to half its maximum frequency of references 32 years later, in books written in 1912. But it took only a decade for “1973” to decline to half its prominence.
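For the curious, the two measurements the article describes are simple to sketch in code. This is only my rough illustration in Python, not the researchers' actual method: the function names are my own, and the counts are invented toy numbers rather than real Ngram data. The first function divides a term's mentions in a year by the total words printed that year. The second counts how many years pass between a term's peak and the first later year in which it has fallen to half that peak or lower.

```python
def relative_frequency(matches, totals):
    """Share of each year's printed words accounted for by the search term."""
    return {
        year: matches[year] / totals[year]
        for year in matches
        if totals.get(year)  # skip years with no recorded words
    }


def years_to_half_peak(freq):
    """Years from the peak to the first later year at or below half the peak."""
    peak_year = max(freq, key=freq.get)
    half = freq[peak_year] / 2
    for year in sorted(y for y in freq if y > peak_year):
        if freq[year] <= half:
            return year - peak_year
    return None  # the term never fell to half its peak in this data


# Invented toy counts (NOT real Ngram data), shaped to echo the "1880" example.
matches = {1880: 50, 1890: 40, 1900: 30, 1912: 24, 1920: 10}
totals = {1880: 1000, 1890: 1000, 1900: 1000, 1912: 1000, 1920: 1000}

freq = relative_frequency(matches, totals)
print(years_to_half_peak(freq))
```

With these made-up numbers the term peaks in 1880 and first drops to half that level in 1912, so the script prints 32. Dividing by each year's total is what lets you compare years fairly, since far more words were printed in some years than in others.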
I spent some time over the last 24 hours messing around on the site, looking up topics of interest.
For each of the following items, click the title or image to see the full-size graph. These images are over 900 pixels wide and won’t fit nicely within our template, so the images you see here are smaller, cropped versions of the full results. I highly recommend checking out the full versions, which will open in a new tab.
Security vs. Freedom. (American English) This chart tracks the fascinating shift in interest in each topic dating back to the 18th century. The significant shift at the right side of the chart is easily attributable to the post-9/11 frenzy, but what’s interesting is that the increased emphasis on security at the expense of freedom was already well underway by then.
Genres of Popular Music. (American English) The rise of hip hop, and its accompanying culture, is simply staggering. As the full chart shows, print mentions of hip hop have risen to the level of rock and roll and country music. It did occur to me that nobody calls modern rock “rock and roll” anymore, but to get an idea of how a rock subgenre would fare in this analysis, I’ve included punk as well.
God vs. War. God had a long way to fall from his standing in the early part of this analysis, but he staged a pretty good rally in the 19th century. War became a much larger preoccupation in the World War eras, but recently, God has become a topic of greater attention. “I’m NOT dead!” – God