Culturomics: 5,195,759 digitized books analyzed, see for yourself

December 17, 2010

Rumors spread online some time ago that Google had launched a project to digitize every book in existence. A recent issue of Science magazine contains an article reporting an analysis of about 5 million digitized books, which, according to Google, account for around 4% of all books ever published. By tracking how often a word or phrase appears in each year from 1800 to 2000, you can trace the evolution of culture through the 19th and 20th centuries. The authors used the data to show, for example:

  • that about 500,000 English words are missing from all dictionaries
  • how language evolves, such as the relative popularity of the forms “burned” and “burnt”
  • how the popularity of artists, scientists, and politicians changes over time
  • and more…
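
To make the idea of a comparison like “burned” vs “burnt” concrete, here is a minimal Python sketch of a per-year relative word frequency. It is only an illustration, not the method or code used in the article: the toy corpus, the function name `yearly_relative_frequency`, and the whitespace tokenization are all made up for this example, and dividing by the total number of words in a year is an assumed normalization.

```python
from collections import Counter

# Hypothetical toy corpus: (publication year, text) pairs invented for illustration.
# These are NOT taken from the Google Books dataset.
TOY_CORPUS = [
    (1850, "the candle burnt low and the letter was burnt"),
    (1850, "he burned the old map"),
    (1950, "the toast burned and the fire burned all night"),
    (1950, "a burnt offering"),
]


def yearly_relative_frequency(corpus, word):
    """Return {year: occurrences of `word` / total words in that year}."""
    hits = Counter()    # occurrences of `word` per year
    totals = Counter()  # total number of tokens per year
    for year, text in corpus:
        tokens = text.lower().split()  # naive whitespace tokenization
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    for form in ("burned", "burnt"):
        print(form, yearly_relative_frequency(TOY_CORPUS, form))
```

Plotting the resulting values against the year gives the kind of curve the viewer draws; a real analysis would also need proper tokenization and far more text per year.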

The project is called Culturomics. The Books Ngram Viewer, a tool for visualizing word and phrase frequencies in the dataset, much as Google Trends does for search keywords, is publicly available. Check it out! It’s very addictive. Some examples:

Any other examples of nice dynamics?


