Culturomics: 5,195,759 digitized books analyzed, see for yourself
Rumors spread online some time ago that Google had launched a project to digitize every book in existence. A recent issue of Science magazine contains an article reporting an analysis of about 5 million digitized books, which, according to Google, account for around 4% of all books ever published. By tracking how often words and phrases appear over the years 1800-2000, you can trace the evolution of culture through the 19th and 20th centuries. The authors used the dataset to show, for example, that:
- about 500,000 English words are missing from all dictionaries
- language evolves in measurable ways, as in the shifting popularity of the forms “burned” and “burnt”
- the popularity of artists, scientists, and politicians can be followed over time
- and more…
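For readers curious how a comparison such as “burned” vs “burnt” works in practice, here is a minimal sketch of the underlying idea: count how often a word occurs in the books of a given year and divide by the total number of words from that year. The tiny corpus and the function name `yearly_frequency` are made up for illustration; this is not the authors' code or the real dataset.

```python
from collections import Counter

# Hypothetical toy corpus of (publication year, text) pairs. Not real data.
TOY_CORPUS = [
    (1850, "the candle burnt low and the toast was burnt"),
    (1850, "he burned the letter"),
    (1950, "she burned the letter and burned the map"),
    (1950, "the toast was burnt"),
]


def yearly_frequency(corpus, word):
    """Return {year: occurrences of `word` / total words published that year}."""
    hits = Counter()
    totals = Counter()
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


burned = yearly_frequency(TOY_CORPUS, "burned")
burnt = yearly_frequency(TOY_CORPUS, "burnt")
for year in burned:
    print(year, f"burned={burned[year]:.3f}", f"burnt={burnt[year]:.3f}")
```

Normalizing by the yearly total matters because far more books were published in 2000 than in 1800, so raw counts alone would make almost every word look like it is becoming more popular.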
The project is called Culturomics. The Books Ngram Viewer, a tool for visualizing word and phrase frequencies in the dataset, much as Google Trends does for search keywords, is publicly available. Check it out! It’s very addictive. Some examples:
Any other examples of nice dynamics?