In a very fast-paced address, Kalev Leetaru, Senior Fellow at the Center for Cyber & Homeland Security, Georgetown University, illustrated many diverse uses of big data. He began by wondering what would happen if we re-imagined books as the world’s largest art gallery. Being able to walk through 500 years of images shows the power that data offers us. For example, we can see how imagery has evolved over time and mine the data to see how copyright has affected publishing over the years. We have incredible volumes of data, but they may illustrate only a tiny portion of the world.
Social media is exploding, but much of its data is not accessible to us because data from Facebook, Twitter, and similar systems is private. We are still looking at only part of the data, although that part now comes from a larger set than it used to. Even half a century after data became available, we are still using keywords to search it. No single source gives us a perfect view of society.
The GDELT Project looks at how to catalog data and bring the world closer together. How can we reach across the world and preserve online journalism?
Data miners now have the ability to collaborate with publishers. What have people written about in the world over the past half century? Most data mining looks at English-language sources, but that misses many sources in other languages. GDELT uses machine translation to produce data that can be mined and then tries to understand physical events. But we are more interested in emotions and how people react to events. For example, here is a graph showing the number of protest events in various countries as a function of time, from which we can see events in the cycle of world history.
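As a rough illustration of the kind of counting behind a graph like this, here is a minimal Python sketch that tallies protest events per country and month. The records, the field names (`country`, `day`, `event_type`), and the function name are all hypothetical and made up for this example; they are not GDELT's actual data or schema.

```python
from collections import Counter
from datetime import date

# Hypothetical, hand-made event records. The field names are assumptions
# for illustration only and do not reflect GDELT's actual schema.
events = [
    {"country": "EG", "day": date(2011, 1, 25), "event_type": "protest"},
    {"country": "EG", "day": date(2011, 1, 28), "event_type": "protest"},
    {"country": "EG", "day": date(2011, 2, 1), "event_type": "protest"},
    {"country": "TN", "day": date(2011, 1, 14), "event_type": "protest"},
    {"country": "TN", "day": date(2011, 1, 15), "event_type": "statement"},
]


def protests_per_country_month(records):
    """Count protest events keyed by (country, year, month)."""
    counts = Counter()
    for rec in records:
        # Skip anything that is not tagged as a protest.
        if rec["event_type"] != "protest":
            continue
        key = (rec["country"], rec["day"].year, rec["day"].month)
        counts[key] += 1
    return counts


counts = protests_per_country_month(events)

# Print one line per country and month, in sorted order.
for (country, year, month), n in sorted(counts.items()):
    print(f"{country} {year}-{month:02d}: {n}")
```

Plotting these counts against time, one line per country, would give a chart of the general shape described above. A real analysis would read the events from a published data set rather than a hand-written list.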
Here is a graph of news coverage of Ebola in the US and other countries; it shows that US coverage did not emerge until long after the epidemic started.
Google is doing some amazing work looking at images, cataloging the objects in them, and then estimating where each image was taken. We have so much data, and now, for the first time, we have tools to make sense of it. What should librarians do? They should help people find data sets. Publishers can offer researchers the ability to legally mine their data.
The Internet Archive is a unique library with large holdings of data. Librarians can help people understand the nuances of their data. Much of this work can be done on a laptop; you do not need a supercomputer! Many of the tools available today come from a wide variety of disciplines.
Don Hawkins blogs about conferences for Information Today and Against The Grain. He also maintains the Conference Calendar on the Information Today website and is the Editor of Personal Archiving: Preserving Our Digital Heritage, published by Information Today in 2013, and Co-Editor of Public Knowledge: Access and Benefits, published by Information Today in 2016. He received his Ph.D. degree from the University of California, Berkeley, and has worked in the information industry for over 45 years.