Round 1

Original Announcement of the Round One (2009) Competition

The Digging into Data Challenge is an international grant competition sponsored by four leading research agencies: the Joint Information Systems Committee (JISC) from the United Kingdom, the National Endowment for the Humanities (NEH) from the United States, the National Science Foundation (NSF) from the United States, and the Social Sciences and Humanities Research Council (SSHRC) from Canada.

What is the "challenge" we speak of?  The idea behind the Digging into Data Challenge is to answer the question "what do you do with a million books?"  Or a million pages of newspaper? Or a million photographs of artwork?  That is, how does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitized data -- far more than they could read in a lifetime -- what does that mean for research?  

Applicants will form international teams from at least two of the participating countries.  Winning teams will receive grants from two or more of the funding agencies and, one year later, will be invited to show off their work at a special conference. Our hope is that these projects will serve as exemplars to the field.

The advent of what has been called “data-driven inquiry” or “cyberscholarship” has changed the nature of inquiry across many disciplines, including the sciences and humanities, revealing new opportunities for interdisciplinary collaboration on problems of common interest.  The creation of vast quantities of Internet accessible digital data and the development of techniques for large-scale data analysis and visualization have led to remarkable new discoveries in genetics, astronomy, and other fields, and—importantly—connections between academic disciplinary areas.  New techniques of large-scale data analysis allow researchers to discover relationships, detect discrepancies, and perform computations on data sets that are so large that they can be processed only using computing resources and computational methods developed and made economically affordable within the past few years.  With books, newspapers, journals, films, artworks, and sound recordings being digitized on a massive scale, it is possible to apply data analysis techniques to large collections of diverse cultural heritage resources as well as scientific data.  How might these techniques help scholars use these materials to ask new questions about and gain new insights into our world?  To encourage innovative approaches to this question, four international research organizations are organizing a joint grant competition to focus the attention of the social science and humanities research communities on large-scale data analysis and its potential application to a wide range of scholarly resources.

The goals of the initiative are:

  • to promote the development and deployment of innovative research techniques in large-scale data analysis;

  • to foster interdisciplinary collaboration among scholars in the humanities, social sciences, computer sciences, information sciences, and other fields, around questions of text and data analysis;

  • to promote international collaboration; and

  • to work with data repositories that hold large digital collections to ensure efficient access to these materials for research.

If you are interested in taking up this challenge, please read the RFP and addenda available on this page. 

Original Press Releases

Press Releases About the Launch of the Digging into Data Challenge (January 2009)


Press Releases about Awardees (December 2009)


Speech by NEH Chairman Jim Leach at the DiD awards ceremony.

Round One Conference

At the end of each round of funding, the grantees gather to present their work. The first round conference was held June 9-10, 2011, in Washington, DC. To access the papers given at the conference and read profiles of the speakers, please see the Digging Round One Conference page.


2009 Award Recipients:

This project will pursue research using advanced computational techniques to explore humanities themes related to the authorship of large collections of cultural heritage materials, namely 15th century manuscripts, 17th and 18th century maps, and 19th and 20th century quilts.


This project will focus on a body of 53,000 18th-century letters, and analyze the degree to which the effects of the Enlightenment can be observed in the letters of people of various occupations.


This project will harvest audio and transcribed data from podcasts, news broadcasts, public and educational lectures and other sources to create a massive corpus of speech. Tools will then be developed to analyze the different uses of prosody (rhythm, stress and intonation) within spoken communication.


This project focuses on large-scale data analysis of audio, specifically the spoken word. It will create tools to enable rapid and flexible access to over 9,000 hours of spoken audio files, containing a wide variety of speech, drawn from some of the leading British and American spoken word corpora, allowing for new kinds of linguistic analysis.


This project will integrate a vast collection of textual, geographical and numerical data to allow for the visual presentation of the railroads and their impact on society over time, concentrating initially on the Great Plains and Northeast United States.


SALAMI (Structural Analysis of Large Amounts of Music Information) is an innovative and ambitious computational musicology project. Our computational approach, combined with the huge volume of data now available from such sources as the Internet Archive, will a) deliver a very substantive corpus of musical analyses in a common framework for use by music scholars, students and beyond; and b) establish a methodology and tooling which will enable others to add to this in the future and to broaden the application of the techniques we establish.


This project will create a framework to produce "dynamic variorum" editions of classics texts that enable the reader to link automatically not only to variant editions but also to relevant citations, quotations, people, and places found in a digital library of over one million primary and secondary source texts.


This project will create an intellectual exemplar for the role of data mining in an important historical discipline – the history of crime – and illustrate how the tools of digital humanities can be used to wrest new knowledge from one of the largest humanities data sets currently available: the Old Bailey Online.