Original Announcement of the Round Three (2013) Competition
On behalf of ten research funders representing Canada, the Netherlands, the United Kingdom, and the United States, we invite you to apply for Round Three of the Digging into Data Challenge.
Now going into the third round of the competition, the Digging into Data Challenge has funded a wide variety of projects that explore how computationally intensive research methods can be used to ask new questions about and gain new insights into our world. To encourage innovative research from across the globe, Digging into Data is sponsored by ten international research funding organizations that are working together to focus the attention of the social sciences, humanities, library, archival, information, computer, mathematical, and statistical science communities on large-scale data analysis and its potential applications.
The Digging into Data Challenge aims to address how "big data" changes the research landscape for the humanities and social sciences. Now that we have massive databases of materials available for research in the humanities and the social sciences--ranging from digitized books, newspapers, and music to information generated by Internet-based activities and mobile communications, administrative data from public agencies, and customer databases from private sector organizations--what new, computationally based research methods might we apply? As the world becomes increasingly digital, new techniques will be needed to search, analyze, and understand these materials. Digging into Data challenges the research community to help create the new research infrastructure for 21st-century scholarship.
Applicants will form international teams from at least two of the participating countries. Winning teams will receive grants from two or more of the funding agencies and, two years later, will be invited to show off their work at a special conference sponsored by the ten funders.
Let's get digging.
Original Press Releases
Press releases about the winners of Round Three:
Jisc NEH IMLS AHRC SSHRC NSF NWO SSHRC
Press releases announcing kickoff of Round Three:
AHRC CFI ESRC IMLS Jisc NEH NSERC NSF NWO NLeSC SSHRC
Round Three Conference
At the end of each round of funding, the grantees gather to present their work. The Round Three conference will be held in Glasgow, UK, on January 27-28, 2016. For more information, see the Round Three Conference page.
Press:
- The Hill Times, July 30, 2018. "University of Toronto project brings parliamentary records into the 21st century"
- The Atlantic, November 17, 2014. "Things That Make You Go 'Um'."
- The Conversation, April 3, 2014. "Data mining uncovers 19th century Britain’s fat habit."
- Phys.org, March 26, 2014. "Trading archives chart how Britain's taste for tea grew."
- The Dish, Stanford University, March 24, 2014. "Stanford team receives grant to support digital analysis of medieval manuscripts."
- The Telegraph, March 3, 2014. "Linguistic researchers begin hunt for the next 'selfie'."
- UChicago News, February 24, 2014. "National Endowment for the Humanities supports digital project focused on 18th-century intellectual history."
- UKAuthority.com, February 13, 2014. "Parliamentary 'big data' project 'could transform' political research"
- Biodiversity Heritage Library, January 28, 2014. "Mining Biodiversity (MiBIO): innovative computational techniques to mine BHL texts"
- School for Advanced Study, University of London, January 22, 2014. "IHR project wins ‘big data’ funding to help historians access 200 years’ worth of international parliamentary proceedings."
- University of Saskatchewan, January 22, 2014. "UK, Netherlands researchers to dig into archeological data for hidden treasures"
- Dalhousie University, January 22, 2014. "Natural history for the digital age"
- Concordia University, January 15, 2014. "$200,000 awarded to next-generation media data analysis"
- McGill University, January 15, 2014. "Taking on the 'big data' challenge"
2013 Award Recipients:
The Automating Data Extraction from Chinese Texts Project aims to provide humanists and social scientists with a means of transforming 2200 years of Chinese texts into structured data. The project will fully develop an open-source platform that allows its users to apply sophisticated text-mining techniques, hitherto the domain of information scientists, to a wide variety of historical and literary texts.
The COULD project has 5 goals. (1) It seeks to transfer existing linguistic data from a variety of different formats into a universal format that will allow linguists to combine and share information, not only with other linguists but also with the public at large. (2) The project will build applications that automatically correct errors, draw attention to inconsistencies, and fill gaps in the data.
Recent scholarship has demonstrated that the various practices associated with Early Modern "commonplacing"--the extraction and organization of quotations and other passages for later recall and reuse--were highly effective strategies for dealing with the perceived "information overload" of the period.
Teams from the UK, Canada and the Netherlands will investigate how we can use interactive systems design in conjunction with image processing and text mining techniques to help archaeologists find, organise and analyse the thousands of image and document resources available to them for answering archaeology research questions.
This project brings together political scientists, historians and computational linguists from Canada, the Netherlands and the UK to enable large-scale analysis of the proceedings of three parliaments, from c.1800 to the present day. This data reflects any event of significance over the past 200 years, and will be enhanced during the course of the project to shed light on developments across different nations, cultures and systems of political representation.
This project will develop cross-linguistic annotation protocols for exploring the content of sign language video datasets. The key progress lies in a) standardised lemmatisation protocols for lexicalised signs, and b) protocols for annotating partly-lexical and non-lexical (including gestural) elements.
In this project, psychology and management scholars from the United States and Canada will collaborate with an expert in online research and classification methods to devise a web application that will (i) enable the encoding of millions of individual findings in a multidisciplinary social science research domain, (ii) facilitate complex analyses, and (iii) provide open access to members of the scholar community and the general public.
This project undertakes the cross-cultural study of literary networks in a global context, ranging from post-classical Islamic philosophy to the European Enlightenment. Integrating new image-processing techniques with social network analysis, we examine how different cultural epochs are characterized by unique networks of intellectual exchange.
This project takes a radically novel approach to the problem of measuring and visualizing differences among legal systems: it focuses on machine coding of internal references in codes and laws. Internal referencing is an inherent characteristic of codes. Already the Code of Hammurabi, almost 3800 years ago, was structured as a numbered list of laws with at least one cross-reference.
The Mining Biodiversity project aims to transform the Biodiversity Heritage Library into a next-generation social digital library resource to facilitate the study and discussion (via social media integration) of legacy science documents on biodiversity by a worldwide community and to raise awareness of the changes in biodiversity over time in the general public. The project will integrate novel text mining methods, visualisation, crowdsourcing and social media into the BHL to provide a semantic search system.
Social scientists have used agent-based models (ABMs) to explore the interaction and feedbacks among social agents and their environments. The bottom-up structure of ABMs enables simulation and investigation of complex systems and their emergent behavior with a high level of detail; however, the stochastic nature and potential combinations of parameters of such models create large non-linear multidimensional "big data," which are difficult to analyze using traditional statistical methods.
Commercial media companies have embraced computational analytics to study discussions of media content across social media data streams. Data mining companies identify actors and TV shows that are “trending” in global popularity, along with more granular analyses of regional tastes, social networks, and discourse. We propose to apply a similar methodology toward the study of film and media history.
Our team proposes to study papyrus documents from Egypt found in trash heaps: scraps giving us rich evidence of human activity in the ancient Mediterranean. They allow us to retrieve lost poetry, new gospels, and everyday writings: letters, contracts, census returns, homilies, recipes. Half a million fragments await study in the Oxyrhynchus collection alone.
The proposed research aims to analyze contemporary Twitter data from the UK and USA for regional variation in linguistic forms and to link the patterns of variation with migration in both countries.