Cleaning, Organizing, and Uniting Linguistic Databases (the COULD project)

Abstract

The COULD project has five goals. (1) It seeks to transfer existing linguistic data from a variety of formats into a universal format that will allow linguists to combine and share information, not only with other linguists but also with the public at large. (2) The project will build applications that automatically correct errors, draw attention to inconsistencies, and fill gaps in the data. (3) These automated mechanisms will provide new tools for detecting patterns that are not obvious in smaller databases. (4) The project seeks to make the vast amounts of linguistic data currently used only by researchers available to second-language learners by developing search algorithms that facilitate lesson creation. (5) The project will make data collection easier and thus make language preservation and documentation less dependent on experts. Communities trying to revive endangered languages will benefit directly from this project.

Principal Investigators

Maria Polinsky, Harvard University, US, NSF
Alan Bale, Concordia University, CAN, SSHRC