The Nebraska Digital Newspaper Project and its contractor, iArchives, have created 300,000 full-text digitized pages of 19th- and early 20th-century newspapers from selected communities in Nebraska that DID research teams can use for text mining. The number of pages will grow over time. Files have been created in three forms: TIFF images, JPEG2000 images, and PDFs with hidden text. Optical character recognition has been performed on the scanned images, but the output has not been corrected, so the text is "dirty" OCR. Metadata associated with the project is in TEI, XML, and METS/ALTO, following the guidelines provided by the Library of Congress for the National Digital Newspaper Program (NDNP). Newspaper languages include English and Czech.
We also have 118,000 full-text pages of Nebraska Public Documents, http://cdrh.unl.edu/nebpubdocs, that may be useful for DID. These are XML files with TEI2 headers, METS/ALTO, and dirty OCR.
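Because the OCR text is delivered in ALTO, a research team will usually need to pull the recognized words out of the XML before any text mining. The following is a minimal Python sketch of that step, not project-supplied code. It assumes only the standard ALTO layout, in which each TextLine element contains String elements whose CONTENT attribute holds one recognized word. The function names and the sample page are invented for illustration, and the namespace is ignored because it varies between ALTO versions.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return one string per ALTO TextLine, joining its String CONTENT values."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Each direct String child carries one OCR'd word in CONTENT.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        if words:
            lines.append(" ".join(words))
    return lines


# Invented sample page (not taken from the collection); "Dailv" mimics
# the kind of uncorrected OCR error the description warns about.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="The"/><SP/>
      <String CONTENT="Dailv"/><SP/>
      <String CONTENT="News"/>
    </TextLine>
    <TextLine>
      <String CONTENT="Lincoln,"/><SP/>
      <String CONTENT="Nebraska"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in alto_lines(SAMPLE):
        print(line)
```

Reading a real file would mean passing its contents to alto_lines; any OCR errors in the source come through unchanged, so downstream analysis should expect misspellings.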
Documentation. Detailed descriptions of the NDNP requirements are available on the Library of Congress website at http://www.loc.gov/ndnp. No APIs have been developed for the Nebraska Digital Newspaper Project.
Technical support information. Research teams with questions about access to the data can contact Jason Bougger, Systems Administrator, UNL Libraries, firstname.lastname@example.org, (402) 472-0856.
Terms of Service:
The Nebraska Digital Newspaper Project should be cited in any acknowledgements associated with the DID Challenge. Any use of the data outside the DID Challenge should be cleared through Katherine Walter, Project Director of the Nebraska Digital Newspaper Project, email@example.com, (402) 472-3939.