The University of Florida Digital Collections (UFDC) hosts local and international collections comprising over 8 million pages of all material types (books, archival documents, newspapers, photographs, audio, video, museum objects, data sets, maps, etc.) in many languages. A full list of collections, each linked to its description and accompanied by statistics, is available here: http://ufdc.ufl.edu/stats/usage/history
Selected large collections include:
• Digital Library of the Caribbean: 79,313 items and 1,793,735 pages
• Florida Digital Newspaper Library: 88,614 issues and 1,386,668 pages
• Baldwin Library of Historical Children's Literature: 6,316 items and 941,350 pages
The Digital Library of the Caribbean contains materials ranging from historic to current in multiple languages (primarily English, Spanish, and French). The Florida Digital Newspaper Library includes newspapers ranging from historic to current. The Baldwin collection contains 19th-century children's literature. All collections and items are openly accessible for use and for data mining.
The SobekCM system powering the UF Digital Collections supports OAI-PMH, provides search and browse results as XML, and offers a JSON interface to images and raw text (in use by several iPhone apps). Extensive documentation is available here: http://ufdc.ufl.edu/sobekcm/harvesting and on the main SobekCM pages: http://ufdc.ufl.edu/sobekcm/
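
As a rough illustration of what harvesting over OAI-PMH can look like, the sketch below builds a standard ListRecords request URL and pulls record identifiers and the resumption token out of a response. It relies only on the generic OAI-PMH 2.0 protocol, not on anything specific to SobekCM. The base URL and set name in the usage note are placeholders, and the function names are hypothetical; the actual endpoint and set names should be taken from the harvesting documentation linked above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    base_url is an assumption: substitute the repository's real
    OAI-PMH endpoint. When a resumption token is supplied, the
    protocol requires it to be the only argument besides the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (identifiers, resumption_token) from a ListRecords response.

    resumption_token is None when the element is absent or empty,
    which signals that the result list is complete.
    """
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvester would typically call build_list_records_url("https://example.org/oai", set_spec="some_set") for the first request (both values are placeholders), fetch the URL, pass the response body to parse_list_records, and repeat with the returned resumption token until it comes back as None.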