Hindawi XML Corpus
To facilitate the use of Hindawi’s content for data mining, Hindawi makes its full corpus of XML content available for download as a single .zip file. The .zip file uses a two-level folder structure, organized first by publication year and then by journal. For example, the folder called "2011" contains a subfolder for every journal that published at least one article in 2011, and each of these subfolders holds the individual XML files for those articles. The .zip file also contains an XML file called contents.xml, which provides an overview of all of the subfolders within the archive.
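The year-then-journal layout described above can be indexed without extracting the archive. The following is a minimal sketch, not an official tool: it assumes that member paths inside the .zip look like `<year>/<journal>/<article>.xml` with no extra top-level folder, and the function names `group_by_year_and_journal` and `index_corpus` are hypothetical helpers introduced here for illustration. The structure of contents.xml is not documented on this page, so the sketch does not attempt to parse it.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_year_and_journal(names):
    """Group archive member names as {year: {journal: [member paths]}}.

    Assumption: article files sit exactly two folders deep,
    i.e. <year>/<journal>/<article>.xml.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for name in names:
        parts = PurePosixPath(name).parts
        # Keep only members shaped like <year>/<journal>/<file>.xml.
        # This skips contents.xml, directory entries, and non-XML files.
        if len(parts) == 3 and parts[0].isdigit() and name.lower().endswith(".xml"):
            year, journal, _ = parts
            grouped[year][journal].append(name)
    return grouped


def index_corpus(zip_path):
    """Open the downloaded .zip and index its members by year and journal."""
    with zipfile.ZipFile(zip_path) as archive:
        return group_by_year_and_journal(archive.namelist())
```

Calling `index_corpus` on the path of the downloaded file returns a nested mapping, so that `index["2011"]` lists the journal folders for 2011 and each journal maps to its article paths. If the real archive wraps everything in an additional top-level folder, the length check on `parts` would need to be adjusted.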
The content of this .zip file is updated daily, and the XML files in the corpus adhere to the JATS 1.1 DTD. If you have questions about Hindawi’s XML corpus download, please contact [email protected].
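Because the files follow JATS, common metadata sits at predictable element paths. As a hedged illustration, the sketch below reads the article title from the standard JATS location `front/article-meta/title-group/article-title` using Python's standard library. The helper name `article_title` is hypothetical, and the sketch assumes the file parses without resolving the external DTD; a file that relies on DTD-defined named entities would raise a parse error with this approach.

```python
import xml.etree.ElementTree as ET


def article_title(xml_text):
    """Return the article title from a JATS document, or None if absent.

    Accepts the XML as str or bytes. Inline markup inside the title
    (for example <italic>) is flattened to plain text.
    """
    root = ET.fromstring(xml_text)
    node = root.find("./front/article-meta/title-group/article-title")
    if node is None:
        return None
    # Join all nested text fragments, then collapse runs of whitespace.
    return " ".join("".join(node.itertext()).split())
```

The same pattern extends to other JATS elements, such as `journal-title` or `pub-date`, by changing the path passed to `find`.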
Download the Hindawi Corpus