All documents are classified into one of the following genres, and where possible we have tried to include an equal number of words in each genre:
This is in addition to the subcorpus of language commentators, which consists of approximately 1 million words. There is a comprehensive list of all documents available for browsing.
While CMSW has been assembled for researchers to use as a complete corpus, some individual documents and collections of documents may interest those wanting a narrower focus. These include: