Recent years have brought significant changes to the political situation in Scotland, accompanied by a resurgence of interest in the country's languages and culture.

Advances in computer technology have made it possible to store and analyse large quantities of information in ways which were previously unthinkable. As a result, much recent research in the Humanities has focused on building large text archives and corpora. Such resources offer exciting opportunities to study language on a broad scale and with a precision which would otherwise be impossible.

The Scottish Corpora project has created large electronic corpora of written and spoken texts for the languages of Scotland. The Scottish Corpus of Texts & Speech (SCOTS) has been online since November 2004, and, after a number of updates and additions, has reached a total of nearly 4.6 million words of text, with audio recordings to accompany many of the spoken texts. A sister resource, the Corpus of Modern Scottish Writing, was launched in 2010, and now comprises 5.4 million words of written text with accompanying images.

Together, the Scottish corpora allow those interested in Scotland’s linguistic diversity, and in Scottish culture and identity, to investigate the languages of Scotland in new ways, and to address the gap which presently exists in our knowledge of them. The resources also preserve information on these languages for future generations.

The Scottish Corpora have benefited from financial support from two sources. The earliest phase of the project (2002-2004) was funded by an Engineering and Physical Sciences Research Council (EPSRC) grant in a joint project with the Language Technology Group, University of Edinburgh. The Scottish Corpus of Texts & Speech and Corpus of Modern Scottish Writing resources were funded by the Arts and Humanities Research Council (2004-2007, 2007-2010).

