Anderson, W., and Beavan, D., ‘Capturing Identity: The SCOTS Corpus’. Poster presented at British Association for Applied Linguistics Annual Meeting, University of Bristol, 15-17 September 2005
The AHRC-funded Scottish Corpus Of Texts & Speech (SCOTS) Project at Glasgow University aims to make available over the Internet a 4-million-word multimedia corpus of texts in Scots and Scottish English. Twenty percent of this final total will consist of spoken language, drawn from a combination of audio and video material, which is made available as orthographic transcriptions synchronised with the source audio or video files. Versions of SCOTS have been accessible on the Internet since November 2004, and regular additions are made to the Corpus as texts are processed and functionality is improved. While the Corpus is a valuable resource for research, our target users also include the general public, and this has important implications for the nature of the Corpus and our website.
This poster will consider the theoretical and practical issues involved in building a publicly available general corpus such as SCOTS. These include the difficulties of defining a total population of texts; the representation of language varieties and text types, which is bound up with the issues of availability and copyright permission; non-standard written language and spelling variation; web accessibility; and the provision of tools for linguistic analysis. Once such hurdles have been overcome, the SCOTS Corpus will grow in significance as a resource for linguistic and cultural study, and will provide a valuable snapshot of the current-day linguistic situation in Scotland.

