Anderson, J. and Douglas, F., ‘Corpus planning: the building of the Scottish Corpus of Texts and Speech’. Paper delivered at the Scottish Centre for Information on Language Teaching, University of Stirling, 20-21 June 2002.
The Scottish Corpus of Texts and Speech (SCOTS) is a new EPSRC grant-funded research project jointly undertaken by the Universities of Glasgow and Edinburgh. It is the first large-scale project of its kind for Scotland, and it aims to build a substantial electronic collection of both written and spoken texts in the languages of Scotland.
Numerous studies have illustrated the benefits of using corpora to inform or direct linguistic analysis. The emphasis, however, is usually on using the end-product, i.e. the completed corpus. This paper argues that the methodological discipline of attempting to construct a well-balanced, representative corpus of Scotland's languages is itself likely to teach us much about these varieties. The first phase of SCOTS will focus primarily on Scottish English and Scots, and on the numerous distinctive local varieties covered by these descriptors. However, serious problems arise in trying to build a representative corpus when one is relatively unsure of the overall linguistic terrain. What is a representative Scots or Scottish English text? How should intermediate varieties along the Scottish linguistic continuum be classified? And how can we ensure that the corpus accurately reflects the complex linguistic situation of present-day Scotland? This paper considers to what extent we can plan such a corpus, and to what extent we must watch it evolve.

