Anderson, W., ‘The Scottish Corpus of Texts and Speech’. Talk delivered to the Scottish Society of the Institute of Linguists, Perth, 21 May 2005
The Scottish Corpus of Texts and Speech was first launched on the Web on St Andrew’s Day 2004, and is updated regularly. Six months on, we have already received a very positive response from both the academic and the general community, and we hope that the value of the Corpus as a record of the present-day linguistic situation in Scotland will grow as it increases in size and flexibility, and as its balance improves.
This talk will outline the nature of the Corpus and, through examples, demonstrate the value of corpus methodology for linguistics. At present, SCOTS is concentrating on texts in Scottish English and all varieties of Scots (e.g. Doric, Lallans, insular Scots, urban varieties…), covering as full a range of text types as possible, including fiction, poetry, correspondence, conversation and interviews, journalism and official writing. The majority of the final Corpus will consist of written texts, but a sizeable proportion will be spoken language, made available as orthographic transcriptions synchronised with source audio or video files. All texts, both written and spoken, are accompanied by detailed metadata, which make the Corpus particularly amenable to sociolinguistic investigations.
The multimedia nature of the Corpus is one of its strengths, but also a real challenge for a web-delivered resource, especially one which targets both academic researchers and general users. Other issues which must be considered include non-standard spelling in Scots, text availability, and the provision of analysis tools over the Web. Even as we work to overcome these hurdles, however, there is evidence that the SCOTS Corpus is already a significant resource for linguistic and cultural study, and I shall demonstrate some of its possible uses here.

