Beavan, D., ‘Scottish Corpus of Texts and Speech’. Poster presented at Digital Resources for the Humanities 2002, University of Edinburgh, 8–11 September 2002.

Recent years have brought significant changes to the political situation in Scotland, accompanied by a resurgence of interest in the languages and culture of Scotland. The present-day linguistic situation is complex: Scottish society includes speakers of Scottish English, Scots, Gaelic and numerous community languages. However, surprisingly little reliable information is available on a range of language issues, such as the survival of Scots, the distinguishing characteristics of Scottish English, or the use of non-indigenous languages such as Chinese and Urdu. This lack of information presents significant problems for those working in education and elsewhere.

At present there is no electronic archive specifically dedicated to the languages of Scotland. Such a resource would provide valuable material not only for language researchers but also for those working in education, government, the creative arts, media and tourism, who have a more general interest in Scottish culture and identity. It would provide important data about English as used in Scotland, about Scots in its many varieties, and about Gaelic and the principal community languages. It is against this background that the Scottish Corpus of Texts and Speech (SCOTS) project has been planned, and work is now under way.

The SCOTS project is the first large-scale project of its kind for Scotland. It aims to build a large electronic collection of both written and spoken texts in the languages of Scotland. Such a resource is urgently needed if we are to fill the present gap in our knowledge of Scotland's languages. Initially, the focus will be on collecting Scottish English and Scots texts, but it is also planned to include Gaelic and material from non-indigenous community languages such as Punjabi, Urdu and Chinese. In this way, the Scottish Corpus of Texts and Speech aims to give a full and accurate picture of the complex linguistic situation in Scotland today.

Once collected, the texts will be brought together to form a large electronic archive. SCOTS will be a publicly available resource, mounted on the Internet. It is envisaged that SCOTS will allow those interested in Scotland's linguistic diversity, and in Scottish culture and identity, to investigate the languages of Scotland in new ways. It will also preserve information on these languages for future generations.

The project is being carried out at two sites. The University of Glasgow is responsible for the collection of texts and speech and the creation and maintenance of the corpus. The University of Edinburgh will develop the corpus architecture and examine various research issues in the representation of multi-modal corpora.

We wish to draw attention to the project’s goals and technologies and to provide an opportunity for discussion with interested delegates.

A live, interactive demonstration of the project’s website will give visitors a taste of what the project aims to offer, including the opportunity to explore a searchable database of Scots texts, retrieve full texts, browse metadata and view multimedia items.

Further information will be available in the form of brochures, together with response forms for those who have texts they are willing to contribute. Visitors will also be able to request updates and additional information about the project’s progress.

Over the three days, key project staff (both linguistic and technical) will staff the demonstration and be on hand to discuss the various elements that make up the project, some of which are:

As the project is still in its infancy, we welcome the opportunity to discuss our decisions and the issues we face openly, and to learn from the experience of attendees.