The Old Bailey online: www.oldbaileyonline.org

Tim Hitchcock, University of Hertfordshire

The Old Bailey Sessions Proceedings are the very stuff of social history. Published eight times a year, they give detailed accounts of every felony trial conducted for the City of London and Middlesex between 1674 and 1834. In total they amount to some twenty-five million words, recounting 100,000 trials. They give details of what was certainly the most important event in most of 100,000 lives, and recount the stress and pain suffered when a broader population confronted violence and theft, murder and assault. It is not hyperbole to say that all of human life is here. The Old Bailey Online currently gives anyone with an Internet connection free access to eleven million words of this text. The site attracts between 1,000 and 2,000 visitors a day from around the world, and the full twenty-five million words of text will be available online, free and in its entirety by this time next year.

The site makes available both a full transcript of the Proceedings and an image of the original from which the transcript was taken. We have reached this point by using a microfilm as the basis for the creation of a comprehensive set of scanned images. These have then been double-entry re-keyed, and the resulting text has been marked up in XML, recording name, place, crime and verdict, among some twenty other items of information. The resulting text has then been made available to two different types of search engine. The first is called eXist, and is a dedicated XML search engine that works on an index of every word in the transcription, thus ensuring that keyword and text-string searches can be applied. The second is a more traditional search facility called MySQL, a relational database which treats the marked-up elements of the text as entries in a database, allowing this information to be analyzed and represented graphically, in pie and bar charts, and statistically in tabular form.
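
As a rough illustration of the two routes described above, the sketch below parses a few marked-up trials, builds a word index for keyword searching, and loads the tagged elements into a table for counting. The element names (trial, surname, crime, verdict), the sample text, and the use of Python's built-in sqlite3 module in place of MySQL are all assumptions made for the example; they are not the project's actual schema or software.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical mark-up: tag names and trial text are invented for illustration.
SAMPLE = """
<trials>
  <trial id="t1"><surname>Smith</surname> was indicted for <crime>theft</crime>
    of a silver spoon. <verdict>guilty</verdict></trial>
  <trial id="t2"><surname>Jones</surname> was indicted for <crime>theft</crime>
    of a hat. <verdict>not guilty</verdict></trial>
  <trial id="t3"><surname>Brown</surname> was indicted for <crime>assault</crime>
    in the street. <verdict>guilty</verdict></trial>
</trials>
"""

def build_word_index(root):
    """Map every word in the full transcript to the trials that contain it."""
    index = defaultdict(set)
    for trial in root.iter("trial"):
        # itertext() yields tagged and untagged text alike, so nothing is lost.
        text = "".join(trial.itertext()).lower()
        for word in re.findall(r"[a-z]+", text):
            index[word].add(trial.get("id"))
    return index

def load_table(root):
    """Copy only the tagged elements into a relational table for counting."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE trials (id TEXT, surname TEXT, crime TEXT, verdict TEXT)"
    )
    for trial in root.iter("trial"):
        db.execute(
            "INSERT INTO trials VALUES (?, ?, ?, ?)",
            (
                trial.get("id"),
                trial.findtext("surname"),
                trial.findtext("crime"),
                trial.findtext("verdict"),
            ),
        )
    return db

root = ET.fromstring(SAMPLE)
word_index = build_word_index(root)
db = load_table(root)

# Keyword route: any word can be found, whether or not it was tagged.
spoon_trials = word_index["spoon"]

# Database route: tagged elements can be counted and tabulated.
crime_counts = dict(
    db.execute("SELECT crime, COUNT(*) FROM trials GROUP BY crime").fetchall()
)
```

The word index answers questions nobody anticipated when the tags were chosen, while the table supports the counting that charts and statistics need.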

The site is fast, and it works. It changes the way in which we can use this particularly rich historical document, and at the same time preserves the wealth of information hidden in every page of the original. It avoids the pitfalls of both databases and keyword indexing. It also provides the basis for the creation of a truly transparent search facility that will allow historians to link together, and interrogate imaginatively, the huge variety of records for London in this uniquely important century and a half.

The Old Bailey Online is different from most other historical digitization projects in two ways. First, unlike most projects, which actively select what to digitize from a broader archive, we are absolutely committed to posting a comprehensive transcription of a full and fully coherent source. We believe this is good historical practice that will give our website greater historical relevance for a longer period of time than could otherwise be achieved. And second, our commitment to comprehensive transcription in combination with our use of XML mark-up allows us to create the foundation for transparent cross-source searching of a sort that other technologies and systems of selection militate against.

One of the greatest frustrations of any historian must be the certain knowledge that every note they take, every page they turn, is as much a lost opportunity as it is an addition to knowledge. There is a sense in which every time we look at a single piece of text we are looking at it through the lens of a slightly different question. The problem that computing has always presented is that up until now it has demanded that we specify the question in advance. You have to design a database before you can start filling it in.

If the profession's experience of historical computing over the last twenty-five years teaches us anything, it must be that attempts to encode and select information, to translate a historical document into something that a computer readily recognizes, are doomed to relative failure. The questions that are embedded in every code, the selections that are forced on every database designer, have a shelf-life shorter than a tin of baked beans. In ten years' time young historians will be sitting around asking themselves why their elders bothered to encode that element of the text and not something else; why they selected one thing and not another.

The beauty of concentrating on the process of delivering a full transcript is that even ten years down the road we will still have something of value, we will still have something that can be searched by someone asking a question we cannot now conceive. The tagging involved in XML mark-up will age, and become increasingly less useful, but the text itself will remain. In a generation or two, the site will no doubt have the dated and musty smell of nineteenth-century antiquarianism, but like the collections that fill the shelves of the Selden End of Duke Humfrey's Library, or indeed of the corners of the Institute of Historical Research, it will nevertheless still be of use and used. In other words, the beauty of creating a comprehensive transcription is that it does not destroy the original text in the process of making a usable one.

But our approach has a value beyond even this attempt to create what in computing terms can only be called a 'scholarly edition'. Using XML it is possible to create search facilities that allow historians to query a variety of types of sources at one time, to create links 'on the fly' that are controlled by the end user. The XML search engine we use, eXist, literally creates an index to every single word in the Proceedings. When, at some point in the future, historians develop a deep interest in toes or toast, tourniquet or teeth, they can locate them in the Proceedings. What this means is that it is now worthwhile to add other types of records to our trial accounts. We could, for instance, place the whole of the eighteenth-century literary canon on our site and allow historians to search the novels of Defoe at the same time as they search the Proceedings. Because the object of each search is that small quantum of information, perhaps a single letter, we would find connections between the two that could not be located in any other way. Imagine for a moment being able to specify that a single search located instances of the phrase 'Black Boy Alley' in the Proceedings, in novels and poetry, and in the manuscript archive of the City of London and its parishes. Or imagine searching for a name, of a house or a person, or for a type of shoe, or piece of furniture across these same sources. By largely eliminating the need to define what each part of a document means, historians are empowered to bring together disparate bodies of text and to construct new queries as complex as their imagination allows. You build in a kind of flexibility that ensures the resources we are spending so much money to create will not look dated and useless in a few years' time. To put it another way, you create a scholarly edition with a powerful index that will continue to be used for generations, as opposed to a monograph likely to be remaindered in a few months.
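
The cross-source search imagined above can be sketched with a simple positional word index. This is only a minimal illustration: the three snippets of text, the source names, and the functions tokenize, build_index and find_phrase are invented for the example, and eXist's own indexing is certainly more sophisticated than this.

```python
import re
from collections import defaultdict

# Invented snippets standing in for three different kinds of source.
SOURCES = {
    "proceedings": "He was taken in Black Boy Alley with the goods upon him.",
    "novel": "She lodged for a time near Black Boy Alley, among thieves.",
    "parish": "Paid to the widow in Chick Lane, two shillings.",
}

def tokenize(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(sources):
    """Map each word to the (source, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, text in sources.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].append((name, pos))
    return index

def find_phrase(index, phrase):
    """Return the sorted names of sources in which the phrase's words
    occur consecutively and in order."""
    words = tokenize(phrase)
    if not words:
        return []
    # Start from every occurrence of the first word ...
    candidates = set(index.get(words[0], []))
    # ... and keep only those followed by each later word at the right offset.
    for offset, word in enumerate(words[1:], start=1):
        following = set(index.get(word, []))
        candidates = {(s, p) for (s, p) in candidates if (s, p + offset) in following}
    return sorted({s for s, _ in candidates})

index = build_index(SOURCES)
hits = find_phrase(index, "Black Boy Alley")
```

Because the index knows nothing about what kind of document each word came from, adding a new type of source is just a matter of adding another entry to SOURCES.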

We are still faced with a series of difficulties and problems. At the moment we are struggling to integrate visual material, and in particular maps, into the website. When one moves beyond text the issue of keyword descriptors and modern interpretations becomes more pressing. We have also struggled with creating technical answers to issues around variant spelling. When 'Spitalfields' is spelt in eight different ways, it is difficult to design a search strategy that captures all of this variety.
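
One possible line of attack on variant spelling, offered here only as a sketch and not as the approach the project has adopted, is approximate string matching. The example below uses the SequenceMatcher class from Python's standard difflib module; the list of spellings is invented for illustration rather than drawn from the Proceedings, and the 0.8 cutoff is an arbitrary assumption that would need tuning against real data.

```python
import difflib
import re

def normalise(name):
    """Lower-case a name and drop spaces, hyphens and other non-letters."""
    return re.sub(r"[^a-z]", "", name.lower())

def find_variants(canonical, candidates, cutoff=0.8):
    """Return the candidates whose normalised spelling is close to canonical.

    Similarity is difflib's ratio: 1.0 for identical strings, lower as they
    diverge. The cutoff is a hypothetical value chosen for this example.
    """
    target = normalise(canonical)
    return [
        c
        for c in candidates
        if difflib.SequenceMatcher(None, target, normalise(c)).ratio() >= cutoff
    ]

# Invented spellings for illustration only.
SPELLINGS = [
    "Spittlefields",
    "Spital-fields",
    "Spittle Fields",
    "Spitalfeilds",
    "Smithfield",
    "Moorfields",
]

variants = find_variants("Spitalfields", SPELLINGS)
```

The difficulty is visible even in this toy case: a cutoff loose enough to catch every spelling of one place name will eventually start to catch its neighbours, so any such matching trades completeness against false hits.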

The nature of 'mark-up' also creates a series of problems. Although more flexible than any other approach, 'mark-up' still forces us to define some text elements in advance. Just like a more traditional database, it embeds a descriptive hierarchy into the text and forces us to ascribe to a collection of words and letters the quality of a 'trial', or of a 'surname', 'forename', 'crime' or 'punishment'. This is necessary in order to count different types of information effectively, and does not really impact on keyword and text-string searching. Nevertheless, it creates problems when searching across, and counting information from, different types of sources. Is a surname in a parish register the same thing as a surname in a trial account? XML allows greater flexibility than could be achieved with a flat or relational database, but it nevertheless forces us to pre-define certain text elements.

For the Old Bailey, we are seeking to transcribe and post a full range of associated sources, and in due course hope to create transcriptions of the Ordinary's Accounts and the sessions papers, of prison records and those of transportation. In the longer term we want to include parochial records and letter books, justices' notebooks and property records, to create the tools that will allow historians to access the information that has always sat between and betwixt that wonderful variety of early modern records we possess. We want to give historians access to the information that so frequently gets misplaced on the gentle stroll from the Corporation of London Record Office to the Rare Books Room at the British Library. In the process we also want to create a different and more democratic kind of history. Besides satisfying the needs and demands of professional historians, we are also engaged in creating resources for family historians and local historians, undergraduates and school children. At the end of the day the Old Bailey Online is about making the lives of working people accessible to as many people as possible in as flexible and usable a way as feasible. It is a technical answer to the political and intellectual conundrum of history from below.

July 2003
