Text Mining the Old Bailey Proceedings

Professor Tim Hitchcock (Hertfordshire)
13 June 2011

Abstract (taken from the History SPOT blog)

The Old Bailey Online is probably one of the most successful web-based projects produced in Britain thus far. Based on the proceedings from London’s central criminal court, it is a fully searchable edition containing some 197,745 criminal trials detailing the lives of non-elite people. One of the originators of the project, Tim Hitchcock, is looking at how to use text mining tools to examine the proceedings and discover new things about them. Text mining is the derivation of meaningful data from a large body of unstructured data, using automated methods to reveal structure and associations. Through text mining Hitchcock is able to compare patterns of prosecution over time and further examine changes in court behaviour and procedure.

Did you know, for instance, that the shortest trial in the Old Bailey proceedings is just eight words in length whilst the longest is 320 pages and over 150,000 words? Hitchcock believes that previous attempts to average trial lengths per year to show trends disguise the mix of long and short trials contained within each year, and also the fact that the accounts are not entirely complete: some trials are purposefully reduced in length for very interesting reasons. Through text mining Hitchcock shows that changes in the nature of the jury trial (and in which trials would reach a jury) are vital to understanding the trends, especially when looking also at the number of not-guilty versus guilty pleas and verdicts. Hitchcock argues that plea-bargaining became increasingly important.
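The point about yearly averages can be illustrated with a minimal sketch. The word counts below are invented for the illustration and are not taken from the Old Bailey data; the year labels, the 100-word threshold for a "short" trial and the `summarise` helper are all assumptions. The sketch shows only that two years can share the same average trial length while containing very different mixes of long and short trials.

```python
from statistics import mean, median

# Hypothetical word counts per trial for two invented years.
# These figures are illustrative only, not real Old Bailey data.
trial_lengths = {
    "year_a": [400, 450, 500, 550, 600],   # trials of broadly similar length
    "year_b": [8, 20, 30, 42, 2400],       # many very short trials, one long one
}

def summarise(lengths, short_threshold=100):
    """Return the mean, the median and the share of trials below the threshold."""
    short_share = sum(1 for n in lengths if n < short_threshold) / len(lengths)
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "short_share": short_share,
    }

summaries = {year: summarise(lengths) for year, lengths in trial_lengths.items()}

for year, summary in summaries.items():
    print(year, summary)
```

Both invented years have the same mean, yet the median and the share of short trials differ sharply, which is the kind of structure a yearly average hides.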

At the heart of Hitchcock’s paper is an argument that data/text mining represents a new methodology for historians studying data, and that we are very much at the beginning of an exciting process of using digital tools for new historical research. All we have to do is rise to the challenge.



Professor Tim Hitchcock profile

This session was streamed live via livestream on 14 June 2011  



(This video is an edited version from that stream.  For the original unedited version see our Past live stream pages) 


Audio Podcasts

