Digital resources - challenging use or users' challenges
Matthew Woollard, History Data Service
This short paper addresses two themes relating to the ways in which digital resources affect how historians practise their craft. The two strands consider the creation and the use of digital resources for doing history. Currently, these two strands are strongly connected, as most of these resources are created by historians or other scholars. But this situation is altering, with the increase in 'digital born' materials.
Before moving to these two topics, I want to suggest a very broad typology of digital resources which historians face in their work - necessary because different problems surround the creation and use of these types of resource, and also because their information content varies considerably. This typology, and the analysis which follows, are very much based on my personal experience in the field of historical computing and on the resources currently made available by the History Data Service (soon to be called AHDS History - we acquire, process, preserve and disseminate digital resources relating to the discipline of history for the FE/HE sectors).
Types of digital resource available
Very broadly these are:
- digital surrogates, that is, high quality digital reproductions of sources (for example, the British Library's Lindisfarne Gospels, or the Old Bailey Sessions Papers);(1)
- digital replications (at various levels), that is, sources which have been transcribed, edited (and sometimes value-added);(2)
- digital born, which neatly divides into two: web pages with 'intellectual' content; and digital sources. Previously these were mainly statistical or 'highly structured' resources, that is, library/archive catalogues, but more recently other more qualitative sources fall into this category, for example, letters, emails etc.
These 'types' of digital resources each demand very different skills (and resources) for both their creation and their use and interpretation.
Let us first consider the creation of these materials. In the past, and to the present, the majority of digital resources which historians might use have been created by historians themselves. The History Data Service has almost 600 digital collections, almost all of which have been created by academics for their own research. Funding councils enforce the preservation of the electronic by-products of research. Despite guidelines published by the AHDS for the modelling of historical data, the design of collections often leaves something to be desired. (Of course, part of the HDS's function is to 'process' these data for preservation and dissemination, thus the quality of the versions to be disseminated is to the highest possible technical standard.) Poor design is often caused by a lack of training in database creation and a lack of understanding of the principles of source-oriented data modelling, but it also results from the fact that most standard off-the-shelf software is not appropriate to the needs of historians.(3) If we want to create a database of, say, the most 'databased' historical source, the British census enumerators' books, we need to weigh up the value of two approaches - the source-oriented and the pragmatic research-based. In the first approach, we take all of the material in the original and replicate it as closely to the original as possible, allowing for serendipity in our own research, and importantly also allowing it to be re-used by other researchers who may want to study something different. In the research-based approach only that material which we really want (need) to use should be transcribed and digitised.(4)
A classic example of the latter approach can be seen in the process of transcribing and making machine-readable the whole of the 1881 census of Great Britain. Here the Genealogical Society of Utah (GSU), despite expert advice, omitted (inconsistently, to be fair) the variable on disability. The challenge to the user here is that the GSU's documentation suggests that it was transcribed accurately.(5) The naïve user who takes the database at face value (and many scholars are more than willing to have an inordinate amount of faith in electronic material) will write his or her history of the blind (or deaf) in the nineteenth century with highly unreliable statistics - as, in this case, might the cautious user who consults the documentation. The first lesson is that historians must be trained to construct valid electronic reproductions of the source material which they want to analyse with the computer. This means that solutions other than the MS Office suite of tools may need to be considered. The second lesson here is that historians must be trained not to treat electronic resources created by other historians with the reverence which many already show, but with the proper caution of historians. There are probably few 'digital forgeries', but for each high quality dataset lodged at the HDS there are perhaps two which are more modest in terms of their design and their documentation. The historian who wishes to 're-use' what another historian thought would make a good digital resource must not only master the original source type, but also the methods and procedures followed by the creator of the dataset.
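The difference between the two approaches can be made concrete with a minimal sketch in Python. It assumes a hypothetical and much simplified layout for one line of a census enumerator's book: the field names, the sample entry and the coding function are all illustrative assumptions, not the structure of any actual dataset. The source-oriented record keeps every column as written, including the disability column, and the coding which one project needs is derived from the transcription rather than replacing it.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EnumeratorEntry:
    """One line of a census enumerator's book, transcribed as written.

    Hypothetical, simplified layout: every field holds the verbatim
    text of the source, including abbreviations and blanks.
    """
    name: str
    relation: str
    age: str
    occupation: str
    birthplace: str
    disability: str  # kept even if the current project has no use for it


def research_view(entry: EnumeratorEntry) -> dict:
    """Derive the coded variables one project needs, leaving the transcription intact."""
    try:
        age_years = int(entry.age)
    except ValueError:
        age_years = None  # e.g. an age given in months, or left blank
    return {
        "age_years": age_years,
        "in_service": "serv" in entry.occupation.lower(),
    }


# Invented sample entry, for illustration only.
entry = EnumeratorEntry(
    name="Mary Smith",
    relation="Serv",
    age="19",
    occupation="General Serv",
    birthplace="Essex, Colchester",
    disability="Blind",
)

coded = research_view(entry)
print(coded)
print(asdict(entry)["disability"])
```

Because the verbatim fields survive alongside the derived ones, a later researcher interested in disability could still work from the same record; a purely research-based transcription of this entry would probably have discarded that column.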
How can this situation be improved?
Obviously the first thing that would help is better training at all levels - training for historians who are going to create their own digital resources. The quality of many of the 'technical appendices' of applicants to the AHRB research/resource enhancement schemes demonstrates that good practice is filtering through, but there are many examples of not-so-good practice. The AHDS runs detailed workshops on methods of digitization, and these demonstrate what can be done.
Training is also needed in the best use of some of these sources. Consider a relatively simple (but large) dataset, dubbed the 404. The dataset contains the total number of baptisms, burials and marriages for 404 parishes in England and Wales, during the period 1538-1873.(6) These data formed the basis of Wrigley and Schofield's monumental Population History of England. They have subsequently been repackaged and used by hundreds of local historians studying on the University of Oxford online diploma in local history, but (probably) because they have been formatted in a way to assist the local historian, they have never been used by 'national' population historians to challenge (or to verify) the analysis carried out by Wrigley and Schofield.(7) The point here is simple: many historians do not have the skills necessary to undertake the simple reformatting of data like these, although they may be able to carry out the multivariate analyses necessary to get information from them. Practical training in data manipulation could be provided in advance of practical methodological training. More workshops, or publications which demonstrate how to use the various forms of resources which already exist, need to be planned. First and foremost, these should explain that the re-use of data created by other historians is not only possible but may be useful too. (There are some very good examples of teaching materials which perform this function.)
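The kind of 'simple reformatting' meant here can be illustrated with a short Python sketch. The layout is an assumption made for the example, not the actual format of the 404 dataset or of the repackaged version: one row per parish, one column per year, baptisms only, with invented figures. The sketch turns this 'wide' table into one record per parish and year, and then sums across parishes to give the totals a 'national' historian would want.

```python
import csv
import io
from collections import defaultdict

# Invented figures in a hypothetical 'wide' layout: one row per parish,
# one column per year, baptisms only.
WIDE_CSV = """parish,1700,1701,1702
Parish A,12,15,
Parish B,30,28,31
"""


def wide_to_long(text):
    """Return one (parish, year, count) record per non-blank cell."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        parish = row.pop("parish")
        for year, value in row.items():
            # A blank cell means that no figure was recorded; it is not a zero.
            if value and value.strip():
                records.append((parish, int(year), int(value)))
    return records


def totals_by_year(records):
    """Sum the counts across all parishes for each year."""
    totals = defaultdict(int)
    for _parish, year, count in records:
        totals[year] += count
    return dict(totals)


long_records = wide_to_long(WIDE_CSV)
print(long_records)
print(totals_by_year(long_records))
```

Treating a blank cell as missing rather than as zero is a choice made for this sketch; a gap in registration and a year with no events are different things, and the documentation of a real dataset would have to say which is meant.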
Digital 'replications' are one side of the coin. On the other are digital surrogates and 'digital born' material. The former are not so much of a problem as historians generally treat them with the same reverence as they do the paper version, although it is clear that many are not replacements for the original. Again, regarding the census enumerators' books, the National Archives' version of the CEBs for 1901 is black and white, whereas the colour of the inks might improve interpretation of them. Standards are evolving in this area, but better documentation and enhanced technical metadata should improve such matters, as they highlight the 'interpretation' brought to the original source by the creators. Historians also need to improve their digitization skills.(8) Digital materials must not be considered to be surrogates, unless, of course, they really are. The creation of every digital resource is always part of a process of compromise where something is omitted, something is edited, and in a way which is not always transparent to the user (and sometimes not to the creator either).
Digital born material brings a further problem. We have all heard stories about those cash-rich research libraries in the United States being unable to spend their money on literary memorabilia because authors deliver manuscripts electronically to their publishers. The historical world is facing a similar crisis. Many routinely-generated sources in business and government are being 'deleted' as their current life ends. Some initiatives do exist for the preservation of these forms of data, but clearly not enough is being done to allow the historian 100 years from now to look back 100 years in the same detail as the present-day historian can. And while there is a strong argument for preserving (in its archival sense) paper documents in electronic form, there must be a stronger argument for preserving digital born material. Some are being preserved, most notably in the UK by NDAD (the National Digital Archive of Datasets) and the ESDS (Economic and Social Data Service), and the National Archives are taking a much broader approach to electronic records management, which should make more material available.(9)
The challenges for archivists in this area are well-known, but for the historian these are virtually uncharted waters. How many historians will understand how to navigate through and analyse the many millions of emails from the White House which are being preserved annually at the US National Archives and Records Administration? Technical solutions are now available, but are there any postgraduate courses which are training students in this area? Presumably, historians are dipping their toes into these electronic waters, but there are few initiatives to make such materials available and fewer still examining how they could be used. Unless historians are more involved in these processes, what Bob Morris some years ago called the e-palaeographer will become essential.(10) This problem will be alleviated by the creation of good documentation and of good preservation metadata.
Another form of 'historical source' is the Internet. Undergraduate and postgraduate students across the UK are taught skills relating to the interpretation of source material, but as far as I can tell only a couple of degree courses consider the historical value of web content. Richard Evans discussed this issue when considering David Irving's personal website during the Irving v. Penguin libel trial.(11) Also of interest here is the Internet as an information resource. A quick glance through undergraduate course handbooks available on the web suggests that there is a little guidance here.(12) The 'Internet for Historians' online tutorial, which forms part of the RDN's Virtual Training Suite, contains considerable valuable advice about using the Internet, its first rule being: 'Be critical of everything you find on the Internet'.(13) It is not just the content of which you need to be critical. Many websites do not contain metadata which allow the correct citation of the site. How many historians would publish a book without a title page or author's name? Is it a question of honesty or ignorance which makes them exclude this from their web pages? This issue should be addressed at all levels, including those people who do not use digital resources but create them by proxy.
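A rough check of whether a web page supplies what a citation needs can be sketched in Python using only the standard library. The sketch looks for a title element and for 'author' and 'date' meta elements; treating those two meta names as the fields to test is an assumption made for the example ('date' in particular is a common convention rather than a guaranteed standard), and the sample page is invented.

```python
from html.parser import HTMLParser


class CitationMetadataParser(HTMLParser):
    """Collect the <title> text and selected <meta name=...> values from a page."""

    WANTED = {"author", "date", "description"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            fields = dict(attrs)
            name = (fields.get("name") or "").lower()
            if name in self.WANTED and fields.get("content"):
                self.meta[name] = fields["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_citation_fields(html_text):
    """Return the names of the citation fields which the page does not supply."""
    parser = CitationMetadataParser()
    parser.feed(html_text)
    missing = []
    if not parser.title.strip():
        missing.append("title")
    for name in ("author", "date"):
        if name not in parser.meta:
            missing.append(name)
    return missing


# Invented page, for illustration only.
PAGE = """<html><head><title>HIST1201</title>
<meta name="description" content="Course outline">
</head><body><h1>Course Outline</h1></body></html>"""

print(missing_citation_fields(PAGE))
```

A check of this kind says only whether the fields are present, not whether they are accurate or whether they agree with what is displayed on the page; that judgement remains the historian's.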
Summary
The first issue is that resource creators challenge their users, to varying degrees, because of their own inability to model data, to create resources and to document them adequately. There are few serious problems, but there are stumbling blocks, primarily the fact that historians (and others) often make digital (re)sources without the best or most relevant skills. Solutions include adequate and directed training: the AHDS workshops and guides to good practice are a start, but the data archives are often over-stretched and under-funded. Keeping up is difficult.
The second issue is that historians and others use digital resources which have been created for a variety of purposes (born digital, replication (various levels) or research by-product), but they do not know how to use them properly. Sometimes their critical faculties get lost when examining them, although this is often because documentation is poor. The solution is to provide more direct training in how to manipulate data and how to interpret what is visible on the screen, but also to get historians to expect a high standard of metadata in all their different guises.
Notes
- (1) The Lindisfarne Gospels (http://www.bl.uk/whatson/exhibitions/lindisfarne/home.html) (2 July 2003); The Proceedings of the Old Bailey London 1674 to 1834 (http://www.oldbaileyonline.org) (2 July 2003).
- (2) Much of the collection of the History Data Service falls into this category (see, e.g., C. Harvey, E. Green and P. Corfield, Westminster Historical Database, 1749-1820: Voters' Social Structure and Electoral Behaviour (Database) (Distributor: UK Data Archive - SN 3908)).
- (3) See M. Thaller, 'The need for a theory for historical computing', in History and Computing, ii, ed. P. Denley, S. Fogelvik and C. Harvey (1989); M. Woollard and P. Denley, Source-oriented Data Processing for Historians (St. Katharinen, 1993).
- (4) See Databases in Historical Research: Theory, Methods and Applications, ed. C. Harvey and J. Press (1996). See also M. Woollard, 'Introduction: what is history and computing? An introduction to a problem', History and Computing, xi (1999), 1-8.
- (5) N. Goose, 'Evaluating the quality of the 1881 census microdata sample', History and Computing (forthcoming, 2003).
- (6) R. Schofield, Parish Register Aggregate Analyses (Colchester, 1998) with CD. Also available via the HDS as SN 4491 (http://www.data-archive.ac.uk/findingData/snDescription.asp?sn=4491). (Since this paper was delivered the ESRC has offered funding for the re-creation of these data into a more usable format for national historians.)
- (7) E. A. Wrigley and R. S. Schofield, The Population History of England, 1541-1871: a Reconstruction (1981). For the University of Oxford advanced diploma in local history, see (http://diplocalhistory.conted.ox.ac.uk/) (3 July 2003).
- (8) See, e.g., the case-studies in Making Information Available in Digital Format: Perspectives from Practitioners, ed. T. Coppock (Edinburgh, 1999); Information Technology and Scholarship: Applications in the Humanities and the Social Sciences, ed. T. Coppock (Oxford, 1999).
- (9) For NDAD, see NDAD - UK National Digital Archive of Datasets (http://ndad.ulcc.ac.uk) (3 July 2003). The UK national initiatives are detailed at Public Record Office - Records Management - Electronic Records (http://www.pro.gov.uk/recordsmanagement/erecords/) (3 July 2003).
- (10) R. J. Morris, 'Electronic documents and the history of the late twentieth century: black holes or warehouses', in History and Electronic Artefacts, ed. E. Higgs (Oxford, 1998), pp. 31-49.
- (11) R. J. Evans, Telling Lies About Hitler (2002), pp. 233-4.
- (12) An interesting guide is P. J. P. Goldberg, A Guide to Using Historical Resources on the Internet (http://www.york.ac.uk/teaching/history/pjpg/internet.htm) (3 July 2003); a curiosity is at (http://www.uq.edu.au/hprc/outlines/hist1201.html) (3 July 2003). The latter states that 'you should be aware that surfing the internet in search of information is no substitute for spending time in the library. Trying to find material on the internet can often be frustrating, time-consuming, and unrewarding. If you do elect to use information from the internet, be sure to cite it correctly. You should provide the author's name, the title of the document or work, the URL in angled brackets, and the date if available. Note too that no more than ten percent of your cited sources should be internet-based'. Interestingly the author's name is not obvious, although probably Dr. Martin Crotty, and the page (according to the metadata in the html) is HIST1201, but is headed, 'School of History, Philosophy, Religion, and Classics Course Outline - 2003. HIST1201. Australian History' - a sure sign of confusion. See also M. Crouse, Citing Electronic Information in History Papers (http://cas.memphis.edu/~mcrouse/elcite.html) (3 July 2003).
- (13) F. Condron and G. Cooper, Internet for Historians (http://www.vts.rdn.ac.uk/tutorial/history).
July 2003

