A discourse upon method, historical knowledge and information technology

R. J. Morris, University of Edinburgh

The availability of digital resources, and their use by historians, is having and will continue to have a major impact on the practice of history. Digital resources come in three main forms: those prepared for specific research projects; those prepared as a general service, such as the 1801 Norwegian Census and the Scottish Statistical Account, as well as the ubiquitous OPACs and bibliographical and archive catalogue services; and the growing body of documentation that originated in digital form. As a result there are few areas of historiography which have not been influenced by the use of digital resources.

Why did computers have such an impact? They can handle large amounts of information and they can analyze this information in a highly systematic manner - at their best these machines are powerful pattern-seeking tools. This is both a strength and a weakness; computer users have had a major impact on the study of eighteenth- and nineteenth-century politics, especially on the importance of party, but if we look at the historiography of, say, the 1832 reform of parliament, then the computer has moved attention away from speeches, diaries, letters and newspaper reports towards the voting records, and especially the parliamentary poll books. Why these particular records? - because they have that regularity of structure which the machine loves. This is, however, a move that directs attention away from the intricacies of personality, alliance, interest and ideology and towards the behaviourist evidence of 'the vote' and associated party labels.

Two techniques have dominated computer use by historians in the last thirty years. The first is list processing, utilizing, for example, census data, poll books, rate books, poor law, hospital and asylum registers. The regular structure of these records, together with the fact that returns can only be gained by the systematic processing of large amounts of information, makes them ideal for the machine. The second technique is record linkage, which usually involves linking records referring to named individuals in order to create new forms of information. The records may, of course, refer to a building, an object, a place, an area, but at its most powerful the technique reconstructs individual histories in large numbers. To take just one example, there is the work done by Stana Nenadic on business in the nineteenth century, which we have often been told was a golden age of enterprise for family business; linkage shows that such concerns were predominantly small, fragile and short-lived, and highly dependent on the contingencies of family life for survival.

How do you begin to criticize the authority of the computer and the huge quantity of information behind these claims? First, the machine is pattern-seeking and has a hunger for specific types of document - hence the privileging of the information in those documents. Second, the processing of such information can only proceed by adopting a series of rules. Some are applied directly by the machine, controlled by appropriate algorithms. In other instances the historian sits patiently in front of the screen annotating a database line by line. Third, there are aspects of research design which can have hidden effects on results. For example, nominal record linkage itself can only proceed through a complex set of rules which often depend upon probabilistic judgements, judgements which it is usually impossible to quantify. How do you know that Bob Morris is the same as Professor Morris or even R. J. Morris? There are three individuals by the name of Professor R. Morris in Edinburgh University, and two called Professor Bob Morris. My personal bibliographic record used to be attributed to three people - R. J. Morris/Bob Morris/Professor Robert J. Morris (this has since been reduced to two thanks to the sharp eyes of a Library of Congress MARC record cataloguer). The researcher might look at department and subject area to solve the problem and be happy, but if the conclusions about professors indicate that they tend to stay in the same department whilst at Edinburgh, how should such conclusions be evaluated?
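The rule-bound character of nominal record linkage can be made concrete with a minimal sketch. It is an illustration only, not the procedure used in any of the studies cited: the records, the nickname table, the scoring weights and the threshold are all assumptions invented for the example.

```python
import re
from itertools import combinations

# Hypothetical records: the names echo the examples in the text,
# the departments are invented for illustration.
RECORDS = [
    {"id": 1, "name": "Bob Morris", "dept": "History"},
    {"id": 2, "name": "Professor R. J. Morris", "dept": "History"},
    {"id": 3, "name": "Professor Robert J. Morris", "dept": "History"},
    {"id": 4, "name": "Professor R. Morris", "dept": "Physics"},
]

TITLES = {"professor", "prof", "dr", "mr", "mrs"}
NICKNAMES = {"bob": "robert", "rob": "robert"}  # assumed equivalences


def parse_name(name):
    """Return (first initial, surname) after dropping titles and expanding nicknames."""
    tokens = [t for t in re.findall(r"[a-z]+", name.lower()) if t not in TITLES]
    forename = NICKNAMES.get(tokens[0], tokens[0])
    return forename[0], tokens[-1]


def link_score(a, b):
    """Score a candidate link. The rules are explicit; the weights are judgement calls."""
    initial_a, surname_a = parse_name(a["name"])
    initial_b, surname_b = parse_name(b["name"])
    if surname_a != surname_b:
        return 0
    score = 1                  # same surname
    if initial_a == initial_b:
        score += 1             # compatible forename or initial
    if a["dept"] == b["dept"]:
        score += 1             # same department: the rule that favours stayers
    return score


def linked_pairs(records, threshold=3):
    """Return id pairs whose score reaches the (assumed) acceptance threshold."""
    return [(a["id"], b["id"]) for a, b in combinations(records, 2)
            if link_score(a, b) >= threshold]


print(linked_pairs(RECORDS))  # [(1, 2), (1, 3), (2, 3)]
```

Record 4 is kept apart only by the department rule. If it in fact described the same person after a move to another department, the link would be lost, and any finding that professors stay put would partly be an artefact of the rule itself. Lowering the threshold to 2 links all four records, which shows how much rests on a judgement that is hard to quantify.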

Record linkage has a bias in favour of stability. This has been intensified by the tendency of research design to be based upon area sampling - all middle class people from Leeds, the population of Philadelphia, etc. Now we know that geographical mobility and social mobility tend to go together. French Annales-style studies showed enormous stability, but the recent linkage of the 'TRA' sampling of the demographic records of the whole of France showed that the movers had much higher rates of social mobility. The peasant who stayed remained a peasant, whilst the one who moved became the craftsman, trader or labourer. Record linkage also has a bias in favour of individuals about whom we have more and better quality information, and in general this means higher status individuals - at a very simple level, higher status people have more complicated names.
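The stability bias of area sampling can be shown with a toy calculation. The life histories below are invented and the figures carry no historical weight; the point is only that a study able to link people found in one place at both dates will, by construction, miss the movers.

```python
# Invented life histories for illustration only:
# (name, place at date 1, occupation at date 1, place at date 2, occupation at date 2)
PEOPLE = [
    ("Anne", "village", "peasant", "village", "peasant"),
    ("Bernard", "village", "peasant", "village", "peasant"),
    ("Claire", "village", "peasant", "village", "craftsman"),
    ("Denis", "village", "peasant", "town", "trader"),
    ("Emile", "village", "peasant", "town", "labourer"),
    ("Flore", "village", "peasant", "town", "peasant"),
]


def changed_occupation(person):
    """True if the occupation differs between the two dates."""
    return person[2] != person[4]


def mobility_rate(people):
    """Share of people whose occupation differs between the two dates."""
    return sum(changed_occupation(p) for p in people) / len(people)


def area_sample(people, area):
    """People an area-based study can link: present in `area` at both dates."""
    return [p for p in people if p[1] == area and p[3] == area]


print(round(mobility_rate(area_sample(PEOPLE, "village")), 2))  # 0.33
print(round(mobility_rate(PEOPLE), 2))                          # 0.5
```

In this made-up population the area-based sample reports a third of people changing occupation, while following everyone, movers included, gives a half.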

One effect of all this (the quantity of information available and the pattern-seeking nature of information technology) is that historians need a greater awareness of social science methodological criticism and greater skills in numeracy and basic statistics.

What then of the future for historians? There are major gains to be made in text analysis - the application of authorship analysis techniques to the historical record will produce interesting results. The major barrier now is the historian's imagination and the need for very extensive electronic databases of text (the historian's use of text tends to be extensive rather than intensive as in most literary studies). How would political historians react if it was demonstrated that all the speeches in Hansard by Grey, Wellington, Brougham, Melbourne and the rest were written by the same person (not impossible given the primitive state of shorthand in the eighteen-twenties and thirties)?

In the short term, historians need to respond to a world in which an increasing number of documents will be available in electronic form. This poses a number of problems and challenges: opportunities for analysis and access; questions of quality control; opportunities for criticism and verification. Recent websites have provided greater quantities of visual and cartographic material, but historians have few general means of presenting and analyzing such material.

Finally, the historian of the recent past is entering a world in which data was created in, and preserved by way of, electronic means - often to be accessed only by a dated technology. When I read Gladstone's letters I touched the same bit of paper as Gladstone, got to know his handwriting and used the same technology as Gladstone. When Tony Wedgwood Benn deposited his diaries in the British Library, he handed them a set of disks, readable only by a now dated technology. For other historians, the relationship between document and data has been broken. Plan a nineteenth-century railway journey and you look for the solid bulk of a Bradshaw much as did the clerks and passengers of the period. Plan twenty-first-century air travel and your document exists for a fleeting moment on the travel agent's screen, created from a database of continually changing information. Indeed the historian of the recent past has problems in deciding what 'the document' actually is. The historian can collect the data of the family expenditure surveys as well as the general election surveys of the nineteen-seventies from Essex and conduct rapid analysis on a desktop, but should this not be done on the same machine and using the same technology as the historical actors whose actions we seek to understand?

Add to this the fact that the quantity of information created and preservable in enormous electronic warehouses is vast. No historian will be able to read all those dispatches, emails, drafts, etc. New strategies will be required to survive in such a world. One of these, perhaps, is that of 'noise', an idea derived from intelligence analysis - it is impossible to organize anything violent and important like a war or a coup without an increase in telephone calls, emails and so on, a rule bin Laden et al. got around by using old-fashioned foot soldiers.
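One possible reading of the 'noise' strategy is to count traffic rather than read it, and to look for periods when the volume rises well above its usual level. The sketch below takes that reading as an assumption; the daily counts are invented, and the choice of the median as baseline and of a factor of two as threshold is arbitrary.

```python
from statistics import median

# Invented daily message counts for one archive of correspondence.
DAILY_MESSAGES = {
    "Mon": 40, "Tue": 42, "Wed": 38, "Thu": 41,
    "Fri": 95, "Sat": 110, "Sun": 39,
}


def unusual_days(counts, factor=2.0):
    """Return the days whose volume exceeds `factor` times the median volume."""
    baseline = median(counts.values())
    return [day for day, n in counts.items() if n > factor * baseline]


print(unusual_days(DAILY_MESSAGES))  # ['Fri', 'Sat']
```

A flagged day says nothing about what was being discussed; it only tells the historian where to start reading.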

Some of these strategies can already be used by any historian. When, for example, did the Catholic question create most interest in nineteenth-century Britain? Knowing that the RSLP/CURL project has just catalogued a wide range of pamphlets from the period, I logged on to the combined online catalogue, asked for all titles which included the word 'Catholic', sorted them by date and came up with the following. There are lots of problems, and the result needs criticism, but it is a start.
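The kind of query described here, a keyword search on titles followed by a sort by date, can be sketched in a few lines. This is an illustration only and not the result referred to above: the catalogue entries are invented, and the function name and the case-insensitive substring match are assumptions rather than features of any real catalogue interface.

```python
from collections import Counter

# Invented catalogue entries (title, year); not drawn from any real catalogue.
CATALOGUE = [
    ("A letter on the Catholic claims", 1828),
    ("Thoughts on the corn laws", 1828),
    ("The Catholic question considered", 1829),
    ("Remarks on Catholic emancipation", 1829),
    ("On the state of the poor", 1834),
    ("The catholic hierarchy in England", 1850),
]


def titles_per_year(catalogue, keyword):
    """Count titles containing `keyword` (case-insensitive), keyed by year in date order."""
    keyword = keyword.lower()
    counts = Counter(year for title, year in catalogue if keyword in title.lower())
    return dict(sorted(counts.items()))


# Print a crude text histogram: one '#' per matching title.
for year, n in titles_per_year(CATALOGUE, "Catholic").items():
    print(year, "#" * n)
```

Even this toy version shows why the result needs criticism: a substring match also catches 'Catholicism' or 'anti-Catholic', misses titles that discuss the question under another name such as 'Popery', and counts only what happens to have been catalogued.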

Bibliography

Examples of the impact of computing on historical understanding are everywhere. For a recent survey, see Information Technology and Scholarship: Applications in the Humanities and Social Sciences, ed. T. Coppock (British Academy, 1999). See especially Jean-Philippe Genet on cultural history and R. J. Morris on the impact of computing on the historiography of the 1832 Parliamentary Reform Act.

For some specific examples, see:

R. J. Morris, 'Family strategies and the built environment of Leeds', Northern History, xxxvii (2000), 193-214.

S. Nenadic and others, 'Record linkage and the small family firm: Edinburgh, 1861-91', Bulletin of the John Rylands Library, lxxiv (1992), 169-95 (this issue included a wide variety of articles on computer-based historical enquiry).

J. A. Phillips and C. Wetherell, 'Parliamentary parties and municipal politics: 1835 and the party system', Parliamentary History, xiii (1994), 48-85 (again one of several articles based on computer methodologies).

A different sort of problem is touched upon by History and Electronic Artefacts, ed. E. Higgs (Oxford, 1998), which looks at questions raised by the fact that a wide range of potential sources for historians are being and have been created in electronic form. The articles by Zweig (a diplomatic historian) and Higgs (formerly of the PRO) are worth especial attention.

The journal History and Computing, published by Edinburgh University Press, covers a wide range of the issues debated here:

'Record linkage', ed. S. W. Baskerville, P. Hudson and R. J. Morris, History and Computing, special issue, iv (1992).

T. Coles, A. Alexander and G. Shaw, 'Following the script: optical character recognition technology and the British town and trade directory', History and Computing, ix (1997), 1-16.

J. Dupâquier and D. Kessler, La société française au XIXe siècle: tradition, transitions, transformation (Paris, 1992), pp. 19, 122.

R. Miller, 'Cross sectional and longitudinal analysis in historical geographical research - some methodological considerations', in Studier och handlingar rörande (Stockholms Historia, vi, Stockholm, 1989), 121-36.

R. J. Morris, 'Does nineteenth-century nominal record linkage have lessons for the machine readable century?', Journal of the Society of Archivists, vii (1985), 503-12.

S. Richardson, 'Letter cluster sampling and nominal record linkage', History and Computing, vi (1994), 168-76.

R. Van Horik, 'Recent progress in the automatic reading of printed historical documents', in 'Scanning and OCR', ed. P. K. Doorn and R. van Horik, History and Computing, special issue, v (1993), 68-73.

I. Winchester, 'The linkage of historical records by man and computer: techniques and problems', Journal of Interdisciplinary History, i (1970), 107-124.

Relevant web sites include:

Institute of Historical Research, School of Advanced Study, University of London
http://www.history.ac.uk/

The Association for History and Computing
http://odur.let.rug.nl/ahc/

Statistical accounts of Scotland (both old and new)
http://edina.ac.uk/StatAcc/

The census of Norway from the year 1801, Jan Oldervoll, Department of History, University of Bergen
http://www.uib.no/hi/1801page.html

Two major photographic archives for Scotland

George Washington Wilson, Aberdeen
http://www.st-andrews.ac.uk/specialcollections/Projects/VisualEvidence/

William Valentine of Dundee
http://specialcollections.st-and.ac.uk/

The drawn evidence: Scotland's development through its architectural archives from industrialization to the millennium, 1780-2000
http://www.drawn-evidence.dundee.ac.uk/dundee_dr/index.jsp
(you will be asked to register but this can be done online and should only take a short time)

July 2003
