Digital research: processes, outputs and preservation
Michael Jubb, Arts and Humanities Research Board
We have heard today about a number of exciting projects and initiatives; and also about issues and problems that need to be addressed. These issues and problems are, of course, not unique to history. While there are some particularities that are specific to history, many of the issues are similar to those that are being addressed across a number of areas of the arts and humanities.
But I think that we need to recognize that we in the arts and humanities are somewhat behind the game as compared with our colleagues in other disciplines, even in the social sciences. For the AHRB, as we move towards the status of an AHRC, joining the family of research councils that are funded by the Office of Science and Technology, the key issue is to define what our role is, and should be, in taking these matters forward; and to determine what our priorities should be.
Let me start from what we have been doing up to now. When the AHRB was set up in 1998, we made a strategic decision to start by running schemes of awards in what our colleagues in the research councils call responsive mode. This means that we made no attempt to determine in advance either thematic or any other kind of subject- or topic-driven priorities. Judgements as to how we should allocate awards in each of our schemes are made solely on the grounds of the quality of the research proposals submitted to us, as assessed by peer review panels (who, of course, have to make decisions about priorities). Those panels are made up of leading scholars and researchers in each of the broad areas of study that fall within the domain of the arts and humanities.
In each of our schemes of awards, especially in the research grants and resource enhancement schemes, we have funded a good deal of work driven by, or making use of, ICT. Such projects have been selected through the normal processes of peer review, with the addition of a technical assessment by the Arts and Humanities Data Service (AHDS). Examples in the research grants scheme include studies of landholding in Ireland in the thirteenth and fourteenth centuries; of households, families and houses in early modern London; and of the Louisiana sugar industry. Examples in the resource enhancement scheme (which was established explicitly to fund work to develop the intellectual infrastructure of the arts and humanities) include the digitization of Renaissance festival books in the British Library; a database of the Atlantic slave trade; and the Royal Historical Society's bibliography of British and Irish history. Preservation of the digital resources that are created with AHRB funding is handled for us by the AHDS; and it is a condition of grant that projects submit digital resources for preservation by the AHDS, unless they have some other credible way of ensuring that those resources are preserved and made available to both the research community and wider audiences who might be interested in them.
In reflecting on the work that the AHRB has funded in this way, there are four key points I should like to make. First, as we have heard today, the work funded through these two schemes is, at least for the arts and humanities, large-scale, and it is collaborative. This is, of course, precisely the kind of work that the AHRB was established to fund. For large-scale collaborative projects, unlike the work that is carried out by individual researchers, cannot be funded out of the block grant that is allocated to HEIs by the Funding Council using the QR formula. Individual arts and humanities departments do not receive enough QR resource to fund work of this kind. But it is still the case that the majority of research in the arts and humanities is undertaken by individual researchers.
Second, the pursuit of large-scale collaborative work of this kind brings with it the need to think through in advance many of the issues of research methodology and of project management that in smaller-scale individual projects can be dealt with as you go along. This applies in particular to many of the technical issues of interoperability, of encoding or of the design of databases (or of data warehouses) that have been discussed today. And that gives rise to issues of awareness-raising and training. For there are as yet relatively few leading researchers in the arts and humanities who are skilled in the development, management and leadership of projects of this kind (and I am using those words deliberately, rather than the language of technical skills - as it was put to me the other day, the most effective drivers of Grand Prix cars do not pretend to be engineers, but they do know enough about the technicalities of their very sophisticated machines to be able to analyse their performance and to conduct a dialogue with the engineers about ways in which it might be improved). But whatever kinds of language we use about skills or competences, the central point is that research as a process is becoming, and must become, more professionalized.
Third, the scale of the resource, both human and monetary, that is invested in projects of this kind brings with it an imperative not just to deliver (although I must confess to a loathing of the word 'deliverable'), but also to ensure that the results of the work are effectively disseminated and made available not just to the research community but to a wide variety of other audiences. It is also essential to ensure that they are effectively preserved for re-use and for future generations of researchers and others, which is one of the key reasons why we have a strategic relationship with the AHDS, since it provides such a service to us and to the research community.
Fourth, we have seen to date very little in the way of projects that are making use of digital resources in innovative ways, or creating out of them research results in the form of new knowledge and understanding. A good deal of money, from the AHRB and from other sources, has gone into the creation of digital resources. But it is not yet clear how large is the payback in terms of new answers to new (or even to old) research questions.
Let me now sharpen the focus on to the AHRB itself. For it seems to me that there arise also from the portfolio of what we have supported and funded to date some strategic issues for us to consider. The first arises from the point that I made earlier about the overall lack of awareness and expertise in the use of ICT across the arts and humanities community. Today's audience is an expert one; but such expertise is relatively thinly distributed across our community. So there is an issue for the AHRB relating to the extent to which we should become involved in (which would mean putting resource into) training and awareness-raising; or whether we should regard that as primarily the task of HEIs. Here I should note that HEIs collectively have more funds allocated in QR for their arts and humanities departments than we have at our disposal.
A second issue is the role of the AHRB with regard to digitization projects in particular, alongside other bodies that hold research resources that they wish to have digitized (or which historians and other researchers wish to have digitized), and alongside other funding agencies such as the New Opportunities Fund (NOF) and the Heritage Lottery Fund (HLF), or in the higher education sector the Joint Information Systems Committee (JISC) or the former Research Support Libraries Programme.
A third issue is the role for the future of the research resource enhancement scheme, and how we should, if at all, try to make it distinctive from schemes run by other agencies. We have insisted in that scheme on the active involvement of the scholarly research community in the development of the 'intellectual infrastructure' of the arts and humanities; but is that a sustainable notion? Related questions are how we support or create effective partnerships between historians (and researchers in other disciplines), archivists and librarians; and how, with very limited funds, we make judgements as to the relative priority to be given to different 'intellectual infrastructure' projects. How do we move beyond the rather haphazard approaches we have at present (both in the AHRB and, I would suggest, in other agencies) to establishing priorities?
A final issue for the AHRB to consider is how we might foster and facilitate the development of more effective relationships between the different communities that are involved and for which the AHRB has responsibility. These include the librarianship and information science community in general (remembering that the AHRB has responsibility for research in that area); the specialist humanities computing community (which is widely respected abroad, but the work of which is relatively little known across the arts and humanities research community); and the vast body of arts and humanities researchers. Effective partnership is essential if we are to ensure that the opportunities available to us in theory are realized in practice, and in ways that are rooted in the behaviours and expectations of the different users when they are presented with digital resources.
All of these issues and questions have funding implications attached to them. Money that is put into training is not available for other priorities; money that we put into cataloguing or digitization projects that could (perhaps should) have been supported by other bodies - including those that own the resources in question - means that we put even more pressure on success rates that are currently at about twenty per cent in schemes such as research grants and resource enhancement. So consultation with the community about these matters is of critical importance.
In an attempt to address some of these issues, we have established as our first strategic research initiative a programme to support the development of the use of ICT in research in the arts and humanities. This programme is supported with funds that we secured in the 2002 spending review, and it represents a significant step in the evolution of the AHRB. It represents a move beyond (but not away from) the responsive mode funding. This programme will operate in an inclusive way, to bring together key strands of the work that we are funding that involves the use of ICT. We have appointed a half-time director for the programme, David Robey, Professor of Italian at the University of Reading, who has extensive experience in the creation and handling of digital texts.
We have drawn up some very broad aims and objectives for the programme, and we are clear that one of David's first tasks will be to establish some priorities and to focus the work of the programme so that it achieves tangible and significant results within three years.
Aims
To encourage, support and enhance the use of ICT in the conduct of research in all areas of the arts and humanities, the development and use of digital research resources and tools, and the exploitation of ICT in disseminating and making available the results of research.
Objectives
- To promote and support research that will directly or indirectly enhance or extend the use of ICT in the processes of research in the arts and humanities
- To promote and support the creation and development of high quality digital resources and tools that will enhance the intellectual infrastructure for research and scholarship in the arts and humanities
- To promote and support the effective dissemination of and access to digital scholarly and research resources both to the research community and to the public at large
- To promote the development and use of effective practice and standards in the use of ICT in arts and humanities research and scholarship, so that the digital resources created can be preserved and made readily accessible and available to others
- To support the development of training and guidance materials in the use of ICT in research in the arts and humanities
- To work in partnership with other bodies to develop integrated and strategic approaches to supporting the development of digital scholarly and research resources and tools in the arts and humanities, and access to such resources and tools.
Much of the initial work will require consultation and the running of expert seminars, and they are now being put in place. It would be wrong to prejudge what the results of this initial work will be, but let me offer some current reflections on issues that will need to be addressed as we plot the way forward, in addition to those that I have already mentioned.
1. What role should the AHRB adopt in relation to - or what priority, including funding priority, should it give to - the fundamental work of creating and providing access to library, archival and other lists, indexes and catalogues? There are other players in this field, as I have already noted. The National Archives is doing a great deal, through its PROCAT system and other means. There are related archival initiatives, such as Access to Archives (A2A) and the M25 initiative; and there is the Licensed Internet Associateship Initiative to encourage publishers and online content providers to digitize records of interest both to academics and to amateur and leisure historians.
2. To what extent should we be looking beyond the interests and needs of professional historians and researchers to the needs and interests of teachers and learners not only in higher education, but also in schools and in further education? How can we, in particular, most effectively address the needs and interests of the general public, or at least that part of it that is interested in learning something about language and literature, thoughts and beliefs, the processes as well as the products of human creativity, and about the past?
3. What role, if any, should the AHRB take in the development of digital publishing, and in helping to develop models that assist in moving development forward? How can we help to clarify in the e-publishing world the overlapping roles - very different from their traditional roles in print publishing - of author, publisher and librarian? How do we establish effective mechanisms of peer review for e-publications? How do we ensure that e-publishing sites are designed in ways that maximize usability and accessibility? How do we deal with access controls and authentication issues? How do we deal with both the updating and the preservation of e-publications? In this world of digital publishing, can we move beyond the publication of sources and resources (the intellectual infrastructure) to the publication of research results in innovative ways?
4. Can we move beyond the rule-based, highly structured sets of resources and research questions typical of current work using ICT in arts and humanities to research questions and issues that are less structured, more flexible and more imaginative? How do we move more effectively to exploit the possibilities for sophisticated analysis of a range of digital texts and discourses, images both still and moving, and sounds?
5. Finally, an issue of great interest to me as a former historian and archivist. The development of e-science and grid technologies seems to me to have the power to transform the ways in which we do research. For those in the arts and humanities who are aware of developments in e-science, there is often the perception that it is about developing distributed facilities to crunch petabytes of data. But what e-science is really about is harnessing technology to provide access to large-scale distributed resources, to the applications necessary to analyse those resources, to knowledge ontologies that help researchers to understand the complex resources that were and are being created in different contexts and for different reasons, and to create global collaborations between researchers. Much of this seems to me of direct relevance to the arts and humanities community. Increasingly we do need access to distributed resources and applications, in the form of images and sounds as well as text. And to analyse these large sets of data, to grapple with the issues of dealing with fuzzy and incomplete data and to conduct our analyses across several domains, we need large-scale facilities. Visualization, the development of sophisticated analyses that bring together several different kinds and formats of data, and the construction, for example, of tools to analyse our research resources both temporally and spatially, and by genre, together demand leaps of the imagination, and the development of new ways of working. But we do need to seize these opportunities.
The AHRB and the arts and humanities community are currently far behind in the e-science game. An ESRC programme is now being developed, and I hope that some of the awareness-raising that will form an initial part of that programme will be made available to, and will be taken up by, the arts and humanities community. One of the elements in our submission to the 2004 spending review will be a bid for funds to enable us to develop an e-science programme for the arts and humanities. So one of the developments that I do hope that the AHRB will be able to support and promote over the next few years is a move beyond where we have reached in humanities computing, so that we move into not so much an e-science as an e-humanities world, where research is both more professionalized, and more exciting. Both processes and outputs will be changed; and there will be a lot more to preserve.

