Kay’s response to Q2: ML and digital archives

As a historian of Occupied France, I have spent much of my research life in archives where materials still emerge from the stacks in grubby cardboard boxes whose crammed contents have clearly not been read in years, if ever, if the rusty pins holding the fragile papers together are anything to go by. Nothing in my work is more exciting than engaging with an original physical document, a moment when the years between us disappear and the war comes alive. And I have lost count of the occasions when the delivery of the ‘wrong’ archival box to my desk has offered a serendipitous page to harvest.

But I’m no Luddite, just in case you think that’s where this is going. Indeed, the digital has positively revolutionized my research, in terms of how I work, what I produce, and the reach of the outcomes. In purely practical terms, digitizing historical material provides a valuable backup should the original be lost. But there’s rather more to it than that for me. Because I work on wartime radio, the digitization of old wax cylinder recordings or gramophone records held at what is, for me, a core research archive, the Institut national de l’audiovisuel in Paris, has offered an alternative means of access to originals which previously existed only in delicate formats withheld from use. It has significantly expanded the corpus I can exploit and enabled me to write on resources never previously interrogated. Moreover, it has made it possible for me to hear the voice of the principal broadcaster I study, which is crucial for an analysis of his styles of delivery and their intended impact on his audience. But opening up wartime broadcasts in this way also suggested a further step to me: using digital space to make the broadcasts widely available, for future research enquiry and for interested general audiences alike, by creating a new, user-friendly historical resource freely available as a public work. The result is a unique born-digital critical edition of wartime radio broadcasts, published as a PDF file, which brings together a fragmented corpus: transcripts of the digitized recordings and digitized versions of the surviving printed texts of target broadcasts. It can be accessed here.

My edition is, in essence, a digitized version of the original materials which functions as a form of archive in itself. But I don’t personally think this makes me an archivist. The edition is its own document, and its content is filtered through the lens of my identity as a historian, not least because of the critical framework which accompanies the broadcasts. The edition is a hybrid which makes no claim to be a pure act of curation. The aural dimension has not been replicated, while those broadcasts which already existed in print are not reproduced as facsimiles but as new, clean versions created using OCR software. Nonetheless, best practice means that I have responsibilities to the original documents, and that my ‘version’ of these had to possess integrity if it were to be reliable. So, whilst I corrected basic inaccuracies (e.g. spelling or punctuation mistakes) and standardized presentation, the edition otherwise alters nothing of the original broadcasts, instead explaining any issues or inconsistencies in footnotes. Issues remain. Future-proofing is a particular concern: the digital future has to ensure that the digital present remains functional, so that today’s PDFs do not become yesterday’s 78rpm records. Not that this is enough to dissuade me from my efforts: a second edition of wartime broadcasts is well underway.

Technology has allowed us to gather material and share it with a wider community. What can we use to make archiving possible and lasting? If we work online and create spaces, do we become archivists? If so, what ethical questions and other issues arise from this? Is the digital archive an act of recovery or of curation?

