Kay’s response to Q2: ML and digital archives

As a historian of Occupied France, much of my research life has been spent in archives where materials still appear from the stacks in grubby cardboard boxes, whose crammed contents have clearly not been read in years, if ever, if the rusty pins holding the fragile papers together are anything to go by. No part of my work is more exciting than when I engage with an original physical document, a moment when the years between us disappear and the war becomes alive. And I have lost count of the occasions when the delivery to my desk of the ‘wrong’ archival box has offered a serendipitous page to harvest.

But I’m no Luddite, just in case you think that’s where this is going. Indeed, the digital has positively revolutionized my research, in terms of how I work, what I produce, and the reach of the outcomes. Simply in practical terms, digitizing historical material offers a valuable backup system in case of the loss of the original. But there’s rather more to it than that for me. Working on wartime radio, as I do, the digitization of old wax cylinder recordings or gramophone records held at, for me, a core research archive—the Institut national de l’audiovisuel in Paris—has offered an alternative means of access to originals which previously existed only in delicate formats withheld from use. It has significantly expanded the corpus I can exploit and enabled me to write on resources never previously interrogated. Moreover, it has made it possible for me to hear the voice of the principal broadcaster I study, which is crucial for an analysis of his styles of delivery and the intended impact on his audience. But the opening up of wartime broadcasts in this way also suggested a further step to me: to use digital space to make the broadcasts widely available for both future research enquiry and interested general audiences by creating a new user-friendly historical resource, freely available as a public work. The result is a unique born-digital critical edition of wartime radio broadcasts which brings together a fragmented corpus—transcripts of the digitized recordings and digitized versions of the surviving printed texts of target broadcasts—published as a PDF file. It can be accessed here.

My edition is, in essence, a digitized version of the original materials which functions as a form of archive in itself. But I don’t personally think this makes me an archivist. The edition is its own document, and the content is filtered through the lens of my identity as a historian, not least because of the critical framework which accompanies the broadcasts. The edition is a hybrid which makes no claim to be a pure act of curation. The aural dimension has not been replicated, while those broadcasts which already existed in print version are not reproduced as facsimiles, but are new, clean versions created using OCR software. Nonetheless, best practice means that I have responsibilities to the original documents, and that my ‘version’ of these had to possess integrity if it were to be reliable. So, whilst I corrected basic inaccuracies (e.g. spelling or punctuation mistakes), or standardized presentation, the edition otherwise alters nothing of the original broadcasts, instead explaining any issues or inconsistencies in footnotes. Issues remain. Future-proofing is a particular concern, and the digital future has to ensure that the digital present remains functional, so that today’s PDFs do not become yesterday’s 78rpm records. Not that this is enough to dissuade me from my efforts: a second edition of wartime broadcasts is well underway.

3 thoughts on “Kay’s response to Q2: ML and digital archives”

  1. I share your love of the physical objects (books, folios, journals, newspapers, clippings, etc.) that you encounter in archives and the serendipitous finds that can arise, when you can get access. Access is not only a matter for things IRL; it is also an issue online. It also occurs to me from reading the posts that a lot of words and concepts, like ‘data’ in Kirsty’s post and ‘archive’ in yours, get stretched. Do these shifts count as a re-signification of the lexicon of research, and what challenges do they pose? Do they ask us to re-consider the disciplinary parameters?


  2. Another vote here for dust and serendipity! Another characteristic of the physical archive that never fails to make me think is its historicity – the sense that this box, that record card, those papers are the residue of many different decisions made by many individuals over many years. And I don’t just mean strategic decisions about bequests or donations, or affective decisions, such as the ‘epistemic anxieties’ Ann Stoler explores in Along the Archival Grain (1), but also the tiny, material decisions of archival practice – which brand of box, which size of record card, which pen or pencil or filing system. While we can’t access this historical residue in the same way when an archive is digitised (although that’s not to say it’s entirely lost), the digitisation process itself creates a whole new dimension of historicity, through the metadata generated every time somebody interacts with the digitised artefact. The formal, systematic and highly visible nature of metadata is quite unlike the dusty, often fragmented story of a material artefact’s creation, storage and use, but I wonder how useful it might be to consider the two in tandem, or, as Niamh says, to consider the elasticity of words and concepts like ‘archive’ or ‘data’ themselves. What can historians and archive users learn from metadata about decoding the historical residue of an object? What can users of digital archives learn by keeping in mind the stories generated by Carolyn Steedman’s ‘many dusts’ (2)? And how are these questions complicated when the artefacts and histories have crossed time and space, cultures and languages, to end up in our hands or on our screens?

    (1) Ann Laura Stoler. Along the Archival Grain: Epistemic Anxieties and Colonial Common Sense. Princeton/Oxford: Princeton UP, 2009.

    (2) Carolyn Steedman. Dust. Manchester: Manchester UP, 2001: 157.


  3. My response to Q2:

    It is over twenty-five years since Bernard Cerquiglini published his exuberant In Praise of the Variant (1989), in which he urges us to fall in love with the variance that characterises the medieval literary text: variance which a modern or uninitiated reader might dismiss as error, unnecessary background noise or confusion, in some way detracting from a hypothesised ‘original’. As a philologist by training, and a literary scholar by temperament, my natural instinct might have been to spurn the digital and its gaudy promises of new worlds in favour of the paper-sifting, archive-wading, parchment-venerating of my academic upbringing. However, like Cerquiglini and his vision of the new horizons that would be opened up by digital futures for text editing, I find myself a convert to the fascinating possibilities offered in that imagined future which is now present, and accessible to all.
    In my most recent work, I have been creating an iPad application which aims to make currently inaccessible manuscripts accessible, yet not simply to create an ebook or ‘do’ a digitisation of these materials. My app has at its core the tenth-century Exeter Book, and samples other manuscripts from Special Collections at Exeter, both English and French; it aims to bring these to a wider audience while also making it possible to use them as a scholar. At the same time, I am completing two ‘paper’ critical editions of fifteenth-century French debate poetry. In my mind these two projects are discrete. The ‘paper’ or the ‘online’ edition would seem, then, here to be separate entities for distinct audiences: each type of ‘edition’ with its own advantages and disadvantages. As a medievalist and codicologist, I cannot but value the physical book (I still can’t bring myself to use the Kindle kindly bought for me) and all its complex historicity: its users; its abusers; its scribes; copyists; editors; readers. However, the value I attach to these narratives of use, reuse and circulation adheres equally in the digital archive. My dilemma, and that of the modern linguist with an interest in text editing, is where to go now? We’ve done paper editions, we’ve done online editions; scholars continue to produce both with little sign of one medium disappearing or being eclipsed by the other. Am I curating when I create my digital ‘edition’/archive, or am I recovering as much of the ‘original’ source material as possible? I think both. For me, any type of ‘textual’ recovery, as the late great Elspeth Kennedy would surely have said, is reception, is therefore the act of an editing and a curating hand and consciousness, whether that is digital, or on paper. The endless possibilities afforded by the digital edition have revolutionised the way we think about medieval texts and their multiple manifestations and variants.
    However, these vast and complex editions or digital archives risk alienating the ‘reader’ or ‘audience’. How can these be navigated effectively – how can we simply ‘read’ a text anymore, without simultaneously needing to be aware of its myriad copies, exemplars, editions, and now digital forms?

    Emma Cayley.

