Kirsty’s response to Question 1: (Big?) Data and ML

The world of data is, at first glance, an unfamiliar one for those of us who make our living from literary and cultural representations. We are trained – and we train our students – to ferret out nuance and connotation, to read between the lines or beyond the page, to find the multiple meanings surging around a simple word like ‘home’ or ‘nation’ or ‘language’. And Modern Linguists, like Ginger Rogers, do all this backwards and in high heels – or at least, in multiple linguistic, geographical and cultural contexts.

In the world of data, of course, our tried and tested strategies of interpretation do not wash. Trying to impute nuance, connotation and multiple meanings to a spreadsheet is a pointless task; it is rather as if your precious data were at the mercy of a translator who understands only one language and doesn’t get nuance. A computer will do exactly what you tell it to do, and only when you tell it using the one expression it has been programmed to understand (no stray punctuation and definitely no connotation).

But let’s not overestimate the problems. In fact, once you get past the initial encounter (awkward first data?) and see things from the computer’s point of view, much about working with data plays to our strengths as Modern Languages researchers. We are dealing with programming languages, after all, each with its associated social, cultural and pragmatic milieu. You could even say that Modern Linguist vs XML or SQL or [insert your programming language of choice] is the ultimate intercultural encounter.

In all seriousness, Modern Languages researchers not only have much to gain from data-driven humanities projects, but also bring a very particular array of skills to the table. We are ideally placed to develop a reflective, intercultural approach to digital/digitized data and the tools that allow it to be captured, stored, curated, shared, analyzed and transformed. We need to make our case.

Gathering data – qualitative, quantitative, numerical, categorical, bibliographical, biographical, topographical, you name it – is just the beginning of the process, and if we lack the technical tools to transform it into something else, well, that’s what collaboration is for (and that’s a Good Thing, by the way). But once the data is gathered and transformed, and ready for meaningful engagement, that’s when our expertise comes into play.

As Modern Languages researchers, we can combine our proficiency in representation, its nuances and connotations, with our ability to consider the commonalities and differences of engagement with digital/digitized data and tools across cultures and languages. Out on the global web, data-driven projects and tools such as crowdsourcing, community archives, emotional geographies, or genealogical databases provide unprecedented opportunities to leverage the digital as a means of stimulating investment and even participation in Modern Languages research by individuals and communities who would never, even for a second, regard themselves as Modern Linguists. Let’s grab them!



Question 1: (Big?) Data and ML

The ever-increasing volumes of data that are available to us as researchers are changing the way in which we engage in Modern Languages research. What data tools and concepts are helpful to us, not just in an instrumental sense of how we undertake our research, but also in a more conceptual sense of how we understand what Modern Languages is? How might tools such as crowdsourcing help generate audience engagement in Modern Languages, and increase the public understanding of ML?

First contributor:

Kirsty Hooper


Claire Taylor

Niamh Thornton