Question 1: (Big?) Data and ML

The ever-increasing volumes of data available to us as researchers are changing the way in which we engage in Modern Languages research. What data tools and concepts are helpful to us, not just in an instrumental sense (how we undertake our research) but also in a more conceptual sense (how we understand what Modern Languages is)? How might tools such as crowdsourcing help generate audience engagement in Modern Languages and increase public understanding of ML?

Kirsty Hooper


Claire Taylor

Niamh Thornton


One thought on “Question 1: (Big?) Data and ML”

  1. The point about intercultural encounters is crucial, and I couldn’t agree more that Modern Languages researchers have a lot to offer here, in particular in overcoming the very strong Anglophone bias in much so-called ‘big data’ research.

    Just one minor point: not all computer languages are really about programming. XML is much more about encoding, or annotating, a skill which is in many ways more familiar to Modern Languages researchers than to the average computer scientist. There *are* other languages which allow us to programme XML (to transform the markup into something else), but the act of encoding in XML is itself very much about human interpretation when performed manually (and often even when done automatically), except that the human interpretation comes with computational tractability as added value: literally ‘markup’.
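    To make the point about encoding concrete, here is a minimal sketch under stated assumptions. The fragment, its element names (<placeName>, <foreign>) and the helper functions are hypothetical, loosely modelled on TEI conventions and simplified (no TEI namespace). The interpretive work lies in the encoding itself: deciding that one phrase is Galician rather than Spanish, or that a word names a place. Once those decisions are in the markup, a few lines of code can query them or transform them into something else. In place of a dedicated transformation language such as XSLT, the sketch uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-encoded fragment, loosely modelled on TEI conventions
# (<foreign>, <placeName>, xml:lang); simplified, with no TEI namespace.
# Each tag records an encoder's interpretive decision about the text.
SAMPLE = """<p>She arrived in <placeName>Vigo</placeName> and said
<foreign xml:lang="gl">bos días</foreign> before switching to
<foreign xml:lang="es">buenos días</foreign>.</p>"""

# ElementTree exposes the predeclared xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def foreign_phrases(xml_text):
    """Query the encoding: (language, phrase) for every <foreign> element."""
    root = ET.fromstring(xml_text)
    return [(el.get(XML_LANG), el.text) for el in root.iter("foreign")]


def to_plain_text(xml_text):
    """Discard the markup and keep only the running text."""
    return "".join(ET.fromstring(xml_text).itertext())


def to_html(xml_text):
    """Transform the encoding into HTML-like markup (a toy stand-in for XSLT)."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.tag == "foreign":
            # Move the encoder's language judgement onto an HTML lang attribute.
            lang = el.attrib.pop(XML_LANG, None)
            el.tag = "i"
            if lang is not None:
                el.set("lang", lang)
        elif el.tag == "placeName":
            el.tag = "span"
            el.set("class", "place")
    return ET.tostring(root, encoding="unicode")


print(foreign_phrases(SAMPLE))
print(to_html(SAMPLE))
```

    Stripping the tags with to_plain_text returns the same words but loses every interpretive decision, which is the sense in which the markup is the added value.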

    Following up on your point about collaboration, I would emphasise that Modern Languages researchers also need to be active at the stage when “data is gathered and transformed”, because otherwise the risk is that you are left with ‘data’ without punctuation or connotation, as you so elegantly put it. As I will one day argue in more depth (in an article yet to be written), digital modelling is essentially an act of translation, and the negotiation between human and machine models needs humanists as much as it needs ‘technologists’. That, in a sense, is what the digital humanities are all about: bridging the gap with an interdisciplinary (or transdisciplinary) focus. The Modern Languages researcher is expertly placed to guide us through the cultural geographies of data.

    In thinking about this question, I ask myself to what extent data is currently being used at scale in Modern Languages research, and how this might change in the future. I suspect that the degree of change will vary widely from one area of the field to another, depending principally on the nature of the sources or evidence used. Most areas of Modern Languages research currently rely on pre-digital sources and so depend on the extent (and quality, and openness) of digitisation, whereas researchers studying post-digital subjects may have access to large quantities of born-digital content.

    In my response to Question 6, I have already made the point that ‘deep’ or ‘human/machine translated’ data may be as important as ‘big’ data, and, as Christine Borgman recently argued, we also need to address the challenges posed by ‘no data’ (Borgman, 2015). There are all kinds of questions about the difference between ‘content’ and ‘data’, about who has the power to ‘publish’ data, and about the concept of data ‘context’ and the ensuing chains of transmission, which there isn’t space to go into here.

    The interesting challenge will be how we develop new modes of interpretation for data and how we combine those with established modes of analysis in the humanities. ‘Data modelling’, ‘data curation’, ‘data visualisation’ and ‘data journalism’ only partially describe what I’m getting at here. I firmly agree that we need new theories of data ‘interpretation’ or data ‘translation’, which specifically take in the Modern Languages research view.


    Borgman, C. L. (2015) Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, Massachusetts: The MIT Press.


