(Updated in April 2022) The following is Part 1 of a two-part essay posted on the blog of Mensa Canada in 2019.

Mensa is the largest and oldest high IQ society in the world, of which I have been a member since 2015. Many members of Mensa are interested in the topic of human intelligence, and I was invited to weigh in on this topic with a view to my own profession—first giving a talk to the Toronto chapter of Mensa back in 2018, which was later converted into this two-part essay published in 2019.

Three years and a pandemic later, it seems that many of my predictions are becoming reality at a speed we did not anticipate.

From Nuremberg to the neural network

When the idea of simultaneous interpretation – that is, to have someone listen to a speaker in a source language and orally translate that speech into another language in real time through microphones and headsets – was first proposed for the Nuremberg trial of twenty-two Nazi war criminals in 1945, the proposal was met with strong suspicion that this task would be humanly impossible. True, human interpretation (translating spoken language) had existed for centuries, but until 1945, it typically took the format of what professionals call consecutive interpretation – that is, where the interpreter has to wait until the speaker is finished before beginning to interpret what was said.

Part of the reason the idea was considered so audacious at the time was that it would employ technology in a way never seen before. With the help of IBM, a French-born American lieutenant colonel developed the microphones and headsets needed to transmit four different languages – English, French, German, and Russian – all in real time to the headphones worn by judges, defendants and others in the courtroom. The main advantage of using simultaneous (instead of consecutive) interpretation was that this new mode would allow for an expeditious trial; waiting for everything to be interpreted consecutively would have made every trial session three times longer. The lieutenant colonel also served as Chief Interpreter, hiring and training the interpreters to perform this new mode of interpretation.

The experiment proved to be successful. Simultaneous interpretation later became the default mode of language interpretation at multinational organizations such as the United Nations and the European Union, as well as national bodies that work with more than one official language, such as the Parliament of Canada.

Over seven decades later, the profession of simultaneous interpretation has come to its next crossroads. This time, the battlefield has shifted from Nuremberg to the neural network. A similarly bold idea is now being explored in the world of language interpretation – that is, to let Artificial Intelligence do the work for us. As was the case seven decades ago, this idea is met with mixed reactions. Can AI really do it? After all, the all-powerful AI can already perform seemingly formidable tasks such as beating the human world champion at the game of Go or driving a car from California to New York. Will language interpretation be an exception?

Before tackling the question of whether AI will succeed in replacing human interpreters, allow me to take a step back. I’ll provide some basics about how simultaneous interpreters manage to perform the task once thought to be “humanly impossible”, and what training is required.

Simultaneous interpreting takes more than just language fluency

Let me start with the common misconception that as long as someone is fluently bilingual, (s)he can simultaneously interpret between the two languages. Unlike consecutive interpretation, simultaneous interpretation requires the interpreter to be able to effectively split her/his attention between listening and speaking, something that can feel unnatural and distracting to say the least. Just because both of your hands are capable of throwing a ball into the air and then catching it doesn’t mean that you can juggle several balls between the two hands simultaneously. The performer must learn to coordinate, multitask, and above all, (s)he must put in an enormous amount of targeted practice to build the new neural pathways that allow her/him to juggle the balls effortlessly.

One exercise commonly used in training simultaneous interpreters is to “shadow” a speaker, that is, to repeat someone else’s words, first in the same language, while continuing to listen and follow along. In fact, most of us are able, to varying degrees, to do so if we follow the speaker closely enough. However, the task becomes more challenging as we prolong the “decalage” (time lag) between ourselves and the original speaker. As the time lag gets close to five seconds, a certain amount of processing power and short-term memory will be required in order to repeat what was said five seconds ago. This forces the interpreter to split her/his attention between listening and speaking. At some point, a second language is introduced to this exercise, and the trainee stops simply “parroting” the original and starts “interpreting” the message into another language.

Adding to the skill set is something called “salami”. If you are thinking of the Italian sausage, you are thinking right. Salami is the term interpreters use to mean cutting up a long sentence into several short ones, in order to be able to follow the speaker closely. To give you an overly simplistic example, let us imagine a hypothetical language that puts the verb after the object. Instead of saying “I love chocolate, cheese, tomato and pickle”, speakers of this language say “I chocolate cheese tomato and pickle love.” If you were interpreting from this language into English, you would not want to wait until the end of the sentence to begin saying something. This is where the salami technique comes in. Do we really need the verb (“love”) to begin formulating a sentence? The answer is no. You can say something like “What is my opinion about chocolate, cheese, tomato and pickle? I love them!” This output, the outcome of salami, still makes perfect sense in English. What is different is that now the interpreter can follow the order of words as they came out in the original language, and deliver the meaning faithfully.

Typically, becoming a simultaneous interpreter requires one to two years of full-time training for already fluent bilinguals or multilinguals. Aside from the techniques mentioned above, the trainee usually also spends time picking up vocabulary and background knowledge relevant to their work: history, economics, finance, political conflicts around the world, and how the United Nations and other international organizations work. The training typically also includes elements such as note-taking in consecutive interpretation, as well as the professionalism and ethics associated with being a professional interpreter.

Read the original post published on Mensa Canada’s blog in April 2019: https://mensa.ca/2019/04/09/simultaneous-interpretation-the-race-between-human-brain-and-artificial-intelligence-part-1/

Rony Gao is a member of Mensa Canada, a practicing conference interpreter and cross-cultural consultant based in Toronto. As a Chinese-English interpreter, Rony has worked for a wide array of political and business leaders.
