Addressing context in machine translation development
Posted: Sat Feb 08, 2025 8:11 am
The second day of the conference began with a keynote titled “Machine Translation Using Context Information”, presented by Marcello Federico of AWS AI Labs.
Federico emphasizes that machine translation output may look correct out of context, but many external factors can show it to be incorrect. These include gender, speech register, and topic or domain of discourse, among other things that define the context in which the original text, and by extension the translation, operates.
Machine translation has yet to solve these problems. However, there is already a lot of research on how generic data can be annotated to control for some of these factors, and on how past translations can be analyzed to provide better output.
Context in machine translation for subtitles
In line with the theme of context, the next panel we were able to attend discussed a specific context-related problem: “Fixed Language Units and Machine Translation: Pragmatemes in Machine-Translated Subtitles”, presented by Judyta Mężyk.
Mężyk defines pragmatemes as “autonomous, polylexical, semantically compositional utterances constrained in their signified by the situation of communication in which they are produced”.
If that sounds like a mouthful, it basically refers to text that communicates its full meaning only in the context it is part of. Examples include greetings such as “Hello!” or “Good morning!” and situational sentences like “How can I help you?” and “Sign here, please.”
Critical or uncritical? MT in the news
Newspapers and other media are major purveyors of mainstream views about current topics and trends, and as such make a good subject for study. This is what researcher Elizabeth Marshman tackles in her panel “Weird, Wonderful, Worthy, and Worrying: Use Cases for MT as Described in Canadian Newspapers”.