Research related to the TED Translators program
Below are links to research on TED's Open Translation Project (OTP), as well as research that uses TED Talks for translation and linguistics studies.
Papers and articles
Author(s): Navid Hayeri
From the abstract: "(...) the study of influence of gender on translation of TED talks between English and Arabic. Differences were identified in language style between men and women in their English language TED talks, and these features were examined whether they were faithfully maintained in translations to Arabic."
Author(s): Amittai Axelrod, Xiaodong He, Li Deng, Alex Acero, and Mei-Yuh Hwang
Explores using TED Talks data to develop adaptive machine translation.
Author(s): Welly Naptali, Tatsuya Kawahara
From the abstract: "Since 2010, International Workshop on Spoken Language Translation (IWSLT) has held an evaluation campaign on TED talks. In this paper, we describe our ASR system for TED talks in accordance with this campaign. The baseline system is trained on Broadcast News corpus. A lightly-supervised acoustic model training is introduced by retrieving a faithful transcript of TED speech from the corresponding subtitle. Three filtering methods are investigated to select the training data in this work. The resultant acoustic model is effective for improving ASR accuracy, combined with speaker normalization and adaptation techniques."
Author(s): Mauro Cettolo, Christian Girardi and Marcello Federico
From the abstract: "We describe here a Web inventory named WIT3 that offers access to a collection of transcribed and translated talks. The core of WIT3 is the TED Talks corpus, that basically redistributes the original content published by the TED Conference website (http://www.ted.com). (...) Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, also represents a precious language resource for the machine translation research community, thanks to its size, variety of topics, and covered languages. This effort repurposes the original content in a way which is more convenient for machine translation researchers." See the WIT3 project at https://wit3.fbk.eu/.
Author(s): Mathias Müller, Martin Volk
From the abstract: "In this paper, we describe how the differences between subtitle corpora, OpenSubtitles and TED, influence machine translation quality. In particular, we investigate whether statistical machine translation systems built on their basis can be used interchangeably. Our results show that OpenSubtitles and TED contain very different kinds of subtitles that warrant a subclassification of the genre. In addition, we have taken a closer look at the translation of questions as a sentence type with special word order. Interestingly, we found the BLEU scores for questions to be higher than for random sentences."
Author(s): Lidia Cámara and Eva Espasa
An exploration of how audio description could be implemented for TED Talks. From the abstract: "This article focuses on audio description of dynamic images in non-fiction scientific genres, including documentaries and multimedia presentations. It discusses current research on images, scientific translation and accessibility, analyzes existing audio-described documentaries, and proposes alternatives that can improve visual accessibility to multimedia scientific texts in different formats."
Slides and conference posters
Author(s): Arianna Bisazza and Marcello Federico
Applies computational linguistics techniques to machine translation, using data from English-Arabic translations of TED Talks.
Author(s): Laura Santini
An analysis of semantic trends in the English-Italian translation of TED Talks, based on the 10 most-viewed talks. Notably, about 70% of the way through the presentation, there is an analysis of the dominant stylistic features of each talk (e.g., expert style, colloquial style) and an exploration of how they "survived" translation (e.g., a "strong trend to tame the tone").
WIT3: a large repository of OTP transcripts and translations converted into XML for research purposes ("WIT3 aims to support research on human language processing as well as the diffusion of TED Talks!"). The project offers these XML files along with a set of tools that make it easier to use TED transcripts and translations in research (machine translation, speech recognition, etc.). See the project's readme for a description of what the tools do. The paper by Mauro Cettolo, Christian Girardi and Marcello Federico under "Papers and articles" describes this project.
A corpus of 10 TED Talk transcripts and their translations in 43 languages, annotated for syntactic structure, parts of speech, etc., in the Penn Treebank format.
Information about a workshop on OTP translation led by Albanian translator Elvira Peço in March 2014 at the Association of Interpreters, Translators and Translation Studies Researchers.