Research related to the TED Translators program

From TED Translators Wiki
Revision as of 17:29, 18 February 2015 by Symbolt (talk | contribs) (Papers)

Below, you will find links to research related to TED's Open Translation Project, or to using TED Talks for translation and linguistics research.

Papers

Author(s): Hayeri, Navid
From the abstract: "(...) the study of influence of gender on translation of TED talks between English and Arabic. Differences were identified in language style between men and women in their English language TED talks, and these features were examined whether they were faithfully maintained in translations to Arabic."

Author(s): Amittai Axelrod, Xiaodong He, Li Deng, Alex Acero, and Mei-Yuh Hwang
Using TED Talks to develop adaptive machine translation.

Author(s): Welly Naptali, Tatsuya Kawahara
From the abstract: "Since 2010, International Workshop on Spoken Language Translation (IWSLT) has held an evaluation campaign on TED talks. In this paper, we describe our ASR system for TED talks in accordance with this campaign. The baseline system is trained on Broadcast News corpus. A lightly-supervised acoustic model training is introduced by retrieving a faithful transcript of TED speech from the corresponding subtitle. Three filtering methods are investigated to select the training data in this work. The resultant acoustic model is effective for improving ASR accuracy, combined with speaker normalization and adaptation techniques."

Author(s): Mauro Cettolo, Christian Girardi and Marcello Federico
From the abstract: "We describe here a Web inventory named WIT3 that offers access to a collection of transcribed and translated talks. The core of WIT3 is the TED Talks corpus, that basically redistributes the original content published by the TED Conference website ( (...) Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, also represents a precious language resource for the machine translation research community, thanks to its size, variety of topics, and covered languages. This effort repurposes the original content in a way which is more convenient for machine translation researchers." See the WIT3 project at

Slides and conference posters

Author(s): Arianna Bisazza and Marcello Federico
Machine translation using computational linguistics techniques, based on data from English-Arabic translations of TED Talks.

Author(s): Laura Santini
An analysis of semantic trends in the English-Italian translation of TED Talks, based on the 10 most-viewed talks. Notably, about 70% of the way into the presentation, there is an analysis of the dominant stylistic features of each talk (e.g. expert style, colloquial style) and an exploration of how they "survived" translation (e.g. "strong trend to tame the tone").

Other projects

A large repository of OTP transcripts and translations converted into XML for research purposes ("WIT3 aims to support research on human language processing as well as the diffusion of TED Talks!"). The project offers these XML files and a set of tools that make it easier to use our transcripts and translations for research (machine translation, speech recognition, etc.). See this readme about what the tools do.

A corpus of 10 TED Talk transcripts and their translations in 43 languages, annotated for syntactic structure, parts of speech, etc., using the Penn Treebank format.

Information on a workshop on OTP translation conducted by Albanian translator Elvira Peço in March 2014 at the Association of Interpreters, Translators and Translation Studies Researchers.