Below, you will find links to research related to TED's Open Translation Project, or to using TED Talks for translation and linguistics research.

Papers and articles

Does gender affect translation? : analysis of English talks translated to Arabic

Author(s): Hayeri, Navid
From the abstract: "(...) the study of influence of gender on translation of TED talks between English and Arabic. Differences were identified in language style between men and women in their English language TED talks, and these features were examined whether they were faithfully maintained in translations to Arabic."

New Methods and Evaluation Experiments on Translating TED Talks in the IWSLT Benchmark

Author(s): Amittai Axelrod, Xiaodong He, Li Deng, Alex Acero, and Mei-Yuh Hwang
Using TED Talks to develop adaptive machine translation.

Automatic Transcription of TED Talks

Author(s): Welly Naptali, Tatsuya Kawahara
From the abstract: "Since 2010, International Workshop on Spoken Language Translation (IWSLT) has held an evaluation campaign on TED talks. In this paper, we describe our ASR system for TED talks in accordance with this campaign. The baseline system is trained on Broadcast News corpus. A lightly-supervised acoustic model training is introduced by retrieving a faithful transcript of TED speech from the corresponding subtitle. Three filtering methods are investigated to select the training data in this work. The resultant acoustic model is effective for improving ASR accuracy, combined with speaker normalization and adaptation techniques."

WIT3: Web Inventory of Transcribed and Translated Talks

Author(s): Mauro Cettolo, Christian Girardi and Marcello Federico
From the abstract: "We describe here a Web inventory named WIT3 that offers access to a collection of transcribed and translated talks. The core of WIT3 is the TED Talks corpus, that basically redistributes the original content published by the TED Conference website (http://www.ted.com). (...) Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, also represents a precious language resource for the machine translation research community, thanks to its size, variety of topics, and covered languages. This effort repurposes the original content in a way which is more convenient for machine translation researchers." See the WIT3 project at https://wit3.fbk.eu/.

Statistical Machine Translation of Subtitles: From OpenSubtitles to TED

Author(s): Mathias Müller, Martin Volk
From the abstract: "In this paper, we describe how the differences between subtitle corpora, OpenSubtitles and TED, influence machine translation quality. In particular, we investigate whether statistical machine translation systems built on their basis can be used interchangeably. Our results show that OpenSubtitles and TED contain very different kinds of subtitles that warrant a subclassification of the genre. In addition, we have taken a closer look at the translation of questions as a sentence type with special word order. Interestingly, we found the BLEU scores for questions to be higher than for random sentences."

The Audio Description of Scientific Multimedia

Author(s): Lidia Cámara and Eva Espasa
An exploration of how audio description could be implemented for TED Talks. From the abstract: "This article focuses on audio description of dynamic images in non-fiction scientific genres, including documentaries and multimedia presentations. It discusses current research on images, scientific translation and accessibility, analyzes existing audio-described documentaries, and proposes alternatives that can improve visual accessibility to multimedia scientific texts in different formats."

Slides and conference posters

Cutting the long tail: Hybrid language models for translation style adaptation

Author(s): Arianna Bisazza and Marcello Federico
Adapting machine translation to a target style with hybrid language models, based on data from English-Arabic translations of TED Talks.

Interlingual and intersemiotic translation in TED Talks

Author(s): Laura Santini
An analysis of semantic trends in the English-Italian translation of TED Talks, based on the 10 most-viewed talks. Notably, about 70% of the way into the presentation, there is an analysis of the dominant stylistic features of each talk (e.g. expert style, colloquial style) and an exploration of how they "survived" translation (e.g. "strong trend to tame the tone").

Other projects

WIT3: Web Inventory of Transcribed and Translated Talks

A large repository of OTP transcripts and translations converted into XML for research purposes ("WIT3 aims to support research on human language processing as well as the diffusion of TED Talks!"). The project offers these XML files along with a set of tools that make it easier to use our transcripts and translations for research (machine translation, speech recognition, etc.). See the project's readme for what the tools do. See the "Papers and articles" section above for a paper that describes this project.

The NAIST-NTT TED Talk Treebank

A corpus of 10 TED Talk transcripts and their translations in 43 languages, annotated for syntactic structure, parts of speech, etc. using the Penn Treebank format.

Workshop #2 — TED Open Translation Project: Translating TED Talks Into Albanian

Info on a workshop on OTP translation conducted by Albanian translator Elvira Peço in March 2014 at the Association of Interpreters, Translators and Translation Studies Researchers.

