Difference between revisions of "How to Tackle a Transcript"

From TED Translators Wiki
(Line breaks)
 
(120 intermediate revisions by 9 users not shown)
Line 1: Line 1:
==TED Transcripts==
+
<small><center>''Read this article in other languages:'' [[C%C3%B3mo_transcribir|Español]] • [[How_to_Tackle_a_transcript_ja|日本語]] • [[Comment_transcrire|French]]</center></small>
In addition to a TED-style description of the talk, most TED and TEDx talks come with a transcript. A TED/TEDx transcript is a form of same-language or "intralingual" subtitles. Besides containing the words spoken by the speaker, the transcript must be divided into subtitle lines and then spotted (cued, timed) to match the flow of the recorded talk. By convention, TED transcripts differ from closed captions (subtitles used for the deaf and hard-of-hearing) and from same-language subtitles (subtitles for all types of viewers, e.g. used in noisy places like airports) in several ways:
+
----
 +
A TEDx transcript is a form of same-language subtitles or captions. Besides containing the words spoken by the speaker, the transcript must be divided into subtitle lines and then synchronized (timed) to match the flow of the recorded talk. Like closed captions, TEDx transcripts also contain sound information for Deaf and hard-of-hearing viewers. Below, you will find hints and strategies useful in creating TEDx transcripts as an OTP volunteer. If you haven't joined the OTP yet, go to [http://www.ted.com/transcribe TED.com/transcribe].
  
* Closed captions are based on the idea of compressing the spoken utterances and editing out redundant parts of the dialog (such as repetitions, embellishments, or references to content that can be identified visually on the screen), in order to make the subtitles easier to follow. Such editing is done to a much lesser degree in TED transcripts.
+
This guide is an extension of this [https://www.youtube.com/watch?v=ckm4n0BWggA&list=PLuvL0OYxuPwxQbdq4W7TCQ7TBnW39cDRC&index=6 video tutorial]. Note that the line-length and reading speed information below are guidelines for languages based on the Latin script; for other languages, the rules may be different. If you believe these rules are not suitable for your language, please contact us at [mailto:translate@ted.com translate@ted.com].
* One subtitle (text on the screen meant to represent spoken utterances) is normally composed of up to two lines of up to 35-40 characters each. There are no line breaks in TED transcripts, so one subtitle always consists of a single line only. There is also no official line length limitation in TED transcripts, but 75-80 characters seems to be a rule of thumb for maximum readable line length (language-specific rules may differ). See sections on [[#Line duration | line duration]] and [[#Line length | line length]].
 
* A time-coded TED transcript will be extracted to form an interactive transcription of the talk, where each line serves as a link to the part of the video where the words were spoken. This does not usually happen with other types of subtitles.
 
* Like closed captions, TED Transcripts contain sound representations for deaf or hard-of-hearing viewers (e.g. (Laughter), (Applause)). Unlike in most subtitling conventions, these representations are written using capitalized words and phrases in parentheses, not capital letters.
 
  
==Resources==
+
'''IMPORTANT:''' before you start working on a transcript, make sure that the video is part of the TED team on Amara, using [http://ted-support.amara.org/support/solutions/articles/111906-is-this-talk-part-of-the-official-ted-team- this guide] (which also contains a link to a form you can use to add a video that is not on Amara). Otherwise, it may be impossible to publish your work on YouTube and make it available for translations. [https://www.youtube.com/watch?v=EtsbuZaiNqA&list=PLuvL0OYxuPwxQbdq4W7TCQ7TBnW39cDRC&index=3 This tutorial] shows how to properly search for talks available for transcription on Amara.
Some of the following resources are necessary to create a transcript, while others may simply make the process easier.
 
  
===Speaker's material===
+
=What are the benefits of getting your talks transcribed?=
Start by finding out the speaker's contact information from the TEDx organizer, or get the contact information for the person who was responsible for contacting the particular speaker before the conference. Be sure to ask the speaker for a copy of their presentation and any other material that they are willing to give you or link you to that may make preparing the transcript easier (you may be able to get the slides directly from the organizer, with the speaker's permission). This is especially useful when transcribing acronyms and proper names; a mispronounced proper name can be very difficult to find, but it will often be included on a slide. If the slides contain little or no text, ask the speaker if they have notes (in digital form) that they could provide you with. If you decide to email the speaker to ask for disambiguation, try to do so only after having created the whole transcript (marking out the uncertain parts with [Unclear]), and then put all your questions into a single email. Very often, something that seemed unintelligible while listening to the talk becomes obvious while working on the transcript, and this is why it is preferable to postpone asking the speaker for clarification.
+
Transcripts are important for several reasons:
  
===Video file===
+
*Same-language subtitles make the talk accessible to Deaf and hard-of-hearing viewers
Even if one intends to subtitle the video online, it is often useful to also have the talk in a video file on one's computer. Software players allow greater volume amplification than online applications, which can help in making out unclear words. If you have no access to the speaker's slides, a high-quality video can be zoomed in to reveal a proper name displayed on the screen. Most importantly, having a video file of the talk also makes it possible to use the offline subtitling software, and to work on the transcript with no Internet connection.
+
*Transcribed talks get indexed in Google, giving them and your event more exposure
A video file can usually be obtained from the organizer of the conference, or from the person responsible for filming the talks. A file-hosting site or ftp server can be used to exchange the files.
+
*Only talks with a transcript can later be translated (and possibly considered by TED for further distribution)
  
===Reviewer===
+
=The transcription project workflow=
A reviewer is as necessary in preparing a transcript as they are in translation. The reviewer should have the necessary skills and experience to make the transcript more accurate (correct mistakes), fine-tune the time coding and line breaks, correct the spelling and punctuation, and understand and improve the editing choices made in the transcript. Since creating and reviewing a TED/TEDx transcript requires a specific set of skills, it is best to begin searching for a reviewer even before starting on the transcript.
+
TEDx talk videos are uploaded to YouTube. Subtitles for those videos are created in an online tool created by our subtitling partner, Amara. In order to sign up for an account on Amara, and learn how to find videos to subtitle, watch these short [https://www.youtube.com/watch?v=Nua96nvklF4&list=PLuvL0OYxuPwxQbdq4W7TCQ7TBnW39cDRC OTP Learning Series tutorials].
  
==Subtitling tools==
+
Once a transcript has been completed, it must be reviewed by another volunteer and then approved by a [[Language_Coordinators|Language Coordinator]]. Approved transcripts can then be viewed while watching the TEDx talk on YouTube. The transcriber and reviewer are credited for their work on their TED.com profiles.
There are many free online and offline tools that can be used in transcribing talks. To use an online solution, one must first upload the video file onto a website (e.g. YouTube). Professional offline applications may be costly, but there are multiple freeware alternatives with approximately the same functionality. Subtitling tools differ in how accurately they allow the user to edit the cue times and how much assistance the subtitler receives in the spotting process. Every tool provides a different kind of interface (hotkeys, graphical cueing, keyboard and mouse combinations) and solutions meant to make subtitling an easier process. For example, [http://www.nikse.dk/SubtitleEdit Subtitle Edit] generates a graphical waveform representation that makes it possible to slide a subtitle over a spoken utterance without having to listen to it to ensure the subtitle is displayed in time with the speech. [http://www.universalsubtitles.org Amara] offers the auto-pause feature, where the video is paused while the user is typing, and a dynamic spotting method, where the user watches the video and presses a key to signify where a subtitle should be displayed. YouTube's "Automatic Timing" feature can also help you add time codes automatically (using speech recognition) based on a text file containing your transcription, which you can then edit using an offline tool to fine-tune the line duration and line breaks (see a tutorial [http://www.youtube.com/watch?v=B6jXPpqVPVI here]). Most offline tools can also be used to convert between many subtitle formats, in most cases by simply opening a subtitle file and saving it in the format of one's choice.
 
  
You can find links to various offline and online subtitling tools in the [[#Subtitling tools|External Links section]].
+
To get additional support, consider joining the [https://www.facebook.com/groups/43410681471/ general Facebook group] for Open Translation Project volunteers, and/or the local TED translator group for your specific language. You can find the list of language groups [[Language_Groups|here]].  
  
==Format==
+
'''HINT:''' If you're working on an English transcript, make sure to read our [[English_Style_Guide|English Style Guide]].
There are dozens of subtitle file standards in the world. In order to facilitate spreading your work in many websites that host subtitled videos, and to make it possible to switch back and forth between the online solution and an offline application, start by making sure the website you will eventually be using (possibly Amara and YouTube) supports the subtitle format that your offline application will generate. The time-based (as opposed to frame-based) .srt format is supported by most video hosting websites and offline players (including DVD players).
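As an illustration of what "time-based" means in practice, here is a minimal Python sketch that writes one cue in the commonly used .srt layout: a counter, a timing line of the form <code>HH:MM:SS,mmm --> HH:MM:SS,mmm</code>, the text, and a blank line. The function names <code>srt_timestamp</code> and <code>srt_cue</code> are hypothetical and do not come from any subtitling tool mentioned here.

```python
def srt_timestamp(ms: int) -> str:
    """Render a time in milliseconds as HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Build one cue: counter, timing line, subtitle text, blank separator line."""
    timing = f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}"
    return f"{index}\n{timing}\n{text}\n\n"
```

Because the cue times are stored as clock times rather than frame numbers, a file built this way does not depend on the frame rate of the video.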
 
  
For languages that use characters not present in the subsection of the Latin alphabet used for English, the character encoding used in the subtitle file is also important. Most websites (including YouTube, Amara and Dotsub) support files encoded as UTF-8. If the .srt file is not encoded as UTF-8, some of the non-English characters may appear garbled. This can be solved by converting the subtitle file to the appropriate encoding standard. There are many freeware tools available for this. One of them is the [http://macchiato.com/unicode/convert.html online UTF converter]. The content of the .srt file needs to be copied and pasted into the form, and then copied and pasted back after conversion (.srt files are like text files with a different extension, and can be opened in any text editor for copying/pasting).
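If you prefer a script to an online converter, the conversion can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and you need to know (or find out by trial) which encoding the original file uses, since it cannot be read reliably from the file itself.

```python
from pathlib import Path


def convert_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the given source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str) -> None:
    """Read the subtitle file at src and write a UTF-8 copy to dst."""
    Path(dst).write_bytes(convert_to_utf8(Path(src).read_bytes(), source_encoding))
```

For example, <code>convert_file("talk.srt", "talk_utf8.srt", "cp1250")</code> would convert a file saved in a Central European Windows encoding; the file names and the encoding here are placeholders.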
+
=Overview of the transcribing process=
 +
[[File:controls.png|300px|thumb|alt=Image shows the controls box in the Amara interface.|Users can review controls and guidelines right from the subtitling interface]] Transcribing an 18-minute talk usually takes between 4 and 6 hours; the user has 30 days to complete that task. Transcribing is divided into three steps:
  
==Cueing/timing and line breaks==
+
1. '''Writing down text and splitting it into subtitles'''<br />
Because a TED transcript is meant to work as subtitles, the content of the transcript must be broken up into subtitle lines, and these lines must be synchronized with the video. This process is referred to as cueing, spotting or timing. The main objective in timing the subtitles is to present the viewer with a line of text displayed on the screen for a period of time that will be sufficient for them to read and understand the text. On the other hand, the subtitles are only one part of the visual content that the viewer must take in at any given time, and for this reason, the subtitle line cannot be too long, because the viewer must be given enough time to look at and comprehend the video. Additionally, hearing viewers watching the talk with subtitles (e.g. translated into their language) must also have enough time to listen to the speaker's voice (the intonation and emotion in the voice / prosodic features) and other ambient sounds.
+
This step usually takes 2 to 4 hours and involves typing out what the speaker says and dividing this text into subtitles that are in keeping with TED’s standards for length and are easy to read (e.g. don’t contain slips of the tongue, don’t merge two sentences together).
  
===Line duration===
+
2. '''Synchronizing the subtitles, editing the reading speed'''<br />
While subtitling in English, one must assume the average reading speed of 150-180 words per minute<ref>Karamitroglou, Fotios. [http://translationjournal.net/journal/04stndrd.htm Subtitling Standards -- A Proposal]. Retrieved 2011-08-03.</ref>, or up to 12 characters per second<ref>Gottlieb, Henrik. Routledge Encyclopedia of Translation Studies. [http://www.bookrags.com/tandf/subtitling-tf/ Subtitling]. Retrieved 2011-08-03.</ref>. A single-line subtitle of a TED transcript (equivalent to a two-line subtitle in usual subtitling) of any length should not stay on the screen for more than about 6 seconds. A subtitle cannot stay on the screen for less than approximately 1.12 seconds<ref>Williams, Gareth Ford, Ed. [http://www.bbc.co.uk/guidelines/futuremedia/accessibility/subtitling_guides/online_sub_editorial_guidelines_vs1_1.pdf BBC Online Subtitling Editorial Guidelines V1.1]. Retrieved 2011-08-03.</ref>, even if it only contains a single word, because subtitles with a shorter duration will just be a flash that most viewers will miss. Conversely, a short subtitle should not stay on the screen for too long, because that would prompt the viewer to re-read it<ref>Karamitroglou, Fotios. [http://translationjournal.net/journal/04stndrd.htm Subtitling Standards -- A Proposal]. Retrieved 2011-08-03.</ref>. Most free off-line subtitling applications calculate (or help calculate) the optimal duration of a subtitle based on the number of characters or words in the line. The duration should reflect the average reading speed, but also allow for a little more reading time for relatively "difficult" items that require more attention from the viewer, e.g. proper names or specialized terminology. Importantly, the reading speeds described above reflect values for English subtitles, and may vary for other languages.
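The figures above can be combined into a rough duration estimate. The Python sketch below is an illustration only: it assumes that every character (spaces included) counts toward the 12 characters per second, and it clamps the result between the approximate 1.12-second minimum and 6-second maximum. The function name <code>suggested_duration</code> is hypothetical; offline subtitling tools use their own calculations.

```python
def suggested_duration(text: str,
                       chars_per_second: float = 12.0,
                       min_seconds: float = 1.12,
                       max_seconds: float = 6.0) -> float:
    """Estimate the on-screen time for a subtitle, in seconds.

    Assumption: reading time is the character count divided by the
    reading speed, clamped to the minimum and maximum durations.
    """
    reading_time = len(text) / chars_per_second
    return min(max(reading_time, min_seconds), max_seconds)
```

An estimate like this is only a starting point; lines with proper names or specialized terminology may still need a little extra time.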
+
This step usually takes up to one hour. The transcriber uses a simple interface to mark where the subtitles created in step one should display, and then fine-tunes the timing where necessary to improve synchronization and bring the reading speed down to TED’s standards.
  
===Line length===
+
3. '''Editing the title and description'''<br />
Unlike in TED transcripts, in most cases one subtitle consists of up to two lines at 35-40 characters each.<ref>Chee Fun Fong, Gilbert. ''Dubbing and Subtitling in a World Context'', p.94. Chinese University Press, 2009.</ref> Since there are no line breaks within a subtitle in TED transcripts, it can be assumed that this means a single line in a TED transcript may consist of up to 70-80 characters (the closer to the 70-character mark, the easier to read the subtitle becomes). A longer line is difficult to read, and some offline players may automatically break it up to form three or more single lines, covering up to half of the screen. This character limit means that employing creative line-breaking to break the line before it passes the 70-80 character limit is a necessary part of transcribing a talk. Maximum line length in non-English subtitles may differ, especially for languages which do not employ the Latin alphabet.
+
Before submitting the subtitles, the transcriber needs to make sure the title and description of the talk are in the language of the talk and are formatted according to TED’s standards (learn more [[How_to_Tackle_a_Transcript#Title_and_description_format|here]]).
Most offline subtitling applications support line-length calculation and indicate lines that are too long, or too short for their on-screen duration.
 
  
===Synchronization===
+
To get a quick overview of working with subtitle lengths and reading speed, watch this short [https://www.youtube.com/watch?v=yvNQoD32Qqo&list=PLuvL0OYxuPwxQbdq4W7TCQ7TBnW39cDRC video tutorial], as well as [https://www.youtube.com/watch?v=ckm4n0BWggA&index=6&list=PLuvL0OYxuPwxQbdq4W7TCQ7TBnW39cDRC this tutorial] that contains a few useful tips for transcribing talks. Below, you will find more detailed advice covering each of the three transcribing steps, as well as some more technical information on formatting and timing the subtitles.
A subtitle line should not appear immediately when the speaker begins the utterance, but approximately 250 ms afterwards, in order to cue the viewer that something is being said and that they need to look for subtitles at the bottom of the screen<ref>Karamitroglou, Fotios. [http://translationjournal.net/journal/04stndrd.htm Subtitling Standards -- A Proposal]. Retrieved 2011-08-03.</ref>. The subtitle should not lag after the utterance for more than 2 seconds<ref>Karamitroglou, Fotios. [http://translationjournal.net/journal/04stndrd.htm Subtitling Standards -- A Proposal]. Retrieved 2011-08-03.</ref><ref>Ofcom. [http://www.ofcom.org.uk/static/archive/itc/itc_publications/codes_guidance/standards_for_subtitling/subtitling_1.asp.html Guidance on Standards for Subtitling: General Requirements for Subtitle Display]. Retrieved 2011-08-03.</ref>, but usually such long lagging is not necessary. A break of approximately 250 ms should be inserted between consecutive subtitles whenever possible<ref>Karamitroglou, Fotios. [http://translationjournal.net/journal/04stndrd.htm Subtitling Standards -- A Proposal]. Retrieved 2011-08-03.</ref>, in order to cue the viewer that a new line is going to appear and make it easier to follow the flow of the text.
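A minimal Python sketch of the two 250 ms rules follows. It assumes you already have the start and end of each spoken utterance in milliseconds, in order; the function name <code>cue_times</code> is hypothetical, and the sketch does not handle utterances so short that the resulting cue would end before it starts.

```python
def cue_times(utterances, onset_delay_ms=250, gap_ms=250):
    """Turn (start_ms, end_ms) speech spans into (start_ms, end_ms) subtitle cues.

    Each cue starts onset_delay_ms after its utterance begins, and ends
    early enough to leave gap_ms before the next cue appears.
    """
    cues = []
    for i, (start, end) in enumerate(utterances):
        cue_start = start + onset_delay_ms
        cue_end = end
        if i + 1 < len(utterances):
            next_cue_start = utterances[i + 1][0] + onset_delay_ms
            cue_end = min(cue_end, next_cue_start - gap_ms)
        cues.append((cue_start, cue_end))
    return cues
```

With the default values, a cue never runs past the moment the next utterance begins, which is what leaves the 250 ms break before the next subtitle appears.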
 
  
==Line breaks==
+
Below are hints and strategies that are useful when transcribing talks. For a quick introduction, watch this short [https://www.youtube.com/watch?v=yvNQoD32Qqo&list=PLuvL0OYxuPwxQbdq4W7TCQ7TBnW39cDRC video tutorial].
One sentence delivered in 30 seconds in the talk will often need to be divided into several subtitles. No matter how long the single subtitle may be (depending on on-screen duration and character-length considerations), the line breaks must not cut up syntactic (word-group) and semantic (meaningful) wholes.
 
  
===Examples of correct and incorrect line-breaking===
+
==Dividing the text into subtitles==
These examples show incorrect and correct line breaking for various line lengths. Line length is usually defined by the possible duration of the given line on the screen, which determines the number of characters that the line can contain based on the average reading speed. Unlike in the examples below, line length would normally be different for each subtitle.
+
This step usually takes 2 to 4 hours. The user plays the talk and types out what the speaker says. In order to allow the viewer to read the subtitles easily, while typing out the transcript, the transcriber breaks subtitles longer than '''42 characters''' into two lines, and begins a new subtitle once a maximum of '''84 characters total''' have been reached (the subtitle can be shorter). This length information is displayed conveniently in the subtitling interface, for every subtitle. (Note: these values are applicable to all languages that use the Latin script. For length standards in other languages, consult resources in that language’s section of OTPedia or ask a [[Language_Coordinators|Language Coordinator]].)
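The 42/84 rule can be expressed as a small check. The following Python sketch is an illustration rather than a description of how the subtitling interface counts: it assumes that lines within a subtitle are separated by a newline and that the 84-character total is the sum of the line lengths. The name <code>length_problems</code> is hypothetical.

```python
MAX_LINE_CHARS = 42      # longest allowed line within a subtitle
MAX_SUBTITLE_CHARS = 84  # longest allowed subtitle in total
MAX_LINES = 2            # a subtitle has at most two lines


def length_problems(subtitle: str) -> list:
    """Return human-readable descriptions of any length problems found."""
    lines = subtitle.split("\n")
    problems = []
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (maximum is {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_CHARS:
            problems.append(
                f"line {number} has {len(line)} characters "
                f"(maximum is {MAX_LINE_CHARS})"
            )
    total = sum(len(line) for line in lines)
    if total > MAX_SUBTITLE_CHARS:
        problems.append(
            f"subtitle has {total} characters (maximum is {MAX_SUBTITLE_CHARS})"
        )
    return problems
```

An empty list means the subtitle is within the length limits; it says nothing about whether the line breaks fall in sensible places.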
  
'''Spoken sentence:'''
+
The main goal is to create subtitles that are easily read, well-rounded bits of text. This means that transcribers try to only split subtitles where it wouldn’t separate phrases and grammatical units (e.g. they don’t split an article and a noun at the end of a line or subtitle). To comply with TED’s length and line-breaking standards, a degree of rephrasing is permissible, as long as it doesn’t change the meaning of the sentence; slips of the tongue and obvious mistakes should not be included in the transcript.
<pre>
 
This is a very long, verbose piece of prose that no one knows and no one shall remember.
 
</pre>
 
  
'''Incorrect short line breaks:'''
+
[[File:transcribingstep.png|300px|thumb|alt=Image shows the transcribing step in the Amara interface.|The first step is about dividing text into subtitles]]
<pre>
 
This is a
 
very long, verbose
 
piece of
 
prose that
 
no one knows and
 
no one shall
 
remember.
 
</pre>
 
  
'''Correct short line breaks:'''
+
When deciding how to divide the text into subtitles, you should consider the following points:
<pre>
 
This is a very long,
 
verbose piece
 
of prose
 
that no one knows
 
and no one
 
shall remember.
 
</pre>
 
  
'''Incorrect medium line breaks:'''
+
1. '''Is the subtitle long enough to break it into two lines?'''<br />
<pre>
+
If the text you will have in the subtitle is over 42 characters in length, you should break it into a maximum of two different lines (two lines in the same subtitle). To break the line, hit Shift+Enter. You don’t need to break subtitles shorter than 42 characters; very short subtitles broken into two lines can be distracting to the viewer. '''IMPORTANT:''' The subtitle should never be longer than 84 characters total, and should contain no more than 2 lines.
This is a very long, verbose
 
piece of prose that no one
 
knows and no one shall remember.
 
</pre>
 
  
'''Correct medium line breaks:'''
+
2. '''Is the text that I'm entering too long to work as a single subtitle?'''<br />
<pre>
+
If the text you are entering is longer than 84 characters, you should create two subtitles instead.
This is a very long,
 
verbose piece of prose
 
that no one knows
 
and no one shall remember.
 
</pre>
 
  
'''Incorrect long line breaks:'''
+
3. '''Do the lines and the whole subtitle end neatly in "linguistic wholes"?'''<br />
<pre>
+
You should take care to break the lines and end the subtitles after linguistic wholes (e.g. don’t separate a possessive and a noun or somebody’s first and last name). Learn more [[How_to_break_lines|here]].
This is a very long, verbose piece of prose that
 
no one knows and no one shall remember.
 
</pre>
 
  
'''Correct long line breaks:'''
+
4. '''Am I including redundant text?'''<br />
<pre>
+
Broken phrases ("I wanted to--No, this is what I'll talk about"), repetitions ("Thank you, thank you, thank you, thank you") and empty syllables ("erm," "umm" etc.) should not be included in the transcript. Also, do not include obvious errors, like when the speaker says "We thinks" instead of "We think." Instead, use the correct form of the word in the subtitle. On rare occasions, if you believe that the need for the change is obvious (e.g. the speaker says “up” instead of “down”), but your edit will significantly alter the meaning of the sentence, put it in square brackets, to indicate intentional editing (e.g. “I woke up at 9 AM, and the sun was [up].”).  
This is a very long, verbose piece of prose
 
that no one knows and no one shall remember.
 
</pre>
 
  
===Simple rules-of-thumb for line-breaking===
+
5. '''Do I really have to cut the sentence up into this many subtitles?'''<br />
 +
As much as possible while respecting the length and reading speed standards, try to have the subtitle contain a “full” part of the sentence (a clause), or the whole sentence. This will make it easier to read, and it will be easier for translators later on to translate bigger chunks of one sentence than smaller ones, since not everything will divide up easily in the same way in the target language as it does in the original. To learn more about how to make your subtitles easier for future translators, see this [[English_Style_Guide#How_to_make_your_subtitles_a_good_source_for_translations|guide]].<br />
 +
'''IMPORTANT:''' Never include the end of one sentence and the beginning of another in the same subtitle (e.g. "this is why./And another idea").
 +
[[File:English-Cheat-Sheet.png|300px|thumb|right|Table with OTP subtitling standards|This '''printable''' cheat sheet contains all of the main OTP technical subtitling standards for Latin-script languages]]
  
* The articles (a, an, the) are never followed by a line break.
+
6. '''Did I include all of the sound information essential to understanding the talk?'''<br />
* An adjective should stay together with what it is describing, but two or more adjectives can sometimes be separated with commas, and then it is possible (though not preferable) to break a line after one of the commas.
+
Include all of the sound information essential to understanding the talk, such as non-verbal sounds that the speaker refers to (“(Clears throat) Sorry about that.”), off-screen speaker changes (indicate who is speaking, if that is not obviously visible), as well as instances of music, clear laughter and applause from the audience (with the exception of intro music and applause heard at the beginning of the talk). Also, indicate any temporary change of language, and translate the subtitle into the main language of the talk (e.g. “(Arabic) This is my idea.”). Put the sound information in parentheses (e.g. (Music)), with the first letter capitalized, and always represent the sound, not the event that caused it (e.g. “(Gunshot),” not “(Dog fires gun)”). '''For more information about using sound representation, read [[How_to_use_sound_representation|this guide]].'''
* Clauses should stay together (never break lines after relative pronouns like ''which'', ''that'', ''who'', etc.).
 
* Prepositions are not followed by a line break if the break would separate them from the noun they refer to. A preposition in a concrete/physical meaning  (e.g. "The book is in the drawer") always precedes a noun, and cannot be followed by a line break. However, a preposition that is part of a phrasal verb (put up, figure out, take in) may sometimes not be followed by a noun ("I figured it out yesterday"), and so, it can be followed by a line break.  
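Two of the rules above (no break after an article, no break after a relative pronoun) are mechanical enough to check automatically. The Python sketch below is a rough heuristic only: the word lists are deliberately short, the name <code>ends_badly</code> is hypothetical, and a word such as "that" will also be flagged when it is used as a demonstrative rather than a relative pronoun. The rules about adjectives and prepositions depend on meaning and are not covered.

```python
import string

ARTICLES = {"a", "an", "the"}
RELATIVE_PRONOUNS = {"which", "that", "who"}  # not an exhaustive list


def ends_badly(line: str) -> bool:
    """Return True if the line ends with an article or a listed relative pronoun."""
    words = line.split()
    if not words:
        return False
    # Ignore surrounding punctuation and letter case when comparing.
    last_word = words[-1].strip(string.punctuation).lower()
    return last_word in ARTICLES or last_word in RELATIVE_PRONOUNS
```

Applied to the examples earlier in this section, the check flags "This is a" and "piece of prose that", but not "This is a very long,".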
 
  
===Synchronizing line breaks===
+
7. '''Did I include on-screen text?'''<br />
If possible, the line breaks should be synchronized with pauses between (or within) the speaker's utterances, as this will make it feasible to use the standard 250 ms break between subtitles, and make it easier for the viewers to follow what is being said.
+
If possible without overlapping other subtitles and going over the subtitle length and reading speed limits, include on-screen text that is part of the talk (e.g. text on slides or embedded subtitles in a video played on the stage). This will allow this text to be translated into other languages. In order to signify that this is on-screen text and not something the speaker is saying, put the representation of on-screen text between square brackets.
  
====Synchronizing line breaks with long pauses====
+
Do not transcribe on-screen text which is not relevant to the content of the talk, nor text which will not be translated (e.g. the name of the TEDx event).
If the speaker's voice trails off, the subtitle can be displayed over (cover up) the pause, provided that it is possible to adhere to the character length and duration time limits. If this "stitch-up" subtitle would have to stay on the screen for too long, or if the subtitle line covering up the pause would need to exceed the character limit, the first part of the broken utterance (before the speaker's voice trails off) can end in the em dash (--) (TED transcripts do not use dots for this). If the following utterance (after the pause) can be considered as a new sentence, write the first word with a capital letter. If the following part of the utterance cannot be considered as the beginning of a new sentence, it is often necessary to insert a word in square brackets at the beginning of the line, in order to remind the viewer what the speaker talked about before the pause, e.g.:
 
  
* Starting as a new sentence after the pause
+
==Synchronizing the subtitles with the video==
 +
[[File:syncingstep.png|300px|thumb|alt=Image shows the synchronization step in the Amara interface.|The subtitles are synced using a simple videogame-like interface]] This step usually takes up to one hour. Starting with text neatly divided into subtitles, the transcriber now needs to tell the system when to show each of the subtitles while playing the video. The user plays the talk and hits the up arrow when the first subtitle should start displaying, and then hits the down arrow whenever the currently highlighted subtitle should stop displaying and the next one should start. Afterwards, they go back and make finer edits to the timing using sliders on the video timeline to set the beginning and end of subtitles (e.g. to fix a subtitle that starts displaying too long after a speaker started the equivalent sentence). For more information on using the Amara interface to sync subtitles, read this [http://support.amara.org/support/solutions/articles/194000-how-to-sync-subtitles article].
  
<pre>
+
Once the subtitles have been synchronized, the user goes back to implement reading speed fixes using sliders in the timeline. In order to allow the viewer to read the subtitle while it’s displayed on the screen, the reading speed for each subtitle must not be higher than '''21 characters per second'''. This speed information is displayed for every subtitle on Amara, and wherever this speed is exceeded, the transcriber can compress or reduce text (without changing the meaning) and/or extend the duration of the subtitle to fix the issue.
SPOKEN:
 
  
And there are many things that I like a lot, my books, my iPad...
+
'''HINT:''' A red exclamation mark is displayed on every subtitle that needs fixing for length or reading speed.
(3 seconds of applause)
 
...my bicycle, my cats and my hat collection.
 
  
TRANSCRIPT:
+
When synchronizing your subtitles, consider the following points:
And there are many things that I like a lot, my books, my iPad--
 
(Applause)
 
My bicycle, my cats and my hat collection.
 
</pre>
 
  
* Reminding the viewer what was said before the pause
+
1. '''Is the reading speed no more than 21 characters/second?'''<br />
 +
The maximum reading speed for subtitles is 21 characters/second. To maintain a good reading speed, you can extend the duration of the subtitle, even if it’s going to run a little into the time the next sentence is spoken (but don't start the subtitle more than about 100 ms ''before'' the equivalent bit of speech is heard).
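The reading speed is simply the number of characters divided by the time the subtitle stays on the screen. The Python sketch below shows the arithmetic; it assumes that every character except the line break counts, which may not match exactly how the subtitling interface counts, and the function names are hypothetical.

```python
import math

MAX_CPS = 21.0  # maximum reading speed in characters per second


def reading_speed(text: str, start_ms: int, end_ms: int) -> float:
    """Return the reading speed of a subtitle in characters per second."""
    duration_s = (end_ms - start_ms) / 1000
    if duration_s <= 0:
        raise ValueError("the subtitle must end after it starts")
    return len(text.replace("\n", "")) / duration_s


def min_duration_ms(text: str, max_cps: float = MAX_CPS) -> int:
    """Return the shortest duration (in ms) that keeps the text within max_cps."""
    return math.ceil(len(text.replace("\n", "")) * 1000 / max_cps)
```

If <code>reading_speed</code> comes out above 21, extend the subtitle toward the value given by <code>min_duration_ms</code>, or shorten the text if there is no room to extend it.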
  
<pre>
+
Extending the duration usually helps, but if necessary for a good reading speed, combine this with rephrasing the text of the subtitle to shorten/compress it while preserving the meaning. Remember that with a reading speed that is too high, the subtitle will just disappear too quickly for most viewers to read, which is tantamount to cutting it out of the transcript. For this reason, it’s always better to compress the text a little rather than create a verbatim transcript that viewers won’t be able to follow. Good reading speed is also very important because your transcript will often serve as the starting point for translations, and the equivalent subtitle may become much longer in the target language, raising the reading speed even more.
SPOKEN:
 
  
My grandmother liked many things, she read a lot, played games on her iPad...
+
For more advice on compressing/reducing text in subtitles, see [[How_to_Compress_Subtitles|this guide]].
(3 seconds of applause)
 
...rode her bicycle, talked to her cats and bought new hats for her collection.
 
  
TRANSCRIPT:
+
'''HINT:''' Occasionally, if the subtitle contains potentially difficult vocabulary (scientific terminology, obscure proper names), consider lowering the reading speed to values even below 21 characters/second, to make it easier for the viewer to take in the content of the subtitle and allow more reading speed for future translations (which are often longer than the original subtitle).
  
My grandmother liked many things, she read a lot, played games on her iPad--
+
2. '''Is the subtitle synchronized with the equivalent bit of speech?'''<br />
(Applause)
+
Generally, the subtitle should start displaying when the speaker says the equivalent bit of speech. However, good reading speeds are more important than perfect synchronization. If you need to extend the duration of the previous subtitle to get a good reading speed, it’s OK to have the next one start some time after the speaker said those words. However, don’t have the subtitle start displaying ''before'' the speaker says the equivalent sentence, since the mismatch in body language and on-screen content can be distracting to the viewer. This is especially important in cases where synchronizing changes in the video with changes in the subtitles is crucial to what happens in the talk (e.g. if possible, a subtitle that reveals what's in a slide should not show up before the slide shows up on the screen).
[She] rode her bicycle, talked to her cats and bought new hats for her collection.
 
</pre>
 
  
===Cuts and on-screen changes===
+
3. '''Is the subtitle’s duration shorter than 1 second or longer than 7 seconds?'''<br />
Subtitles function almost as an additional layer of editing, because they can connect or divide up cuts and scenes. The transcriber must bear this in mind when synchronizing the subtitles and breaking the lines, and should make sure that the line breaks reflect on-screen changes, preserving the flow of the video. Very often the subtitle will need to reflect on-screen content, such as when the speaker refers directly to visual information presented on a slide, or talks about something in the immediate physical environment (e.g. miming something while describing it).
+
A subtitle displaying for less than one second will usually disappear too quickly for most users, and this issue will be compounded in translation. Subtitles displaying for over 7 seconds are distracting to the viewer and should be split into two separate subtitles.
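The duration limits above lend themselves to a simple automated check. The following is a minimal Python sketch, assuming that start and end times are available in seconds; the function name and the returned labels are hypothetical and are not part of any Amara feature.

```python
from typing import Optional

MIN_DURATION = 1.0  # seconds; shorter subtitles disappear too quickly
MAX_DURATION = 7.0  # seconds; longer subtitles should be split in two


def duration_issue(start: float, end: float) -> Optional[str]:
    """Return a short description of the duration problem, or None if there is none."""
    duration = end - start
    if duration < MIN_DURATION:
        return "too short"
    if duration > MAX_DURATION:
        return "too long"
    return None
```

Running such a check over an exported subtitle file is one way to find problem subtitles before submitting the transcript.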
  
===Cueing and line-breaking for translation===
+
If there is a longer piece of music or applause, have the sound representation (e.g. (Music)) display for 3 seconds and then indicate when the sound is about to end (e.g. (Music ends)).
Thanks to the Open Translation Project, every talk has a chance of being translated into many different languages. Keeping the lines within the character limit, ensuring adequate on-screen duration and putting line breaks in the correct places also helps the translators in creating foreign-language subtitles that are easy to follow and carry the original message across. Due to differences between languages, a short subtitle in English may turn out to be quite long in the target language, and vice versa. Even though the translators are able to compress the form of the translated subtitle, e.g. by omitting padding expressions and simplifying the syntax<ref>Karamitroglou, Fotios. [http://translationjournal.net/journal/04stndrd.htm Subtitling Standards -- A Proposal]. Retrieved 2011-08-03.</ref>, sometimes such compression may be impossible. The most difficult cases are acronyms (e.g. "PTA meeting", "FDA approval"). Because the target language may not have a recognizable acronym for the same concept, the translators must very often use the full form of the name. Even though the translators are able to alter subtitle duration time, most inexperienced volunteers will prefer to keep the original duration. For this reason, it is advisable to use the acronym when the speaker uses it, but to try to make the duration of the subtitle containing the acronym a little longer (e.g. lagging one second), if possible, to allow more on-screen time for the translation of the full form of the name that the acronym refers to.
 
  
==Spelling and punctuation==
+
4. '''Does the subtitle lag too long into a pause?'''<br />
It is important to decide on a spelling convention before starting the transcript. TED transcripts use US spelling and punctuation rules (see the [http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences Wikipedia article on American and British English spelling differences]). Such choices are also important when working in other languages with several regional variations (e.g. Spanish/Castilian or Portuguese).
+
Do not have the subtitle stay on the screen for more than 1 second after the speaker has paused after a sentence. If you’ve covered up long pauses in the synchronizing step, once you’re done synchronizing the whole transcript, you can shorten the durations of these subtitles using the sliders in the timeline, so that they don’t lag over pauses. You can choose not to show pauses inside a sentence, or if necessary, indicate that the sentence was broken off by using dots (...) or a dash, depending on the conventions in your language (note: in subtitles, use a hyphen (-) instead of a full dash). However, always try to show longer pauses between complete sentences.
  
===Commas, colons and semicolons===
+
=Avoiding character display errors: simple quotes, apostrophes and dashes=
A subtitle should preferably not end in a colon or semicolon, because these characters are not very visible at the end of the line<ref>Karamitroglou, Fotios. [http://translationjournal.net/journal/04stndrd.htm Subtitling Standards -- A Proposal]. Retrieved 2011-08-03.</ref>. The most common punctuation mistakes result from misusing the comma. While working in English, it is advisable to review the differences between comma use in American English and British English before starting the transcript.<ref>Kosur, Heather M. [http://www.suite101.com/content/more-punctuation-rules-for-commas-in-english-a213367 More Punctuation Rules for Commas in English]. Retrieved 2011-08-03.</ref>
+
Using smart/curly double quotes (“”) is risky, because some players will have trouble displaying them correctly. Please use the simple, straight ASCII double quote (") or the straight apostrophe (<nowiki>''</nowiki>) for single quotes. The rule is similar for apostrophes: use the straight apostrophe (') instead of the typographic/curly apostrophe (’). Instead of an en/em dash (–/—), use a hyphen (-).
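If you draft your transcript in a word processor that inserts typographic characters automatically, the substitutions described above can be applied in one pass. The Python sketch below is a hypothetical helper, not part of Amara; it covers only the characters named in this section, and treating the opening curly single quote the same way as the curly apostrophe is an assumption.

```python
# Typographic characters mapped to the ASCII equivalents recommended above.
REPLACEMENTS = {
    "\u201c": '"',  # left curly double quote
    "\u201d": '"',  # right curly double quote
    "\u2018": "'",  # left curly single quote (assumed to follow the same rule)
    "\u2019": "'",  # curly apostrophe / right curly single quote
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
}
_TABLE = str.maketrans(REPLACEMENTS)


def simplify_punctuation(text: str) -> str:
    """Replace curly quotes, curly apostrophes and en/em dashes with ASCII characters."""
    return text.translate(_TABLE)
```

Apply this only to subtitle text; titles and descriptions should keep proper punctuation.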
  
===Abbreviations===
+
For other punctuation marks in your language, as much as possible, use a simple ASCII equivalent (research to find one for your language). This may go against strict typographic conventions, but the technical limitations of most subtitle formats mean that without this simplification, for some users, many of the "correct" characters will simply not be displayed (e.g. when playing talks offline). Note that these rules only apply to the subtitles, and you should use proper punctuation in titles and descriptions.
Subtitles reflect spoken language, and thus should not contain elements typical of written language. "E.g." for "for example" and "i.e." for "that is" should not be used in subtitles. Abbreviations of any kind should not be added if they were not used by the speaker, even though they may make a difficult subtitle shorter. The only exceptions to this are standard abbreviations for units of measurement (e.g. ft for "feet").
 
  
===Capitalization===
+
You should not use HTML tags or any other formatting tags in TEDx transcripts, because these tags will not display correctly in the YouTube player.
Capitalization rules vary from language to language. If the speaker is citing a title in English, or using a word that is capitalized in English, the transcript should conform to the appropriate English spelling rules (British or American). However, if the speaker is citing a title in their first language, the transcript should employ capitalization rules for that language. This also covers cases where the title or proper name is transliterated and does not have an established translation in English (or any language being transcribed).
 
  
====Spelling in titles====
+
=Title and description format=
Most words in movie and book titles, and usually in song titles, are capitalized in English (for which words not to capitalize, see [http://www.writersblock.ca/tips/monthtip/tipmar98.htm Capitalization in Titles] at the Writer's Block website). The rules governing the capitalization of article, report and paper titles vary, with some sources suggesting that the words in the title should be capitalized according to the rules for capitalizing book titles<ref>The Mayfield Handbook of Technical and Scientific Writing. [http://imgi.uibk.ac.at/mmetgroup/MMet_imgi/tools/mayfield/capitals.htm Section 9.1. Capitalization]. Retrieved 2011-08-03.</ref>, while others suggest that only the first word of such titles should be written with a capital letter<ref>Baker, David S. and Lynn Henrichsen. [http://linguistics.byu.edu/faculty/henrichsenl/apa/APA06.html#articleT APA REFERENCE STYLE: Articles in Journals]. Retrieved 2011-08-03.</ref>. TEDTalks titles follow the latter convention, with only the first word in the title capitalized (the first word in the talk title is almost always the speaker's first name). If the talk title contains a colon after the speaker's name, the first word after the colon is capitalized (e.g. "Paul Bloom: The origins of pleasure"<ref>Bloom, Paul. [http://www.ted.com/talks/paul_bloom_the_origins_of_pleasure.html The origins of pleasure]. Talk delivered at TEDGlobal 2011. Retrieved 2011-08-03.</ref>).
+
Each TEDx talk comes with a title and description added by the TEDx organizer, which are imported into Amara from YouTube. However, these sometimes contain too little or too much information and may not conform to the formatting standards described below. In these cases, you are expected to edit them before you submit your transcription.
  
While book and movie titles are normally written in italics, TED transcripts do not use rich formatting and therefore putting text in italics is not possible. Quotation marks should be used instead (single or double, depending on whether the transcript should conform to British or American spelling rules, respectively). If a speaker forgets a title in English and replaces it with the equivalent from their first language, the English title should be written in square brackets, e.g.:
+
'''Note:''' The language of the title and description should match the language of the talk. Do not put English titles and descriptions on non-English talks.
  
<pre>
+
==Title format==
SPOKEN:
+
[[File:titleanddescription.png|300px|thumb|alt=Image shows how to edit the title and description of the talk in the Amara interface.|Click the “pencil” button to edit the title and description]]
 +
The standard title format uses the talk’s title, the speaker’s name and the TEDx event’s name, separated with the vertical bar (pipe) character (with a space before and after it):
  
You know, she's like the bear in... "Pu der Bär".
+
'''On being a young entrepreneur | Christophe Van Doninck | TEDxFlanders'''<br />
  
TRANSCRIPT:
+
If the title is formatted differently, modify it to match the standard format. Do not add the event’s date to the title.
  
You know, she's like the bear in ["Winnie the Pooh"].
+
If the title is missing, it's OK to just leave the speaker's name, but consider coming up with a title on your own or contacting the organizer or speaker for a title suggestion.
</pre>
 
  
One exception to this is when the speaker or somebody in the audience immediately recollects the English title, or any other reference is made in the talk to the speaker's using a title from a different language. In such cases, using a title from a different language becomes part of the talk, and the original title must be kept.
+
In English titles, use sentence case: capitalize only the first word in the title and any proper names.
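The standard title format (talk title, speaker name and event name, separated by a vertical bar with a space on each side) can be assembled mechanically. The Python sketch below uses a hypothetical function name; keeping the event name when the talk title is missing is an assumption of this sketch, not a rule stated in this guide.

```python
def format_title(talk_title: str, speaker: str, event: str) -> str:
    """Join the non-empty parts with ' | ' in the order title, speaker, event."""
    parts = [part.strip() for part in (talk_title, speaker, event)]
    return " | ".join(part for part in parts if part)


# Example values taken from the title shown earlier in this section.
standard = format_title(
    "On being a young entrepreneur", "Christophe Van Doninck", "TEDxFlanders"
)
```

The function does not change capitalization, so sentence case still needs to be applied by hand.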
  
====Capitalization in proper names====
+
==Description format==
Proper names are words used for unique entities. Proper names are capitalized in English. Multi-word proper names usually follow the capitalization rules for book titles, with most of the words capitalized<ref>The Mayfield Handbook of Technical and Scientific Writing. [http://imgi.uibk.ac.at/mmetgroup/MMet_imgi/tools/mayfield/capitals.htm Section 9.1. Capitalization]. Retrieved 2011-08-03.</ref>.
+
The description should consist of a short overview of the talk. Remove all links to external websites (unless they represent the speaker’s organization that the talk is about). If the description also contains the speaker’s bio, you can keep it in, but the general text explaining what the TEDx program is should be left out (“In the spirit of ideas worth spreading, TEDx is a program of local, self-organized events…”). If the description is missing, please consider adding your own short description of the talk.
  
Many words have a different meaning when capitalized. For example, according to the International Astronomical Union guidelines, the word "sun" should be capitalized when referring to the unique entity in Earth's solar system (i.e. the Sun), but is not capitalized when used as a common noun signifying a star in another system.<ref>International Astronomical Union. [http://www.iau.org/public/naming/ Naming Astronomical Objects]. Retrieved 2011-08-03.</ref>
+
The description may also contain the following disclaimers, which should be kept in and translated:
  
===Special characters===
+
'''This talk was given at a local TEDx event, produced independently of the TED Conferences.'''
  
====Em dash====
+
'''This talk was given at a TEDx event using the TED conference format but independently organized by a local community. Learn more at http://ted.com/tedx'''
TED transcripts use an em dash instead of dots. An em dash entered as two consecutive hyphens (--) is converted into a proper em dash. An em dash (—) can also be inserted into a text file (like a subtitle file) by holding down the Alt key and typing 0151 on the numeric keyboard.
 
  
====Accented letters====
+
[https://docs.google.com/spreadsheets/d/1AxpMYq1XKpDsiImzQWxdFBpko04FA1eA3JbUER1wnWI/edit?usp=sharing Here], you can find model translations of these disclaimers in various languages. If you can't find your language, consult
Many accented letters found in languages that use the Latin alphabet (e.g. ó, ö), as well as commonly used special characters (e.g. ©), can be easily typed on Windows and OS X using a number of codes. Otherwise, one can insert a special character in a rich-formatted word processor (like LibreOffice Writer) and then copy it and paste into the online or offline subtitling tool that you are using. This method will not work with all special characters. The "Computing with Accents, Symbols and Foreign Scripts" website from Penn State University offers a [http://tlt.its.psu.edu/suggestions/international/accents/index.html very useful guide] to typing special characters in Windows and OS X.
+
with a [[:Category:Language_Coordinators|Language Coordinator]] and send the model translation that you came up with to [mailto:translate@ted.com translate@ted.com].
  
Importantly, such characters may be necessary even in English-language transcripts, when they appear in proper names without an established English transliteration, e.g. "Jónas Hallgrímsson" (the name of an Icelandic poet).
+
=How to get more talks transcribed=
 +
If you are a TEDx organizer with multiple untranscribed talks, consider reaching out for help to the TED Translators and Transcribers community on Facebook. Try to select one or two prioritized talks and explain why it’s important for you to get these particular talks transcribed. Find ways to make transcribing your talks a challenge and make sure to show appreciation to the transcribers (e.g. by thanking them on your website).
  
===Spellchecker===
+
Remember that TED transcribers and translators are volunteers, and they usually select talks that are meaningful to them in some way, out of the tens of thousands of TEDx talks in the world. Because your team and your local community are much more invested in trying to promote the ideas in the talks from the events they have attended, try to collaborate with the local transcriber community in coaching your team in transcribing talks and organizing [[How_to_organize_a_transcribeathon|transcribeathons]].
Most offline subtitling tools offer a spellchecking feature. In online subtitling tools, plugins for the web browser can check the spelling of any text entered into a box. Alternatively, an exported subtitle file can be opened in a word processor with a spellchecking feature. If the particular word processor does not work with UTF-8 encoded text, open the file in any text editor that supports this format, and then copy and paste the text into the word processor. After making changes, copy the text in the word processor and paste it back into the subtitle file opened in the text editor.
 
  
==Sound information==
+
[[Category:TEDx Translators]]
Sound representation in a transcript is meant to enable deaf and hard-of-hearing viewers (as well as viewers watching the talk without the sound on) to understand all the non-spoken auditory information that is necessary to comprehend the talk to the same degree that a hearing audience potentially would. In TED transcripts, sound information is enclosed in parentheses, with the first word starting with a capital letter. There are generally two types of sound information used in TED transcripts: sound representation and speaker identification.
+
[[Category:Guidelines]]
 
+
[[Category:TEDx Organizers]]
===Common sound representation===
 
The most common sound representations in TED transcripts are:
 
 
 
* (Laughter) - for laughter that fills any time in the talk where the speaker is not saying anything
 
* (Applause) - for applause (clapping) that fills any time in the talk where the speaker is not saying anything
 
* (Music) - for music that fills any time in the talk where the speaker is not saying anything
 
 
 
===Uncommon sound representation===
 
There are many possible types of sounds that need to be represented in the transcript. For example, at this point in this [http://www.youtube.com/watch?v=M8_VHwJzkQc&t=1m6s TEDxKrakow talk]<ref>Moskal, Paweł. [http://www.youtube.com/watch?v=M8_VHwJzkQc Medical imaging with anti-matter]. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.</ref>, the transcript contains the phrase "(Phone rings twice)." The fact that the phone rings was represented in the transcript because the speaker pauses, and the slide with the phone is made prominent. Without the sound representation, a non-hearing viewer may have been confused as to why the speaker paused (why there are no subtitles representing spoken utterances) and what was meant to be conveyed by the slide with the picture of an old-style telephone. Additionally, the example of the phone ringing is referred to later in the talk, which serves as another reason why the sound representation must be there. However, in this particular talk, it was important not only to point out that the phone rang, but that it rang twice. The information about the phone ringing ''twice'' was included because the speaker later contrasted this audio example to the phone ringing only once. Because of this, the "sound information" that needed to be represented in the transcript became "phone ringing twice." If the speaker just intended to play the sound of a phone ringing in their talk, it would not be necessary to point out that the sound consisted of two separate rings, and the sound representation would thus simply be "(Phone ringing)."
 
 
 
====Speaker sounds====
 
Important sound information can also include sounds made by the speaker, e.g. (Gasping), (Hooting). It is necessary to represent these sounds if they are not made accidentally, but instead constitute an important part of the talk, e.g.:
 
 
 
<pre>
 
Do you know how I felt after talking the whole day? (Gasping) I had to take a day off after that.
 
</pre>
 
 
 
These types of speaker sounds must also be represented in the transcript if they are later referred to in some way, even if the sound was produced accidentally (e.g. if the speaker clears his throat and says "I wish they gave us more water").
 
 
 
====Environmental sounds====
 
There are sounds that are not an important part of the talk and elicit no visible reaction from the speaker or the audience (e.g. a shutter sound from somebody taking a picture in the audience), and so, they do not need to be represented in the transcript. The only exception to this rule is when a coincidental sound causes the speaker or the audience to react in a visible way. For example, if somebody in the audience drops a plastic bottle and the speaker jumps and then laughs, the sound of the bottle falling needs to be represented, in order to give the non-hearing viewers an idea of why the speaker reacted in this manner.
 
 
 
===Speaker identification===
 
Speaker changes need to be represented in the transcript. Additional speakers may appear if the speaker who began the talk is joined by another speaker on stage (e.g. for a question-and-answer session), or if video or audio material featuring spoken utterances is included in the talk. In TED transcripts, speakers are indicated by their full names and a colon the first time they appear, and by their initials (no periods) when they appear again in the same conversation. Consider this example:
 
 
 
<pre>
 
Oh, you've got a question for me? Okay. (Applause)
 
 
 
Chris Anderson: Thank you so much for that. You know, you once wrote, I like this quote,
 
"If by some magic, autism had been eradicated from the face of the Earth, then men
 
would still be socializing in front of a wood fire at the entrance to a cave."
 
 
 
Temple Grandin: Because who do you think made the first stone spears? The Asperger guy. (...)
 
 
 
CA: So, I wanted to ask you a couple other questions. (...) But if there is someone here
 
who has an autistic child, or knows an autistic child and feels kind of cut off from them,
 
what advice would you give them?
 
 
 
TG: Well, first of all, you've got to look at age. (...)
 
</pre>
 
 
 
Source: [http://www.ted.com/talks/lang/eng/temple_grandin_the_world_needs_all_kinds_of_minds.html Temple Grandin: The world needs all kinds of minds]<ref>Grandin, Temple. [http://www.ted.com/talks/lang/eng/temple_grandin_the_world_needs_all_kinds_of_minds.html The world needs all kinds of minds]. Talk delivered at TED2010. Retrieved 2011-08-03.</ref>
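The labeling convention shown above (full name and a colon on first appearance, initials without periods afterwards) can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes names made of space-separated words; hyphenated names or names with particles may need manual handling.

```python
def initials(full_name: str) -> str:
    """Return the initials of a name without periods, e.g. 'Chris Anderson' -> 'CA'."""
    return "".join(word[0].upper() for word in full_name.split())


def speaker_label(full_name: str, already_introduced: bool) -> str:
    """Return the label that precedes the speaker's words in the transcript."""
    if already_introduced:
        return f"{initials(full_name)}:"
    return f"{full_name}:"
```

When a speaker needs to be re-identified after a longer break, pass `already_introduced=False` again so that the full name is used.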
 
 
 
====Re-identifying speakers====
 
If some time has passed since a given speaker was introduced, when they start speaking again, they need to be re-identified by their full name, not just the initials. For example, if a talk by speaker X features a short video with speaker Y, and the video is paused and then continued five minutes later into the talk, speaker Y must be identified again by their full name when they start speaking in the video again, because without access to sound information, a non-hearing viewer may not be able to tell that it is the same speaker as in the first part of the video.
 
 
 
====Identifying off-camera voices====
 
Any comment from off-camera also needs to be identified by the speaker's name. If the comment comes from the audience, it can be identified generically with just the word "audience" used as a sound representation cue, i.e.:
 
 
 
<pre>
 
(Audience) I want to add something!
 
</pre>
 
 
 
==Editing/compressing the talk==
 
When working on subtitles, one is normally required to compress, omit certain linguistic items from the original spoken dialog (e.g. padding, emphasizing constructions), and rephrase certain complex syntactic structures to make the subtitle easier to follow (e.g. changing the Passive Voice into Active Voice).<ref>Karamitroglou, Fotios. [http://translationjournal.net/journal/04stndrd.htm Subtitling Standards -- A Proposal]. Retrieved 2011-08-03.</ref> In contrast, TED transcripts are by convention altered much less in this regard. Nevertheless, there are many cases where some degree of editing is necessary to preserve the speaker's intended meaning.
 
 
 
===Types of linguistic issues that may need editing===
 
Mistakes that may change the intended message of the talk are especially apparent in TEDx talks delivered in English by non-native speakers. In each case, however, one needs to be very careful not to alter the speaker's intended meaning while editing the transcript, and if there is any doubt as to whether altering part of the original talk may result in changing the intended meaning, it may be preferable to retain the original wording or consult with the speaker before making any modifications.
 
 
 
Types of mistakes that may require editing include:
 
 
 
* Mispronouncing certain words, which results in an unintended change of meaning, e.g. "Lost my beat" instead of "Lost my bit"
 
* Using a grammatical construction from the speaker's first language and thus altering the meaning of the particular sentence, e.g. "Apples eats Mary" used to mean "Mary eats apples"
 
* Using a word or term incorrectly, where the context establishes without a doubt that a different meaning was intended, e.g. "Harvard, Stanford and other high schools like them" used to mean "Harvard, Stanford and other [universities] like them"
 
* Morphological mistakes, e.g. using the singular instead of the plural, using the present form of the verb instead of the past, etc.
 
* Problems with pronouns: "she/he" instead of "it" used by speakers whose first language distinguishes genders
 
* Code-switching, i.e. accidentally using a word or phrase from the speaker's first language, or from a different language than the main language of the talk, e.g. "And then, he met an einhorn" used to mean "And then, he met [a unicorn]"
 
* Slips of the tongue and run-on phrases (where the speaker changes their mind about what to say, altering a word while it is being spoken): "In the firs-previous slide" used to mean "In the previous slide" (slips of the tongue usually do not require brackets)
 
 
 
===Using square brackets to mark editing===
 
If changes need to be made, provided that the item being changed does not exceed roughly 75% of the subtitle, it should be put in square brackets, in order to emphasize that the words in the brackets are a rephrased version of what is actually being said (e.g. "And when she is hungry, apples eats Mary" --> "And when she is hungry, [Mary eats apples]."). If more than 75% of the line needs to be rephrased, in order to maintain clarity and make the subtitle easier to read, it may be advisable to forgo using the square brackets altogether, and instead treat the line as the result of monolingual translation (translation from one variety of a language into another variety - here, from ungrammatical to grammatical phrases).
 
 
 
===Examples of changes in transcripts===
 
 
 
====Incorrect vocabulary====
 
 
 
<pre>
 
ORIGINAL: (...) they know, from generation to generation, how to protect and prevent the land (...).
 
 
 
EDITED: (...) they know, from generation to generation, how to protect and [preserve] the land (...).
 
</pre>
 
:''Source'': Jadwiga Łopata: [http://www.youtube.com/watch?v=WFxSVhNqiEk Food Sovereignty and the Family Farm]<ref>Łopata, Jadwiga. [http://www.youtube.com/watch?v=WFxSVhNqiEk Food Sovereignty and the Family Farm]. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.</ref>
 
 
 
<pre>
 
ORIGINAL: These people are in many areas more vulnerable, or sensible (...).
 
 
 
EDITED: These people are in many areas more vulnerable, or [sensitive] (...).
 
</pre>
 
 
 
:''Source'': [http://www.youtube.com/watch?v=sYfhrT9x36Y Łukasz Cichocki on the Pan Cogito hotel]<ref>Cichocki, Łukasz. [http://www.youtube.com/watch?v=sYfhrT9x36Y Łukasz Cichocki on the Pan Cogito hotel]. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.</ref>
 
 
 
====Slip of the tongue====
 
 
 
<pre>
 
ORIGINAL: I'm over and over again (...) intrigued the profound effects such movement lessons may have on us,(...)
 
 
 
EDITED: I'm over and over again (...) intrigued [by] the profound effects such movement lessons may have on us,(...)
 
</pre>
 
 
 
:''Source'': [http://www.youtube.com/watch?v=VctXJOePfs8 Jacek Paszkowski on the Feldenkreis Method]<ref>Paszkowski, Jacek. [http://www.youtube.com/watch?v=VctXJOePfs8 Jacek Paszkowski on the Feldenkreis Method]. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.</ref>
 
 
 
<pre>
 
ORIGINAL: They were the first on the market, and they are the leader, that is no doubt.
 
 
 
EDITED: They were the first on the market, and they are the leader, [there is] no doubt.
 
</pre>
 
 
 
:''Source'': Marcin Iwiński and Michał Kiciński: [http://www.youtube.com/watch?v=24qJXgiuO1E Think different - it's still extremely up to date]<ref>Iwiński, Marcin and Michał Kiciński. [http://www.youtube.com/watch?v=24qJXgiuO1E Think different - it's still extremely up to date]. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.</ref>
 
 
 
====Multiple syntactic issues, repetition====
 
 
 
<pre>
 
ORIGINAL:
 
I was several times asked by journalists
 
why in Wrocław there is possible some things
 
which is not possible or would not be possible
 
in Warsaw or even in Cracow.
 
 
 
EDITED:
 
I was asked several times by journalists
 
why some things are possible in Wrocław
 
which are not or would not be possible
 
in Warsaw or even in Cracow.
 
</pre>
 
:''Source'': Mirosław Miller: [http://www.youtube.com/watch?v=qyufJK-3eFM Dream Dealers from Wrocław]<ref>Miller, Mirosław. [http://www.youtube.com/watch?v=qyufJK-3eFM Dream Dealers from Wrocław]. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.</ref>
 
 
 
===What not to edit===
 
Importantly, editing the talk (i.e. not transcribing verbatim) should be limited to cases where preserving the original wording would make it very difficult or impossible to follow the meaning of the talk. There may be words and phrases in the talk that do not conform to the transcriber's standards of style, such as colloquialisms/slang, swear words, and stylistic and grammatical issues that do not make it impossible to understand the talk (e.g. double negatives). Changing words like these based on the transcriber's preference or beliefs about grammatical correctness amounts to altering the speaker's style, and as such should be avoided on ethical grounds.
 
 
 
==External links==
 
 
 
===Subtitling articles and guidelines===
 
* [http://www.screen.subtitling.com/downloads/Subtitletimingandtimecode.pdf "Subtitle Timing and Time Code," Screen Subtitling Systems (PDF)]
 
* [http://www.bbc.co.uk/guidelines/futuremedia/accessibility/subtitling_guides/online_sub_editorial_guidelines_vs1_1.pdf "BBC Online Subtitling Editorial Guidelines V1.1" (PDF)]
 
* [http://www.transedit.se/code.htm "Code of Good Subtitling Practice," TransEdit]
 
* [http://www.transedit.se/ Resources for subtitlers, TransEdit]
 
* [http://www.subtitling.com/downloads/Subtitlepreparation.pdf "Subtitle Preparation Guide," Screen Subtitling Systems]
 
*[http://www.ofcom.org.uk/static/archive/itc/itc_publications/codes_guidance/standards_for_subtitling/index.asp.html "Guidance on Standards for Subtitling," Ofcom]
 
* [http://www.transedit.se/glossary.htm Glossary of subtitling terminology, TransEdit]
 
* [http://www.bookrags.com/tandf/subtitling-tf/ "Subtitling" - from the Routledge Encyclopedia of Translation Studies]
 
* [http://translationjournal.net/journal//30subtitling.htm "Viewer-Oriented Subtitling," an article by Ali Hajmohammadi]
 
* [http://translationjournal.net/journal/04stndrd.htm "Subtitling Standards -- A Proposal," an article by Fotios Karamitroglou]
 
 
 
===Subtitling tools===
 
 
 
====Online subtitling tools====
 
* [http://www.universalsubtitles.org Amara]
 
* [http://dotsub.com/ Dotsub]
 
* [http://www.overstream.net/ Overstream]
 
* [http://www.youtube.com/watch?v=B6jXPpqVPVI Tutorial on using YouTube's Auto Timing]
 
 
 
====Offline subtitling tools====
 
All of the offline tools listed below are freeware. Most of them can also be used to convert between subtitle formats.
 
 
 
=====Linux=====
 
*[http://home.gna.org/subtitleeditor/ Subtitle Editor]
 
*[http://gnome-subtitles.sourceforge.net/ Gnome Subtitles]
 
*[http://ksubtile.sourceforge.net/ KSubtitle]
 
*[http://karasik.eu.org/software/ Subtitles]
 
*[http://www.aegisub.org/ Aegisub]
 
*[http://www.jubler.org/ Jubler]
 
*[http://home.gna.org/gaupol/index.html Gaupol]
 
 
 
=====OS X=====
 
*[http://www.aegisub.org/ Aegisub]
 
*[http://www.jubler.org/ Jubler]
 
*[http://elfdata.com/rb/?filter=SubX SubX]
 
*[http://subsfactory.traintrain-software.com/index.php?langue=en Subs Factory]
 
 
 
=====Windows=====
 
*[http://www.nikse.dk/SubtitleEdit Subtitle Edit]
 
*[http://www.urusoft.net/products.php?cat=sw&lang=1 Subtitle Workshop]
 
*[http://ahd-subtitles-maker.webnode.com/ AHD Subtitles Maker Professional]
 
*[http://www.aegisub.org/ Aegisub]
 
*[http://kijio.org/ Kijio]
 
*[http://www.divxland.org/subtitler.php DivXLand Media Subtitler]
 
*[http://www.submagic.tk/ SubMagic]
 
*[http://sourceforge.net/projects/subtitlecreator/ SubtitleCreator]
 
 
 
====Character encoding====
 
* [http://www.motobit.com/util/charset-codepage-conversion.asp Online codepage converter]
 
* [http://macchiato.com/unicode/convert.html Another online UTF converter]
 
 
 
====Other tools====
 
* [http://www.javascriptkit.com/script/script2/charcount.shtml Online Cut & Paste character counter]
 
* [http://www.softpedia.com/get/PORTABLE-SOFTWARE/Multimedia/Video/Easy-Subtitle-Converter.shtml Easy Subtitle Converter] - converts between 20 subtitle formats (Windows)
 
* [http://subtitlefix.com:8080/nadasgy/ Online subtitle converter]
 
 
 
===Playing videos with .srt subtitles===
 
Most [[#Offline subtitling tools | offline subtitling tools]] can also be used to play the video with subtitles. However, stand-alone players are usually more convenient.
 
 
 
* [http://www.videolan.org/vlc/ VLC media player] - a multi-platform player with subtitle support (Linux, OS X, Windows, Android, iOS)
 
* [http://www.ehow.com/how_6906681_play-_srt-files-mac.html A guide to playing videos with .srt subtitles on OS X - without the need to install the VLC media player]
 
* [http://www.brighthub.com/computing/windows-platform/articles/41466.aspx A guide to playing videos with .srt subtitles in most players using the DirectVobSub codec]
 
* [http://www.makeuseof.com/tag/play-avi-files-subtitles-playstation-3/ A guide to playing videos with .srt subtitles on the PlayStation 3]
 
 
 
For more information on how to play videos with subtitles, including instructions on obtaining subtitles to TEDTalks to play with the videos, see [[Playing_TEDTalks_with_subtitles_offline | this guide]].
 
 
 
===Spelling and punctuation===
 
 
 
====Spelling====
 
* [http://www.iau.org/public/naming/ The International Astronomical Union's guidelines on naming astronomical objects]
 
* [http://www.writersblock.ca/tips/monthtip/tipmar98.htm "Capitalization in Titles,"] from the Writer's Block site
 
* [http://imgi.uibk.ac.at/mmetgroup/MMet_imgi/tools/mayfield/capitals.htm "Capitalization in proper names,"] from the Mayfield Handbook of Technical and Scientific Writing
 
* [http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences Wikipedia article on American and British English spelling differences]
 
* [http://tlt.its.psu.edu/suggestions/international/accents/index.html "How to Type Accents,"] a guide to typing non-English letters from Penn State University
 
 
 
====Punctuation====
 
* [http://www.informatics.sussex.ac.uk/department/docs/punctuation/node00.html "Guide to Punctuation,"] a thorough set of rules by Larry Trask
 
* [http://www.lc.unsw.edu.au/onlib/punc.html "A Rough Guide to Punctuation,"] a succinct overview from the University of New South Wales
 
* [http://www.scc.spokane.edu/_docs/default/tips/d_t4s_0_Common+Punctuation+Errors.pdf "Common Punctuation Errors,"] a guide from the Spokane Community College (PDF)
 
 
 
==References==
 
{{Reflist}}
 

Latest revision as of 16:46, 20 April 2020

Read this article in other languages: Español • 日本語 • French

A TEDx transcript is a form of same-language subtitles or captions. In addition to containing the words spoken by the speaker, the transcript must additionally be divided into subtitle lines and then synchronized (timed) to match the flow of the recorded talk. Like closed captions, TEDx transcripts also contain sound information for Deaf and hard-of-hearing viewers. Below, you will find hints and strategies useful in creating TEDx transcripts as an OTP volunteer. If you haven't joined the OTP yet, go to TED.com/transcribe.

This guide is an extension of this video tutorial. Note that the line-length and reading speed information below are guidelines for languages based on the Latin script; for other languages, the rules may be different. If you believe these rules are not suitable for your language, please contact us at translate@ted.com.

IMPORTANT: before you start working on a transcript, make sure that the video is part of the TED team on Amara, using this guide (which also contains a link to a form you can use to add a video that is not on Amara). Otherwise, it may be impossible to publish your work on YouTube and make it available for translations. This tutorial shows how to properly search for talks available for transcription on Amara.

What are the benefits of getting your talks transcribed?

Transcripts are important for several reasons:

  • Same-language subtitles make the talk accessible to Deaf and hard-of-hearing viewers
  • Transcribed talks get indexed in Google, giving them and your event more exposure
  • Only talks with a transcript can later be translated (and possibly considered by TED for further distribution)

The transcription project workflow

TEDx talk videos are uploaded to YouTube. Subtitles for those videos are created in an online tool created by our subtitling partner, Amara. In order to sign up for an account on Amara, and learn how to find videos to subtitle, watch these short OTP Learning Series tutorials.

Once a transcript has been completed, it must be reviewed by another volunteer and then approved by a Language Coordinator. Approved transcripts can then be viewed while watching the TEDx talk on YouTube. The transcriber and reviewer are credited for their work on their TED.com profiles.

To get additional support, consider joining the general Facebook group for Open Translation Project volunteers, and/or the local TED translator group for your specific language. You can find the list of language groups here.

HINT: If you're working on an English transcript, make sure to read our English Style Guide.

Overview of the transcribing process

Image shows the controls box in the Amara interface.
Users can review controls and guidelines right from the subtitling interface
Transcribing an 18-minute talk usually takes 4 to 6 hours; the user has 30 days to complete the task. Transcribing is divided into three steps:

1. Writing down text and splitting it into subtitles
This step usually takes 2 to 4 hours and involves typing out what the speaker says and dividing this text into subtitles that meet TED’s standards for length and are easy to read (e.g. don’t contain slips of the tongue, don’t merge two sentences together).

2. Synchronizing the subtitles, editing the reading speed
This step usually takes up to one hour. The transcriber uses a simple interface to mark where the subtitles created in step one should display, and then fine-tunes the timing where necessary to improve synchronization and bring the reading speed down to TED’s standards.

3. Editing the title and description
Before submitting the subtitles, the transcriber needs to make sure the title and description of the talk are in the language of the talk and are formatted according to TED’s standards (learn more here).

To get a quick overview of working with subtitle lengths and reading speed, watch this short video tutorial, as well as this tutorial that contains a few useful tips for transcribing talks. Below, you will find more detailed advice covering each of the three transcribing steps, as well as some more technical information on formatting and timing the subtitles.

Dividing the text into subtitles

This step usually takes 2 to 4 hours. The user plays the talk and types out what the speaker says. To make the subtitles easy to read, the transcriber breaks any subtitle longer than 42 characters into two lines while typing, and begins a new subtitle before the total exceeds 84 characters (the subtitle can be shorter). This length information is displayed in the subtitling interface for every subtitle. (Note: these values apply to all languages that use the Latin script. For length standards in other languages, consult resources in that language’s section of OTPedia or ask a Language Coordinator.)
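The length limits above can be sketched as a small check. This is only an illustration, not part of the Amara tool: the function name `check_subtitle_length` is hypothetical, and it assumes that line breaks are not counted as characters, which may differ from how the subtitling interface counts.

```python
MAX_LINE_CHARS = 42      # longest allowed line (Latin-script languages)
MAX_SUBTITLE_CHARS = 84  # longest allowed subtitle in total
MAX_LINES = 2            # a subtitle may have at most two lines


def check_subtitle_length(text: str) -> list:
    """Return a list of length problems for one subtitle (empty if none).

    Assumption: lines are separated by "\n" and the line break itself
    does not count toward the character totals.
    """
    problems = []
    lines = text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (maximum is {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_CHARS:
            problems.append(
                f"line {number} has {len(line)} characters "
                f"(maximum is {MAX_LINE_CHARS})"
            )
    total = sum(len(line) for line in lines)
    if total > MAX_SUBTITLE_CHARS:
        problems.append(
            f"subtitle has {total} characters (maximum is {MAX_SUBTITLE_CHARS})"
        )
    return problems
```

A check like this can only flag length problems; deciding where to break the line still requires human judgment about phrases and grammatical units.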

The main goal is to create subtitles that are easily read, well-rounded bits of text. This means that transcribers try to only split subtitles where it wouldn’t separate phrases and grammatical units (e.g. they don’t split an article and a noun at the end of a line or subtitle). To comply with TED’s length and line-breaking standards, a degree of rephrasing is permissible, as long as it doesn’t change the meaning of the sentence; slips of the tongue and obvious mistakes should not be included in the transcript.

Image shows the transcribing step in the Amara interface.
The first step is about dividing text into subtitles

When deciding how to divide the text into subtitles, you should consider the following points:

1. Is the subtitle long enough to break it into two lines?
If the text of the subtitle is over 42 characters long, you should break it into two lines within the same subtitle. To break the line, hit Shift+Enter. You don’t need to break subtitles shorter than 42 characters; very short subtitles broken into two lines can be distracting to the viewer. IMPORTANT: The subtitle should never be longer than 84 characters total, and should contain no more than 2 lines.

2. Is the text that I'm entering too long to work as a single subtitle?
If the text you are entering is longer than 84 characters, you should create two subtitles instead.

3. Do the lines and the whole subtitle end neatly in "linguistic wholes"?
You should take care to break the lines and end the subtitles after linguistic wholes (e.g. don’t separate a possessive and a noun or somebody’s first and last name). Learn more here.

4. Am I including redundant text?
Broken phrases ("I wanted to--No, this is what I'll talk about"), repetitions ("Thank you, thank you, thank you, thank you") and empty syllables ("erm," "umm" etc.) should not be included in the transcript. Also, do not include obvious errors, like when the speaker says "We thinks" instead of "We think." Instead, use the correct form of the word in the subtitle. On rare occasions, if you believe that the need for the change is obvious (e.g. the speaker says “up” instead of “down”), but your edit will significantly alter the meaning of the sentence, put it in square brackets, to indicate intentional editing (e.g. “I woke up at 9 AM, and the sun was [up].”).

5. Do I really have to cut the sentence up into this many subtitles?
As much as possible while respecting the length and reading speed standards, try to have the subtitle contain a “full” part of the sentence (a clause), or the whole sentence. This will make it easier to read, and it will be easier for translators later on to translate bigger chunks of one sentence than smaller ones, since not everything will divide up easily in the same way in the target language as it does in the original. To learn more about how to make your subtitles easier for future translators, see this guide.
IMPORTANT: Never include the end of one sentence and the beginning of another in the same subtitle (e.g. "this is why./And another idea").

This printable cheat sheet contains all of the main OTP technical subtitling standards for Latin-script languages

6. Did I include all of the sound information essential to understanding the talk?
Include all of the sound information essential to understanding the talk, such as non-verbal sounds that the speaker refers to (“(Clears throat) Sorry about that.”), off-screen speaker changes (indicate who is speaking, if that is not obviously visible), as well as instances of music, clear laughter and applause from the audience (with the exception of intro music and applause heard at the beginning of the talk). Also, indicate any temporary change of language, and translate the subtitle into the main language of the talk (e.g. “(Arabic) This is my idea.”). Put the sound information in parentheses (e.g. (Music)), with the first letter capitalized, and always represent the sound, not the event that caused it (e.g. “(Gunshot),” not “(Dog fires gun)”). For more information about using sound representation, read this guide.

7. Did I include on-screen text?
If possible without overlapping other subtitles and going over the subtitle length and reading speed limits, include on-screen text that is part of the talk (e.g. text on slides or embedded subtitles in a video played on the stage). This will allow this text to be translated into other languages. In order to signify that this is on-screen text and not something the speaker is saying, put the representation of on-screen text between square brackets.

Do not transcribe on-screen text which is not relevant to the content of the talk, nor text which will not be translated (e.g. the name of the TEDx event).
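The capitalization rule for sound information in point 6 can be spot-checked with a short script. This is a rough sketch under stated assumptions: the function name `badly_capitalized_labels` is hypothetical, it treats every parenthesized span as sound information, and it only makes sense for scripts that distinguish upper and lower case.

```python
import re

# Matches the contents of a parenthesized label such as "(Music)".
# Assumption: every parenthesized span in a subtitle is sound information.
SOUND_LABEL = re.compile(r"\(([^()]+)\)")


def badly_capitalized_labels(text: str) -> list:
    """Return the sound labels whose first character is not uppercase."""
    return [
        label
        for label in SOUND_LABEL.findall(text)
        if not label[0].isupper()
    ]
```

For example, `badly_capitalized_labels("(music) Hello (Applause)")` returns `["music"]`. The check cannot tell whether a label describes the sound rather than the event that caused it; that still needs a human reviewer.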

Synchronizing the subtitles with the video

Image shows the synchronization step in the Amara interface.
The subtitles are synced using a simple videogame-like interface
This step usually takes up to one hour. Starting with text neatly divided into subtitles, the transcriber now needs to tell the system when to show each of the subtitles while playing the video. The user plays the talk and hits the up arrow when the first subtitle should start displaying, and then hits the down arrow whenever the currently highlighted subtitle should stop displaying and the next one should start. Afterwards, they go back and make finer edits to the timing using sliders on the video timeline to set the beginning and end of subtitles (e.g. to fix a subtitle that starts displaying too long after a speaker started the equivalent sentence). For more information on using the Amara interface to sync subtitles, read this article.

Once the subtitles have been synchronized, the user goes back to fix reading speed issues using the sliders in the timeline. To give the viewer enough time to read a subtitle while it is displayed on the screen, the reading speed for each subtitle must not be higher than 21 characters per second. This speed information is displayed for every subtitle on Amara, and wherever this speed is exceeded, the transcriber can compress or reduce the text (without changing the meaning) and/or extend the duration of the subtitle to fix the issue.

HINT: A red exclamation mark is displayed on every subtitle that needs fixing for length or reading speed.
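Reading speed is simply the number of characters divided by the time the subtitle stays on screen. The sketch below illustrates the arithmetic; the function names are hypothetical, and it assumes that spaces count as characters while line breaks do not, which may not match Amara's own counting exactly.

```python
MAX_CHARS_PER_SECOND = 21  # limit for Latin-script languages


def visible_length(text: str) -> int:
    """Count characters, ignoring line breaks (an assumption)."""
    return len(text.replace("\n", ""))


def reading_speed(text: str, start: float, end: float) -> float:
    """Return the reading speed in characters per second."""
    duration = end - start
    if duration <= 0:
        raise ValueError("the subtitle must end after it starts")
    return visible_length(text) / duration


def min_duration(text: str) -> float:
    """Return the shortest duration (seconds) that keeps the speed in range."""
    return visible_length(text) / MAX_CHARS_PER_SECOND
```

For instance, a 42-character subtitle displayed for 2 seconds reads at exactly 21 characters per second, so any shorter duration would need either more time or less text.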

When synchronizing your subtitles, consider the following points:

1. Is the reading speed no more than 21 characters/second?
The maximum reading speed for subtitles is 21 characters/second. To maintain a good reading speed, you can extend the duration of the subtitle, even if it’s going to run a little into the time the next sentence is spoken (but don't start the subtitle more than about 100 ms before the equivalent bit of speech is heard).

Extending the duration usually helps, but if necessary for a good reading speed, combine this with rephrasing the text of the subtitle to shorten/compress it while preserving the meaning. Remember that with a reading speed that is too high, the subtitle will just disappear too quickly for most viewers to read, which is tantamount to cutting it out of the transcript. For this reason, it’s always better to compress the text a little rather than create a verbatim transcript that viewers won’t be able to follow. Good reading speed is also very important because your transcript will often serve as the starting point for translations, and the equivalent subtitle may become much longer in the target language, raising the reading speed even more.

For more advice on compressing/reducing text in subtitles, see this guide.

HINT: If the subtitle contains potentially difficult vocabulary (scientific terminology, obscure proper names), consider lowering the reading speed below 21 characters/second, to make it easier for the viewer to take in the content of the subtitle and to leave more room for future translations (which are often longer than the original subtitle).

2. Is the subtitle synchronized with the equivalent bit of speech?
Generally, the subtitle should start displaying when the speaker says the equivalent bit of speech. However, good reading speeds are more important than perfect synchronization. If you need to extend the duration of the previous subtitle to get a good reading speed, it’s OK to have the next one start some time after the speaker said those words. However, don’t have the subtitle start displaying before the speaker says the equivalent sentence, since the mismatch in body language and on-screen content can be distracting to the viewer. This is especially important in cases where synchronizing changes in the video with changes in the subtitles is crucial to what happens in the talk (e.g. if possible, a subtitle that reveals what's in a slide should not show up before the slide shows up on the screen).

3. Is the subtitle’s duration shorter than 1 second or longer than 7 seconds?
A subtitle displaying for less than one second will usually disappear too quickly for most users, and this issue will be compounded in translation. Subtitles displaying for over 7 seconds are distracting to the viewer and should be split into two separate subtitles.

If there is a longer piece of music or applause, have the sound representation (e.g. (Music)) display for 3 seconds and then indicate when the sound is about to end (e.g. (Music ends)).

4. Does the subtitle lag too long into a pause?
Do not have the subtitle stay on the screen for more than 1 second after the speaker has paused after a sentence. If you’ve covered up long pauses in the synchronizing step, once you’re done synchronizing the whole transcript, you can shorten the durations of these subtitles using the sliders in the timeline, so that they don’t lag over pauses. You can choose not to show pauses inside a sentence, or if necessary, indicate that the sentence was broken off by using dots (...) or a dash (-), depending on the conventions in your language (note: in subtitles, use a hyphen rather than an en or em dash). However, always try to show longer pauses between complete sentences.
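The duration limits from point 3 (no shorter than 1 second, no longer than 7 seconds) can be checked across a whole transcript with a sketch like the one below. The data layout, a list of `(start, end, text)` tuples with times in seconds, and the function name `flag_durations` are assumptions made for illustration only.

```python
MIN_DURATION = 1.0  # seconds
MAX_DURATION = 7.0  # seconds


def flag_durations(cues: list) -> list:
    """Return (index, problem) pairs for cues outside the duration limits.

    Each cue is assumed to be a (start_seconds, end_seconds, text) tuple.
    """
    flagged = []
    for index, (start, end, _text) in enumerate(cues):
        duration = end - start
        if duration < MIN_DURATION:
            flagged.append((index, "shorter than 1 second"))
        elif duration > MAX_DURATION:
            flagged.append((index, "longer than 7 seconds"))
    return flagged
```

A check like this does not detect subtitles that lag into a pause, because that requires knowing when the speaker actually stops talking.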

Avoiding character display errors: simple quotes, apostrophes and dashes

Smart/curly double quotes (“”) are risky, because some players have trouble displaying them correctly. Please use the simple, straight ASCII double quote (") for double quotes and the straight apostrophe (') for single quotes. The rule is the same for apostrophes: use the straight apostrophe (') instead of the typographic/curly apostrophe (’). Instead of an en/em dash (–/—), use a hyphen (-).

For other punctuation marks in your languages, as much as possible, use a simple ASCII equivalent (research to find one for your language). This may go against strict typographic conventions, but the technical limitations of most subtitle formats mean that without this simplification, for some users, many of the "correct" characters will simply not be displayed (e.g. when playing talks offline). Note that these rules only apply to the subtitles, and you should use proper punctuation in titles and descriptions.
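The replacements described above can be applied in bulk before pasting text into the subtitle editor. This is a minimal sketch that covers only the characters named in this section; the function name `to_plain_punctuation` is hypothetical, and other language-specific punctuation would need its own entries.

```python
# Typographic characters mapped to the plain ASCII equivalents named above.
ASCII_REPLACEMENTS = {
    "\u201c": '"',  # left curly double quote
    "\u201d": '"',  # right curly double quote
    "\u2018": "'",  # left curly single quote
    "\u2019": "'",  # right curly single quote / typographic apostrophe
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
}
_TRANSLATION_TABLE = str.maketrans(ASCII_REPLACEMENTS)


def to_plain_punctuation(text: str) -> str:
    """Replace curly quotes and en/em dashes with ASCII equivalents."""
    return text.translate(_TRANSLATION_TABLE)
```

The function swaps characters one for one and does not adjust spacing, so a dash that was set without surrounding spaces may need a manual look afterwards. As noted above, titles and descriptions should keep proper punctuation and should not be run through a helper like this.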

You should not use HTML tags or any other formatting tags in TEDx transcripts, because these tags will not display correctly in the YouTube player.

Title and description format

Each TEDx talk comes with a title and description added by the TEDx organizer, which are imported into Amara from YouTube. However, these sometimes contain too little or too much information and may not conform to the formatting standards described below. In these cases, you are expected to edit them before you submit your transcription.

Note: The language of the title and description should match the language of the talk. Do not put English titles and descriptions on non-English talks.

Title format

Image shows how to edit the title and description of the talk in the Amara interface.
Click the “pencil” button to edit the title and description

The standard title format uses the talk’s title, the speaker’s name and the TEDx event’s name, separated with the vertical bar (pipe) character (with a space before and after it):

On being a young entrepreneur | Christophe Van Doninck | TEDxFlanders

If the title is formatted differently, modify it to match the standard format. Do not add the event’s date to the title.

If the title is missing, it's OK to just leave the speaker's name, but consider coming up with a title on your own or contacting the organizer or speaker for a title suggestion.

In English titles, use sentence case: capitalize only the first word in the title and any proper names.
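The standard title format can be expressed as a short helper. This is an illustrative sketch only: the function names are hypothetical, and the check assumes that the event name always begins with "TEDx" and that the talk title itself does not contain the " | " separator.

```python
SEPARATOR = " | "  # vertical bar with a space before and after it


def format_title(talk_title: str, speaker: str, event: str) -> str:
    """Build a title in the form 'Talk title | Speaker name | TEDxEvent'."""
    parts = [part.strip() for part in (talk_title, speaker, event)]
    if not all(parts):
        raise ValueError("talk title, speaker and event are all required")
    return SEPARATOR.join(parts)


def looks_standard(title: str) -> bool:
    """Roughly check whether a title follows the standard three-part format."""
    parts = title.split(SEPARATOR)
    return (
        len(parts) == 3
        and all(part.strip() for part in parts)
        and parts[2].startswith("TEDx")  # assumption about event names
    )
```

For example, `format_title("On being a young entrepreneur", "Christophe Van Doninck", "TEDxFlanders")` produces the sample title shown above. The helper does not enforce sentence case, which still has to be checked by hand.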

Description format

The description should consist of a short overview of the talk. Remove all links to external websites (unless they represent the speaker’s organization that the talk is about). If the description also contains the speaker’s bio, you can keep it in, but the general text explaining what the TEDx program is should be left out (“In the spirit of ideas worth spreading, TEDx is a program of local, self-organized events…”). If the description is missing, please consider adding your own short description of the talk.

The description may also contain the following disclaimers, which should be kept in and translated:

This talk was given at a local TEDx event, produced independently of the TED Conferences.

This talk was given at a TEDx event using the TED conference format but independently organized by a local community. Learn more at http://ted.com/tedx

Here, you can find model translations of these disclaimers in various languages. If you can't find your language, consult with a Language Coordinator and send the model translation that you came up with to translate@ted.com.

How to get more talks transcribed

If you are a TEDx organizer with multiple untranscribed talks, consider reaching out for help to the TED Translators and Transcribers community on Facebook. Try to select one or two priority talks and explain why it’s important for you to get these particular talks transcribed. Find ways to make transcribing your talks a challenge and make sure to show appreciation to the transcribers (e.g. by thanking them on your website).

Remember that TED transcribers and translators are volunteers, and they usually select talks that are meaningful to them in some way, out of the tens of thousands of TEDx talks in the world. Because your team and your local community are much more invested in promoting the ideas in the talks from the events they have attended, try to collaborate with the local transcriber community on coaching your team in transcribing talks and on organizing transcribeathons.