Building New TTS
This document is a guide for developers of Festival-based TTS systems for Indian languages. It may also be useful for other languages that are similar in nature to Indian languages.
Background Study
An overview of text-to-speech synthesis from the Festvox site is a good starting point. The Festival documentation gives a more detailed description of the architecture. Festival provides an interface for scripting in the Scheme language, using its own Scheme interpreter called siod.
Spending time on the above resources before proceeding is sure to pay off!
Changing festival-te Source Code
A good part of the festival-te source code can be reused by changing variable and function names appropriately. All references to source code in this document are to the Telugu Festival package, which is available for download. For a new language (written INDLANG below), two changes need to be made to all the Scheme source files:
- Rename all functions named telugu_functionname to INDLANG_functionname.
- Rename files named telugu_filename.scm to INDLANG_filename.scm. Inside these files, change the declaration (provide 'telugu_filename) to (provide 'INDLANG_filename).
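The two renaming steps can be scripted. The following Python sketch is one possible approach and is not part of festival-te. It assumes that every identifier to be changed starts with the literal prefix telugu_ and that INDLANG is replaced by the real language name; the helper names are hypothetical.

```python
import re
from pathlib import Path

OLD_PREFIX = "telugu_"    # prefix used throughout festival-te
NEW_PREFIX = "INDLANG_"   # placeholder: substitute the new language name

# Match the old prefix only where it starts an identifier, so that a name
# such as voice_telugu_NSK_diphone is left untouched.
_PREFIX_RE = re.compile(r"(?<![A-Za-z0-9_])" + re.escape(OLD_PREFIX))


def rename_identifiers(scheme_source: str) -> str:
    """Rewrite telugu_* names, including (provide 'telugu_*) declarations."""
    return _PREFIX_RE.sub(NEW_PREFIX, scheme_source)


def renamed_path(path: Path) -> Path:
    """Return the new name for a telugu_*.scm file; the directory is kept."""
    return path.with_name(rename_identifiers(path.name))


def convert_file(path: Path) -> Path:
    """Rewrite one Scheme file in place, then rename it."""
    # Decode as Latin-1 so every raw UTF-8 byte in the file survives intact.
    text = path.read_bytes().decode("latin-1")
    path.write_bytes(rename_identifiers(text).encode("latin-1"))
    return path.rename(renamed_path(path))
```

Note that a quoted path such as "telugu_scm/" is rewritten as well, so the directory itself would have to be renamed by hand to match. The voice definition file follows its own naming rule and is best handled separately.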
A description of the festival-te source is available here.
Handling Unicode
Festival does not support Unicode: it treats every character as a single byte. As a result, the character manipulation functions provided by siod are of little use for UTF-8 strings, in which one character may span several bytes. However, with a few hacks Festival can still process input text encoded in UTF-8.
Since UTF-8 strings are valid C strings, string comparison functions work correctly on them. siod functions such as string-equal and string-matches can therefore be used without a problem.
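The difference between characters and bytes can be seen outside Festival. The Python sketch below is only an illustration of the byte-level view described above; c_string_equal is a hypothetical helper that mimics a byte-by-byte comparison and is not a siod function.

```python
word = "అ"                    # TELUGU LETTER A, U+0C05
raw = word.encode("utf-8")    # what a byte-oriented program sees

# One Unicode character, but three 1-byte "characters" in a C-style string.
char_count = len(word)
byte_count = len(raw)


def c_string_equal(a: str, b: str) -> bool:
    """Compare two strings byte by byte, as a C string comparison does."""
    return a.encode("utf-8") == b.encode("utf-8")


same = c_string_equal("అ", "అ")        # identical byte sequences
different = c_string_equal("అ", "ఆ")   # the last byte differs
```

Whole-string comparison is safe because equal strings always have equal byte sequences. Anything that indexes or counts characters is not, because it would see three units where the reader sees one.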
Building TTS
The adjacent diagram gives a high-level description of the text-to-speech generation process. Building a TTS for a new language using Festival involves writing modules for the following tasks:
- Normalize input text
- Define lexicon and rules to convert words to sounds (phones)
- Define a new voice
Text Normalization
The Text Normalization or Text Analysis module defines the rules to convert raw input text into words which are to be spoken out.
The default tokenizing rules parse the raw input text into tokens based on punctuation and similar cues. These tokens are the logical units which need to be read out. The default rules generally need not be changed. The variable token.punctuation holds the symbols treated as punctuation.
A good part of the Telugu text normalization code can be reused for other languages by changing the language strings in the source file telugu_scm/telugu_token.scm. The Scheme function telugu_token_to_words defines the rules that convert these tokens into the words to be read out. For instance, it converts numbers, symbols, currencies, dates, times, abbreviations, ratios etc. to words.
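The idea of expanding tokens into words can be sketched in a few lines. The Python below is a simplified illustration, not the logic of telugu_token_to_words. The word table uses English stand-ins, and reading numbers digit by digit is an assumption made to keep the example short.

```python
# Placeholder word table: English stand-ins for the digit names of the
# target language. A real table would hold words in the target script.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def is_number(token: str) -> bool:
    """True if the token is a non-empty run of the digits in the table."""
    return token != "" and all(ch in DIGIT_WORDS for ch in token)


def token_to_words(token: str) -> list[str]:
    """Expand one token into the list of words to be read out."""
    if is_number(token):
        # Simplest possible rule: read the number digit by digit.
        return [DIGIT_WORDS[ch] for ch in token]
    if token.endswith("%") and is_number(token[:-1]):
        return token_to_words(token[:-1]) + ["percent"]
    # A token without a rule is assumed to be an ordinary word already.
    return [token]
```

A real module would add one branch per token class (currencies, dates, times, abbreviations and so on), each returning a list of words spelled in the target language.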
Lexical Analysis
The lexical analysis module takes the output words from the text analysis module and converts them into phones. These phones are the units which make up the spoken language. festival-te uses a diphone database (group/NSKlpc.group), whose phoneset is defined in telugu_scm/telugu_phones.scm. This phoneset consists of 48 phones which cover most Indian languages. For more information on the database, contact DONLab, Dept. of CS&E, IIT Madras. The adjacent diagram gives a mapping of the phones to the Telugu alphabet.
For a new language, in the file telugu_phones.scm, replace (defPhoneSet telugu ... with (defPhoneSet INDLANG ...
We need to define rules to convert words to phones. For English, this is done using a pronunciation dictionary. Since Indian languages are phonetic in nature, a dictionary is not required: the phones can be derived from the spelling of the word. This is done by defining a set of letter-to-sound (LTS) rules.
These rules are defined using the function lts.ruleset in telugu_scm/telugu_lex.scm.
For a completely phonetic language, the following types of rules need to be defined. The left-hand side of each rule consists of UTF-8 characters and the right-hand side consists of phones from the phoneset.
( LEFTCONTEXT [ ITEMS ] RIGHTCONTEXT = NEWITEMS )
1. ( [independent_vowel] = vowel_phone )
2. ( [consonant + halant] = consonant_phone )
3. ( [consonant] dependent_vowel = consonant_phone )
4. ( [consonant] = consonant_phone + 'a' vowel_phone )
5. ( [dependent_vowel] = vowel_phone )
Please note that the order of the rules is important (Rules 2 and 3 should precede Rule 4), since the rules are checked sequentially and the first one whose condition matches is applied.
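The effect of rule order can be demonstrated with a small first-match rewriter. The Python sketch below is a simplified model of sequential rule matching, not Festival's LTS implementation. It uses romanised placeholder symbols instead of script characters, ignores the left context, and allows a single symbol as right context.

```python
# Romanised placeholder symbols standing in for script characters.
A, KA, HALANT, AA_SIGN = "A", "KA", "HALANT", "AA_SIGN"

# Each rule is (ITEMS, RIGHTCONTEXT or None, NEWITEMS).
RULE_1 = ((A,), None, ["a"])            # [independent_vowel] = vowel_phone
RULE_2 = ((KA, HALANT), None, ["k"])    # [consonant + halant] = consonant_phone
RULE_3 = ((KA,), AA_SIGN, ["k"])        # [consonant] dependent_vowel = consonant_phone
RULE_4 = ((KA,), None, ["k", "a"])      # [consonant] = consonant_phone + 'a'
RULE_5 = ((AA_SIGN,), None, ["aa"])     # [dependent_vowel] = vowel_phone


def apply_rules(symbols, rules):
    """Rewrite a list of symbols to phones; the first matching rule wins."""
    phones, i = [], 0
    while i < len(symbols):
        for items, right, new_items in rules:
            end = i + len(items)
            if tuple(symbols[i:end]) != items:
                continue
            if right is not None and symbols[end:end + 1] != [right]:
                continue
            phones.extend(new_items)
            i = end    # only ITEMS are consumed, never the context
            break
        else:
            raise ValueError(f"no rule matches {symbols[i]!r}")
    return phones


word = [KA, AA_SIGN]   # a consonant followed by a dependent vowel sign
correct = apply_rules(word, [RULE_1, RULE_2, RULE_3, RULE_4, RULE_5])
wrong = apply_rules(word, [RULE_1, RULE_4, RULE_2, RULE_3, RULE_5])
```

With the recommended order the word yields k aa. With Rule 4 placed first, the consonant receives its inherent vowel before the vowel sign is seen, giving k a aa.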
The ITEMS to be matched are given as a list of characters. Since Festival does not support Unicode, a multi-byte Indian language character has to be split into its constituent bytes before it can be matched.
For example, to match the Telugu independent vowel అ, which is encoded as the bytes \340\260\205, we split the character into its individual bytes and write the rule as:
( [ à ° <85> ] = a )
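The byte values for any character can be worked out mechanically. The Python sketch below is an illustration only; lts_items and latin1_view are hypothetical helper names. It shows that the three symbols in the rule above are simply the UTF-8 bytes of అ as displayed by an editor that assumes a single-byte Latin-1 encoding.

```python
def lts_items(char: str) -> list[str]:
    """Return the UTF-8 bytes of char as octal escapes, one per byte."""
    return [f"\\{byte:03o}" for byte in char.encode("utf-8")]


def latin1_view(char: str) -> str:
    """Show how the same bytes look in an editor that assumes Latin-1."""
    shown = []
    for byte in char.encode("utf-8"):
        glyph = bytes([byte]).decode("latin-1")
        # Bytes 0x80-0x9F are control codes in Latin-1; show them as <hex>.
        shown.append(glyph if glyph.isprintable() else f"<{byte:02x}>")
    return " ".join(shown)


octal = lts_items("అ")     # ['\\340', '\\260', '\\205']
view = latin1_view("అ")    # 'à ° <85>'
```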
The siod function lts_in_alphabet is useful to test a ruleset on a word.
For Telugu, defining LTS rules is sufficient to convert words to phones. There may be exceptions for languages like Tamil, where certain phones need to be disambiguated. For such languages, a mix of dictionary and LTS rules needs to be used. Refer to creating lexicons.
Defining a New Voice
The voice definition is given by the function voice_telugu_NSK_diphone in telugu_NSK_diphone/festvox/telugu_NSK_diphone.scm. The voice database is located at telugu_NSK_diphone/group/NSKlpc.group. Note that the voice definition directory, the Scheme file and the voice loading function must all use the same name.
Apart from specifying the diphone database to be used, a voice definition gives important information required for speech synthesis. It specifies the phoneset, lexicon, LTS ruleset etc. to be used for synthesis. The methods for converting tokens to words, waveform synthesis etc. are also specified here.
The following parameters have to be changed appropriately in the file telugu_NSK_diphone/festvox/telugu_NSK_diphone.scm:
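The naming constraint can be stated as a small function. The Python sketch below derives the three names from a single voice name, following the pattern of the Telugu voice. The helper is hypothetical and exists only to make the convention explicit.

```python
from pathlib import PurePosixPath


def voice_layout(voice_name: str) -> dict:
    """Return the names that must agree for a voice called voice_name."""
    return {
        # Directory that holds the voice.
        "directory": PurePosixPath(voice_name),
        # Scheme file containing the voice definition.
        "scheme_file": PurePosixPath(voice_name, "festvox", voice_name + ".scm"),
        # Function that loads the voice.
        "loader_function": "voice_" + voice_name,
    }


telugu = voice_layout("telugu_NSK_diphone")
```

For a new voice, the same function applied to the new voice name gives the directory, file and function names to use.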
(set! load-path (cons (path-append libdir "telugu_scm/") load-path))
(set! token_to_words telugu_token_to_words)
(set! guess_pos telugu_guess_pos)
It may be necessary to change further function and variable names.

