On Mon, Nov 02, 2009, Geoff Shang wrote about "Re: hebrew text to speech":
>...
> The problem is of course that written Hebrew does not include many of the
> vowel sounds, which means that a text-to-speech engine needs to either
> make educated guesses about which sounds to insert when, or have a big
> word dictionary ... or both.
>...

The situation is much harder than just having a big dictionary. You'll soon realize that a large percentage of the words you run across can be read in more than one way: for example, does הרכבת mean "the train", "did you ride?", "you assembled" or "the assembly of"? (And of course, each of those meanings is also pronounced differently.)

In some cases (such as the above example), adding syntax parsing can resolve these ambiguities. In other cases, even this is not enough: does "שתיתי חלב" mean "I drank milk" or "I drank candlewax"? Obviously it's the first, but syntax alone cannot tell you that - you also need to know a thing or two about which kinds of actions are compatible with which kinds of objects.

Prof. Uzzi Ornan from the Technion has a great Hebrew reading application that works according to these principles, but unfortunately it is not free.

If anybody on this list has the energy and the knowledge (or the enthusiasm to learn), we can create a free Hebrew reader ourselves in probably just a few years, with the following 3 steps (some can be done in parallel):

1. Add vowel sounds to Hspell. This is harder than it sounds (because we need to know how the vowels change in inflection...), but possible.

2. Write a syntax parser, which takes the morphological analysis from Hspell as input.

3. Add to the lexicon information on which classes of verbs are compatible with which classes of subjects and objects, to further reduce ambiguities.

Nadav.

-- 
Nadav Har'El                        |      Monday, Nov 2 2009, 15 Heshvan 5770
n...@math.technion.ac.il            |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |I am the world's greatest authority on my
http://nadav.harel.org.il           |own opinion.
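
To make step 3 above a bit more concrete, here is a toy sketch of how verb/object compatibility could pick between the two readings of חלב in "שתיתי חלב". Everything in it is an assumption made for illustration: the `Reading` record, the `LEXICON` and `OBJECT_CLASSES` tables, the semantic class names and the rough transliterations are all hypothetical. It is not Hspell's data format or API, and it is not how Prof. Ornan's application works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    translit: str   # rough transliteration, i.e. the pronunciation to recover
    gloss: str      # English meaning
    pos: str        # part of speech
    sem_class: str  # coarse semantic class (hypothetical labels)


# Hypothetical toy lexicon: unvocalized spelling -> every possible reading.
LEXICON = {
    "חלב": [
        Reading("chalav", "milk", "noun", "liquid"),
        Reading("chelev", "candlewax", "noun", "solid"),
    ],
    "שתיתי": [
        Reading("shatiti", "I drank", "verb", "drink"),
    ],
}

# Hypothetical compatibility table: verb class -> object classes it accepts.
OBJECT_CLASSES = {"drink": {"liquid"}}


def disambiguate_object(verb_word, object_word):
    """Return the readings of object_word that are compatible with the verb."""
    verb = LEXICON[verb_word][0]  # the toy verb has a single reading
    candidates = [r for r in LEXICON[object_word] if r.pos == "noun"]
    allowed = OBJECT_CLASSES.get(verb.sem_class)
    if allowed is None:
        return candidates  # nothing known about this verb: keep all readings
    compatible = [r for r in candidates if r.sem_class in allowed]
    # Fall back to all candidates rather than rejecting every reading.
    return compatible or candidates


readings = disambiguate_object("שתיתי", "חלב")
print([(r.translit, r.gloss) for r in readings])
```

In a real reader the candidate readings would come from the morphological analysis (step 1) after the syntax parser (step 2) has already discarded the readings that do not fit the sentence structure; the table lookup here only stands in for the last filtering stage.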