Hi Hynek, All:

I'm not sure I agree that speech engines should not do their own audio
output. While I think you have identified some real problems with that
approach, it's not clear that the ".wav file" approach has a low enough
latency. If tests show that latency is not a problem, then passing the
synthesized audio bits to the driver for processing (perhaps via
multiplexing/mixing in most situations, or for pre-emptive audio in
others) does seem to have advantages.
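To make "tests" a bit more concrete, here is a rough sketch of how the time
to obtain a complete waveform might be measured. Note that
synthesize_to_wav() is a hypothetical stand-in that just generates a tone;
it is not the API of Festival or any other engine, and a real test would
also have to include the time the audio layer needs to start playing.

```python
import io
import math
import struct
import time
import wave


def synthesize_to_wav(text, rate=16000):
    """Hypothetical stand-in for an engine's text-to-wave call.

    Returns WAV bytes holding 50 ms of a 440 Hz tone per character.
    """
    n_samples = int(rate * 0.05) * len(text)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * 440 * i / rate)))
        for i in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


def time_to_audio(text, synth=synthesize_to_wav):
    """Return (seconds until the full waveform is available, WAV bytes)."""
    start = time.perf_counter()
    data = synth(text)
    return time.perf_counter() - start, data


if __name__ == "__main__":
    elapsed, data = time_to_audio("hello")
    print(f"{len(data)} bytes ready after {elapsed * 1000:.1f} ms")
```

Passing a wrapper around a real engine as the synth argument would give
numbers that could be compared against whatever latency we decide we need.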
Hynek, I think you've also identified a good reason for one of the
"many layers" in our architecture... we don't really want a bug in the
speech engine to crash our TTS service. Using a C API, even when
licenses permit, usually means sharing process space with the driver,
and for many drivers the code is closed-source, making diagnosis and
recovery very difficult indeed. In such a situation we probably need to
implement the process-space separation in our own TTS architecture, so
that we can restart the engine when things go badly wrong.

regards

Bill

On Wed, 2006-06-28 at 16:11, Hynek Hanke wrote:
> > Festival is free software, so this is of course fixable. Having looked
> > at the code, it's simple code and it wouldn't break if it'd be stretched
> > a bit. But that's not improving a driver: that's improving festival (if
> > the authors allow) and then having to depend on a very new version of
> > it.
>
> Hi Enrico,
>
> also the problem with speech engines doing their own audio output
> (apart from what you said about Festival) is that this audio output
> needs to be configured at several places if several engines are used,
> many places where code needs to be updated if a new audio technology
> comes etc.
>
> > [...]
> > So the proper way to implement a festival driver seems to me to use the
> > text-to-wave function and then do a proper handling of playing the
> > resulting wave, hopefully using the audio playing technology that's
> > trendy at the moment.
>
> Yes, I agree. Actually this is what both Speech Dispatcher and KTTSD are
> doing and I think I've heard Gnome Speech would also like to go this way
> in the future.
>
> > I looked into esd without understanding if it is
> > trendy anymore, and I look at gstreamer without understanding if it
> > isn't a bit too complicated as a default way to play a waveform.
>
> This is fairly complicated.
> I've investigated the possibilities for
> audio output and I've ended up summarizing our requirements in case such a
> technology should eventually come in the future and writing my own
> small library for output to OSS, Alsa and NAS. Please see
> http://lists.freedesktop.org/archives/accessibility/2005-April/000049.html
> and feel free to comment. One of the problems is the latency we
> need. That ruled out both ESD and Gstreamer at that time; I'm not sure
> what the state is now with Gstreamer. Another thing is that if we are
> aiming for a desktop independent speech technology, we need desktop
> independent audio output.
>
> > I don't know much about the APIs of other speech engines. If they all
> > had a text-to-wave function
>
> Most of the engines do. Some don't, but this is their drawback (what if
> I want to have the audio synthesized and saved to a file?). As you said,
> it is very desirable to retrieve the audio for those engines that
> support it.
>
> > , then it can be a wise move to implement a
> > proper audio scheduler to share among TTS drivers, which could then
> > (reliably) support proper integration with the audio system of the day,
> > progress report, interruption and whatever else is needed. This would
> > ensure that all TTS drivers would have the same (hopefully high) level
> > of reliability wrt audio output.
>
> Yes, that is my dream too! Would you be willing to help with this?
> I think we would first have to see what is new and consider the options
> again.
>
> > > Now, one of the big problems is that Festival doesn't offer proper logs.
> > > It would often refuse connection for a stupid typo in the configuration
> > > file and not give any clue to the user. This is something which should
> > > be fixed.
> > This can probably be fixed: festival can be told not to load any config
> > file
>
> This is not really useful. Configuration is really needed.
>
> > , and log can be implemented adding a couple of printfs before calls
> > to the C++ API.
>
> That is the log from the side of the speech API provider (Gnome Speech
> etc.). This already exists in Dispatcher and as I said is automatic from
> a TCP API. I was talking about logs on the side of Festival.
>
> You will never be able to discover why a particular voice was not
> loaded/doesn't work, why a sound icon is not playing, what is the typo
> in your configuration files, why is it not finding a module (wrong path)
> and such from just talking to Festival via its API (be it C++ or TCP).
>
> Currently the only way for the users to fix such problems is to run
> Festival from command line and hope it will write some cryptic message
> to stderr. Then what is left are guesses, past experiences with problems
> and black magic. We must be able to diagnose problems.
>
> >> [from my earlier post]
> >> Now, one of the big problems is that Festival doesn't offer proper
> >> logs.
>
> You say you find the Festival C code clear and modifications not
> difficult. If this could be fixed, that would be superb. I don't think
> Alan would object to include the patch. And it would not introduce
> a dependency for us. I don't know however how soon it could get
> into some official release. But I think it is worth looking into.
>
> > And something like a TTS driver which becomes the main
> > form of access to the computer should be designed to properly restart in
> > case of segfaults in its own code, be it festival or whatever else.
>
> Yes, this is something we tried in Speech Dispatcher, but it doesn't
> always work. We should get this part right in TTS API. The objection
> that with the TCP API it is easier to see what part is crashing, after
> which commands exactly, however remains.
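
For what it's worth, the restart idea can be sketched in a few lines. This
is only an illustration under my own assumptions, not code from Speech
Dispatcher or any TTS API draft: run_supervised() is a hypothetical helper,
and the "engine" here is just a child Python process that exits with an
error status. The point is only that the engine lives in its own process,
so a crash there cannot take the supervising service down with it.

```python
import subprocess
import sys


def run_supervised(cmd, max_restarts=3):
    """Run cmd in a child process and restart it if it exits abnormally.

    Returns the list of exit codes observed. On POSIX a negative code
    means the child was killed by a signal (for example a segfault).
    """
    codes = []
    for _ in range(max_restarts + 1):
        proc = subprocess.run(cmd)
        codes.append(proc.returncode)
        if proc.returncode == 0:
            break  # clean exit, no restart needed
    return codes


if __name__ == "__main__":
    # Hypothetical crashing "engine": a child that exits with status 1.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_supervised(crashing, max_restarts=2))
```

A real service would also need to log which request was in flight when the
child died, which is where the argument about seeing "after which commands
exactly" comes in.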
>
> With regards,
> Hynek Hanke
>
>
> _______________________________________________
> gnome-accessibility-list mailing list
> gnome-accessibility-list@gnome.org
> http://mail.gnome.org/mailman/listinfo/gnome-accessibility-list