Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

Andre Natal Sat, 08 Nov 2014 21:34:28 -0800

Hi Chris.

For new languages, once the decoder is integrated into Gecko, you only
need to build new models (acoustic and language), since the decoder is
language-agnostic.


The model-building procedure is the same for every language. In broad
terms, you need to record thousands of hours of spoken phrases covering
all phones of the target language, from people of different genders, ages,
regions, accents, etc. All this data is then compiled and transformed
into the acoustic model.

For the language model, you need to build a phonetic dictionary for that
language, so that grapheme-to-phoneme tools (e.g. phonetisaurus [1]) can
generate phonetic representations, in real time, of the words in your
grammar.
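
To make the role of the phonetic dictionary more concrete, here is a minimal
sketch in Python. It assumes a CMUdict-style text format (one word per line,
followed by its phones); the sample entries and the helper names
load_phonetic_dict and split_by_coverage are hypothetical and are not part of
Pocketsphinx or phonetisaurus. It only shows which grammar words the
dictionary already covers and which would have to go to a
grapheme-to-phoneme tool.

```python
# Hypothetical, tiny CMUdict-style dictionary: a word, then its phones.
SAMPLE_DICT = """\
hello HH AH L OW
world W ER L D
open OW P AH N
"""


def load_phonetic_dict(text):
    """Parse 'word PH1 PH2 ...' lines into a {word: [phones]} mapping."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        entries[parts[0].lower()] = parts[1:]
    return entries


def split_by_coverage(words, phonetic_dict):
    """Return (known, missing): words with a pronunciation, and words
    that would need a grapheme-to-phoneme tool (assumed, not called here)."""
    known, missing = {}, []
    for word in words:
        phones = phonetic_dict.get(word.lower())
        if phones is None:
            missing.append(word)
        else:
            known[word] = phones
    return known, missing


phonetic_dict = load_phonetic_dict(SAMPLE_DICT)
known, missing = split_by_coverage(["open", "firefox"], phonetic_dict)
print(known)    # words the dictionary already covers
print(missing)  # words that still need generated pronunciations
```

In this sketch, any word that lands in the missing list is exactly the case
where a real-time grapheme-to-phoneme step would be needed.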

Building models is not a trivial task, and it requires close collaboration
between speech engineers and linguists.

Pocketsphinx offers models for some languages besides English [2], and the
project has useful tutorials on acoustic [3] and language [4] model creation.

Thanks,

Andre

[1] https://code.google.com/p/phonetisaurus/
[2] http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
[3] http://cmusphinx.sourceforge.net/wiki/tutorialam?s[]=acoustic&s[]=models
[4] http://cmusphinx.sourceforge.net/wiki/tutoriallm



On Thu, Oct 30, 2014 at 10:45 PM, Chris Hofmann <[email protected]>
wrote:

> On 10/30/14 5:24 PM, smaug wrote:
>
>> On 10/31/2014 02:21 AM, smaug wrote:
>>
>>> Intent to ship is too strong for this.
>>> We need to first have implementation landed and tested ;)
>>>
>>> I wouldn't ship the implementation in desktop FF without plenty more
>>> testing.
>>>
>>>
>> But I guess the question is what people think about shipping the
>> pocketsphinx + API, even if disabled by default.
>>
>> Andre, we need some numbers here. How much does Pocketsphinx increase
>> binary size? or download size?
>> When the pref is enabled, how much does it use memory on desktop, what
>> about on b2g?
>>
>>
> This is important work and the competition is ramping up quickly after many
> years of promises about this year being the year of voice recognition.  We
> will probably fall behind quickly if we don't get something going here in
> the next year.
>
> Can you also talk a bit about what the plan and set of challenges look
> like for expanding the supported languages, and how these would impact the
> numbers Olli has asked for?
>
> The place we really need this is b2g, but phones are only shipping in
> international markets right now, so English-only is not all that helpful.
>
> -chofmann
>
>
>>>
>>> -Olli
>>>
>>>
>>> On 10/31/2014 01:18 AM, Andre Natal wrote:
>>>
>>>> I've been researching speech recognition in Firefox for two years. First
>>>> SpeechRTC, then emscripten, and now the Web Speech API with CMU
>>>> pocketsphinx [1] embedded in the Gecko C++ layer, a project that I had
>>>> the luck to develop for Google Summer of Code with the mentoring of
>>>> Olli Pettay, Guilherme Gonçalves, Steven Lee, Randell Jesup plus others,
>>>> and with the management of Sandip Kamat.
>>>>
>>>> The implementation already works in B2G, Fennec and all FF desktop
>>>> versions, and the first language supported will be English. The API and
>>>> implementation conform to the W3C standard [2]. The preference to
>>>> enable it is: media.webspeech.service.default = pocketsphinx
>>>>
>>>> The patches required to achieve this are:
>>>>
>>>>   - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
>>>>   - Embed English models. Bug 1065911 [4]
>>>>   - Change SpeechGrammarList to store grammars inside SpeechGrammar
>>>>     objects. Bug 1088336 [5]
>>>>   - Creation of a SpeechRecognitionService for Pocketsphinx. Bug
>>>>     1051148 [6]
>>>>
>>>>
>>>> Also, other important features that we don't have patches for yet:
>>>>   - Relax the VAD strategy to be less strict and avoid stopping in the
>>>>     middle of speech when speaking low-volume phonemes [7]
>>>>   - Integrate or develop a grapheme-to-phoneme algorithm for real-time
>>>>     generation when compiling grammars [8]
>>>>   - Include and build models for other languages [9]
>>>>   - Continuous and wordspotting recognition [10]
>>>>
>>>> The WIP repo is here [11], and this Air Mozilla video [12] plus this
>>>> wiki [13] have more detailed info.
>>>>
>>>> In this comment you can see the CPU usage in a flame graph while
>>>> recognition is happening [14].
>>>>
>>>> I wish to hear your comments.
>>>>
>>>> Thanks,
>>>>
>>>> Andre Natal
>>>>
>>>> [1] http://cmusphinx.sourceforge.net/
>>>> [2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>>>> [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
>>>> [4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
>>>> [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
>>>> [6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
>>>> [7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
>>>> [8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
>>>> [9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
>>>> [10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
>>>> [11] https://github.com/andrenatal/gecko-dev
>>>> [12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/ (jump to 12:00)
>>>> [13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
>>>> [14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14
>>>>
>>>>
>>>
>> _______________________________________________
>> dev-platform mailing list
>> [email protected]
>> https://lists.mozilla.org/listinfo/dev-platform
>>
