On 26.12.21 at 20:51, Matthew Miller wrote:
Marius, are the different language packs updated continually and separately,
or is there one versioned set of all of them released at intervals? Is it a
case where everything is regenerated, or are additions incremental? (And do
they _replace_ or just add?)
The language files are separate for each language. They are not updated
together.
It is more the massive total amount of storage space that worries me.
The first release would be less than 40G; that figure is just a size the
entire project will easily reach if it keeps growing
like it did in the past.
It does seem like it'd be nice to have a way to deliver (officially from
Fedora in a way that can be shipped in Spins and containers) static files
that don't change, without needing to redownload gigabytes on upgrade. Of
course, delta RPMs are one way, but need a lot of investment in actually
working again. Ostree deltas are another — and maybe upcoming work on
container deltas could be helpful.
I don't see a way to reduce the update size, as it is mostly one big file:
[marius@eve ~]$ ll /usr/share/pva/vosk-model-de-0.21/
insgesamt 28
drwxr-xr-x. 2 marius marius 4096 21. Aug 2020 am
drwxr-xr-x. 2 marius marius 4096 2. Aug 2020 conf
drwxr-xr-x. 3 marius marius 4096 9. Aug 2020 graph
drwxr-xr-x. 2 marius marius 4096 21. Aug 2020 ivector
-rw-r--r--. 1 marius marius 740 15. Sep 00:21 README
drwxr-xr-x. 2 marius marius 4096 9. Aug 2020 rescore
drwxr-xr-x. 2 marius marius 4096 15. Sep 00:14 rnnlm
[marius@eve ~]$ du -sh /usr/share/pva/vosk-model-de-0.21/*
100M /usr/share/pva/vosk-model-de-0.21/am
12K /usr/share/pva/vosk-model-de-0.21/conf
685M /usr/share/pva/vosk-model-de-0.21/graph
8,2M /usr/share/pva/vosk-model-de-0.21/ivector
4,0K /usr/share/pva/vosk-model-de-0.21/README
2,1G /usr/share/pva/vosk-model-de-0.21/rescore
281M /usr/share/pva/vosk-model-de-0.21/rnnlm
[marius@eve ~]$ ll /usr/share/pva/vosk-model-de-0.21/rescore/
insgesamt 2171812
*-rw-r--r--. 1 marius marius 2115929988 14. Sep 20:58 G.carpa*
-rw-r--r--. 1 marius marius 107992138 14. Sep 20:50 G.fst
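To check the same thing for other language models without reading du output by hand, something like the following could be used. It is only a rough sketch: the function names largest_file_share and format_report are made up for illustration, and it simply walks a model directory with the Python standard library and reports which single file dominates the total size.

```python
import os


def largest_file_share(root):
    """Return (total_bytes, largest_path, largest_bytes) for regular files under root.

    Hypothetical helper for illustration; symlinks are skipped so nothing
    is counted twice.
    """
    total = 0
    largest_path, largest_size = None, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            total += size
            if size > largest_size:
                largest_path, largest_size = path, size
    return total, largest_path, largest_size


def format_report(root):
    """Build a one-line summary of how much of the tree is one single file."""
    total, path, size = largest_file_share(root)
    if total == 0:
        return f"{root}: no files found"
    share = 100.0 * size / total
    return f"{root}: {total} bytes total, largest file {path} is {share:.1f}% of that"


# Example call (the path is the one from the listing above):
# print(format_report("/usr/share/pva/vosk-model-de-0.21"))
```

If one file such as G.carpa accounts for most of the bytes, splitting the model into several packages would not shrink the update much, which is the point made above.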
(And... I think it'd be useful in a lot of cases to be able to do dist-git
-> container without needing to build RPMs as an intermediate step. But...
that's not a thing we have now.)
As far as I understand the packaging rules, autodownloaders are not welcome,
and for security reasons, I absolutely support this.
We could downsize the problem at the beginning: there are no
voice commands ready for other languages yet, so it does not make sense to
have those language models around. I really hope the project gets a kick
start once the first people use it. It's quite easy to write a set of commands
and get it running. I suggest a nice feature in the Fedora Magazine about
a working assistant for Fedora.
So at the beginning, we are talking about 2-4 GB for German and English. The
pva itself isn't that storage hungry, a MB at most. A few vosk deps
here and there:
~100 MB uncompressed, maybe.
For now, I'm rebuilding the compile process against our Fedora libs, so
we can ship the required packages for kaldi & vosk. The required libs
shipped with Fedora are older than the current ones used by the vosk devs,
which is a problem.
With pip as the source for vosk, it works as expected, but the local vosk &
kaldi builds do not work yet :(
best regards,
Marius