Hi,

Well, we already discussed in the GWL (Guix Workflow Language) context where to put “large” data sets, without reaching a conclusion.  Having “large” data sets inside the store is probably not a good idea.  But maybe the data for these models are not “large” enough to worry about the store.
On Mon, 03 Apr 2023 at 18:48, Nicolas Graves via "Development of GNU Guix and the GNU System distribution." wrote:

> In the case of nerd-dictation, the model parameters that can be used
> are listed here: https://alphacephei.com/vosk/models

Here, it is not that large…

--8<---------------cut here---------------start------------->8---
vosk-model-en-us-0.22                1.8G
[...]
vosk-model-en-us-0.42-gigaspeech     2.3G
[...]
vosk-model-ru-0.10                   2.5G
--8<---------------cut here---------------end--------------->8---

…compared to some data packages we already have:

--8<---------------cut here---------------start------------->8---
$ for p in $(guix build -S $(guix package -A 'r\-' | grep genome | cut -f1)); do du -sh $p; done | sort -hr | head -9
807M	/gnu/store/x2540idvd9pfmwz7ix04wm6ks58zwqkm-BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000.tar.gz
692M	/gnu/store/0vnlm5z2gkmzk2kkxzlab787kqjiw5g9-BSgenome.Hsapiens.UCSC.hg38_1.4.4.tar.gz
678M	/gnu/store/ngvghqhmjzscfxgzc1b9b4djws5rfzws-BSgenome.Hsapiens.UCSC.hg19_1.4.3.tar.gz
656M	/gnu/store/187smrknx3k5avhqapswrj40zh24h966-BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1.tar.gz
601M	/gnu/store/c15pc126x7k54yrqmbfwgg7gxkgbm9ip-BSgenome.Mmusculus.UCSC.mm10_1.4.0.tar.gz
598M	/gnu/store/cwsm9lqfmd1y9mwsx4sq4rzf45br6by2-BSgenome.Btaurus.UCSC.bosTau8_1.4.2.tar.gz
594M	/gnu/store/jky74snf2vr2r3s9c5131vacql6rna6a-BSgenome.Mmusculus.UCSC.mm9_1.4.0.tar.gz
374M	/gnu/store/zjzjag2zd408xnj5nq9ckfpcx22h7m4j-BSgenome.Drerio.UCSC.danRer11_1.4.2.tar.gz
37M	/gnu/store/abfk8jwhdd7d62jybfbvrgl682db7q2w-BSgenome.Dmelanogaster.UCSC.dm3_1.4.0.tar.gz
--8<---------------cut here---------------end--------------->8---

but still.  Well, I do not know whether a 2G data set like this one belongs in the store, but I do not have anything better to propose.

> One caveat is that using all these models can take a lot of space on the
> servers, a burden which is not useful because no build steps are really
> needed (except an unzip step).  In this case, we can use the
> #:substitutable? #f flag.
> You can find an example of some of these
> packages right here:
> https://git.sr.ht/~ngraves/dotfiles/tree/main/item/packages.scm

That is what is done for some packages in gnu/packages/bioconductor.scm:

https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/bioconductor.scm#n904

> So my question is: Should we add this type of models in packages for
> Guix?  If yes, where should we put them?  In machine-learning.scm?  In a
> new file machine-learning-models.scm (such a file would never need new
> modules, and it might avoid some confusion between the tools and the
> parameters needed to use the tools)?

Well, gnu/packages/machine-learning-data.scm (or s/data/models, i.e. machine-learning-models.scm) sounds good to me.

Cheers,
simon
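
P.S. To make the #:substitutable? #f idea concrete, here is an untested sketch of what such a model package could look like with copy-build-system.  The module name, the package name vosk-model-en-us, the install path, the download URL layout and the license are my assumptions, and the sha256 is a placeholder to be replaced by the hash that `guix download URL` prints.

```scheme
;; Untested sketch: module and package names are hypothetical.
(define-module (gnu packages machine-learning-models)
  #:use-module (guix packages)
  #:use-module (guix download)
  #:use-module (guix gexp)
  #:use-module (guix build-system copy)
  #:use-module ((guix licenses) #:prefix license:)
  #:use-module (gnu packages compression))   ; provides `unzip'

(define-public vosk-model-en-us
  (package
    (name "vosk-model-en-us")
    (version "0.22")
    (source
     (origin
       (method url-fetch)
       ;; Assumed URL layout, based on the links of the models page.
       (uri (string-append "https://alphacephei.com/vosk/models/"
                           "vosk-model-en-us-" version ".zip"))
       (sha256
        (base32
         ;; Placeholder hash: replace with the output of `guix download'.
         "0000000000000000000000000000000000000000000000000000"))))
    (build-system copy-build-system)
    (arguments
     (list
      ;; Nothing is compiled, the archive is only unzipped and copied,
      ;; so mark the output as not substitutable.
      #:substitutable? #f
      ;; Hypothetical install location for the model files.
      #:install-plan #~'(("." "share/vosk/vosk-model-en-us"))))
    ;; The default 'unpack' phase calls unzip for .zip sources.
    (native-inputs (list unzip))
    (home-page "https://alphacephei.com/vosk/models")
    (synopsis "English speech recognition model for Vosk")
    (description
     "This package provides a pre-trained English model for the Vosk
speech recognition toolkit, usable for instance by nerd-dictation.")
    ;; Assumed: check the license listed for each model.
    (license license:asl2.0)))
```

With this flag, users unzip and copy the model locally instead of fetching a substitute of the unpacked output.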
