Hi Mo,

thanks again for all your effort on Deep Learning in Debian. Please note
that I'm not competent in this field.
On Tue, May 21, 2019 at 12:11:14AM -0700, Mo Zhou wrote:
> > https://salsa.debian.org/lumin/deeplearning-policy
> (issue tracker is enabled)

Not sure whether it is sensible to add this to the issue tracker.

> See my draft for details.

Quoting from your section "Questions Not Easy to Answer":

1. Must the dataset for training a Free Model present in our archive?
   Wikipedia dump is a frequently used free dataset in the computational
   linguistics field, is uploading wikipedia dump to our Archive sane?

I have no idea about the size of this kind of dump. Recently I've read
that data sets for other programs tend towards the 1 GB range. In Debian
Med I'm maintaining metaphlan2-data at 204 MB, which would be even larger
if it did not use a method of "data reduction" that other DDs consider a
bug (#839925).

2. Should we re-train the Free Models on buildd? This is crazy. Let's
   don't do that right now.

If you ask me, bothering buildd with this task is insane. However, I'm
positively convinced that we should ship the training data and be able
to train the models from it (see the sketch after my signature).

Kind regards

      Andreas.

-- 
http://fam-tille.de
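
P.S. Just to make the "train the models from the shipped data" point a bit
more concrete, here is a minimal, purely hypothetical sketch in PyTorch.
The package name, the paths (/usr/share/foo-model/data, /var/lib/foo-model)
and the toy model are all invented for illustration and are not taken from
your draft; the only point is that a package which ships its free training
data can regenerate the model from it with a small script like this.

#!/usr/bin/env python3
"""Hypothetical sketch: retrain a model from data shipped by a package."""
from pathlib import Path

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical locations: free training data shipped in the package,
# regenerated model weights written to a local state directory.
DATA_DIR = Path("/usr/share/foo-model/data")
OUT_PATH = Path("/var/lib/foo-model/model.pt")


def load_dataset(path: Path) -> TensorDataset:
    # Toy loader: expects two tensors saved with torch.save();
    # labels must be integer class indices for CrossEntropyLoss.
    features = torch.load(path / "features.pt")
    labels = torch.load(path / "labels.pt")
    return TensorDataset(features, labels)


def train(epochs: int = 5) -> None:
    dataset = load_dataset(DATA_DIR)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    # Tiny stand-in model; a real package would import its own architecture.
    model = nn.Sequential(
        nn.Linear(dataset.tensors[0].shape[1], 64),
        nn.ReLU(),
        nn.Linear(64, 2),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for features, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels)
            loss.backward()
            optimizer.step()

    OUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    torch.save(model.state_dict(), OUT_PATH)


if __name__ == "__main__":
    train()

Something along these lines could be run by the maintainer on their own
hardware rather than on buildd; the important part is that the training
data itself is free and shipped, so anybody can reproduce the model.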