Hi Andy,

On 2019-05-23 17:52, Andy Simpkins wrote:
> Sam.
> Whilst i agree that "assets" in some packages may not have sources
> with them and the application may still be in main if it pulls in
> those assets from contrib or non free.
> I am trying to suggest the same thing here. If the data set is unknown
> this is the *same* as a dependancy on a random binary blob (music /
> fonts / game levels / textures etc) and we wouldn't put that in main.
The "ToxicCandy Model" is used to cover a special case. Neither the
"ToxicCandy" model nor the "Non-free" model can enter our main section,
as stated by DL-Policy #1 from the beginning.

> It is my belief that we consider training data sets as 'source' in
> much the same way....

We can indeed interpret training data as a sort of "source". But sometimes
we run into trouble even with free "source". The Wikipedia dump is a
frequently used free corpus in the computational linguistics field. Do we
really want to upload the Wikipedia dump to the archive whenever a Free
Model to be packaged was trained on it? The Wikipedia dump is so large
that it challenges our .deb format (see recent threads).

See (Difficulties -- Dataset Size):
https://salsa.debian.org/lumin/deeplearning-policy#difficulties-questions-not-easy-to-answer