Hi Ian,

> Lumin writes ("Concerns to software freedom when packaging deep-learning
> based applications."):
> > 1. Is GPL-licensed pretrained neural network REALLY FREE? Is it really
> > DFSG-compatible?
>
> No. No.
>
> Things in Debian main should be buildable *from source* using Debian
> main. In the case of a pretrained neural network, the source code is
> the training data.
>
> In fact, they are probably not redistributable unless all the training
> data is supplied, since the GPL's definition of "source code" is the
> "preferred form for modification". For a pretrained neural network
> that is the training data.

It would be interesting to look at some real examples.

I emphasized the GPL because the project [1] mentioned in the original
post is released under the GPL. However, upstream does not clearly state
the license of the pretrained network that it distributes. Moreover,
according to [1]'s readme:

  "Recomputing the AlphaGo Zero weights will take about 1700 years on
  commodity hardware".

Well, I guess that means a pure free software stack cannot realistically
reproduce this work.

Apart from that AlphaGo-related project, I actually see more pretrained
models released under MIT/BSD/Apache2 licenses or into the public domain.
Here is another example involving ImageNet, a typical dataset in the
computer vision field:

  https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/readme.md

  Framework:          Caffe (BSD-2-clause)
  Repro code:         (BSD-2-clause)
  Pretrained network: (public domain)
  Dataset:            ImageNet (???) [2]

  -> The software stack is free, but the ImageNet dataset is not.
  -> Debian cannot distribute related applications in the main section.

> > 2. How should we deal with pretrained neural networks when some of us
> > want to package similar things? Should it go contrib?
>
> If upstream claims it is GPL, and doesn't supply training data, I
> think it can't even go to contrib.

OK, then what if the model is released under a more permissive license
such as MIT/BSD, given that the model is trained on a non-free academic
dataset by a non-free software stack (replaceable by a CPU
implementation, but at a ridiculous time cost)? I think in this case
that kind of network can enter contrib.

> If upstream does not claim to be supplying source code, or they supply
> the training data, then I guess it can go to contrib.

The computer vision research community is eager to release its code and
models under permissive licenses such as BSD/MIT. Many related
conferences publish their papers without access restrictions. The
biggest problem lies in the large datasets themselves.

> Note that the *use* of these kinds of algorithms for many purposes is
> deeply troublesome. Depending on the application, it might cause
> legal complications such as difficulties complying with the European
> GDPR, illegal race/sex/etc. discrimination, and of course it comes
> with big ethical problems.
>
> :-/

Hmm... It appears to be quite incompatible with the DFSG, but that's
another complex story :-)

> Ian.

[1] https://github.com/gcp/leela-zero
[2] http://image-net.org/download-faq
    I searched and found no explicit license declaration for ImageNet.
    But according to the agreement text, the dataset may be non-free.