Hi Sam,

Thank you for the input. I see your point, and concerns like these are exactly why I wrote proposal B in my draft. Here is my quick response after going through the text.

On Wed, 2025-02-05 at 07:45 -0700, Sam Hartman wrote:
>
> TL;DR: I think it is important for Debian to consider AI models free
> even if those models are based on models that do not release their
> training data. In the terms of the DFSG, I think that a model itself is
> often a preferred form of modification for creating derived works. Put
> another way, I don't think toxic candy is as toxic as I thought it was
> reading lumin's original ML policy.
> If we focus too much on availability of data, I think we will help the
> large players and force individuals and small contributors out of the
> free software ecosystem.
> I will be drafting a GR option to support this position.

I want to point out that in "preferred form of modification for creating DERIVED WORKS", the "derived works" part is where your proposal (and proposal B) differs from proposal A.

Proposal A (toxic candy is not free software) preserves the full freedom for derived works, and also the freedom to inspect, study, reproduce, and modify the original base model. Covering only derived works is not complete freedom.

Proposal B (toxic candy is free software) is similar to treating those base models as blobs (such as firmware) that no free software community can really handle (at the current stage).

I do not see how proposal A harms the ecosystem. It just prevents huge binary blobs from entering the main section of Debian's archive. It does not stop people from uploading the binary blobs to the non-free section.

General AI applications are not something to worry about even with proposal A. DebGPT [https://tracker.debian.org/pkg/debgpt] itself incorporates the two common ways existing AI applications work:

(1) By default, DebGPT behaves as a REST API client. It supports a wide range of existing service endpoints, including commercial and self-hosted ones.
(2) The built-in backend of DebGPT can pull a binary blob from the internet and provide the REST endpoint using that model.

I personally do not see how insisting on proposal A can harm the ecosystem. While developers cannot put binary blobs into main, they can still trigger an automatic download from software in main. I have consistently believed that putting a giant binary blob (base model) into main, one that nobody other than upstream can reproduce, is ridiculously funny. That said, non-free is somewhere such a model can go.

My appreciation of software freedom is rooted in the equal sharing of knowledge, which benefits humanity in the long run. When I was young, looking at the binary blobs of Microsoft Windows while being unable to easily learn how computers work really disappointed me. Discovering Debian made me feel happy with open source crap even if it falls behind the closed-source Ferrari. Proposal A preserves the integrity of knowledge when anybody wants to study the stuff in depth. Proposal B departs from my appreciation of software freedom.

I hope free software can still help people achieve their personal revolution in terms of knowledge and skill in the future that belongs to AI, just like it has done for me.

Let's leave enough time for preparing the proposal. I'll focus on my proposal A and incorporate others' suggestions from the list.