Hi Sam,

Thank you for the input. I see your point, and those concerns are exactly
why I wrote proposal B in my draft. Here is my quick response after going
through the text.

On Wed, 2025-02-05 at 07:45 -0700, Sam Hartman wrote:
> 
> TL;DR: I think it is important for Debian to consider AI models free
> even if those models are based on models that do not release their
> training data. In the terms of the DFSG, I think that a model itself is
> often a preferred form of modification for creating derived works. Put
> another way, I don't think toxic candy is as toxic as I thought it was
> reading lumin's original ML policy.
> If we focus too much on availability of data, I think we will help the
> large players and force individuals and small contributors out of the
> free software ecosystem.
> I will be drafting a GR option to support this position.

I want to point out that in "preferred form of modification for creating
DERIVED WORKS", the "derived works" part is where your proposal (and
proposal B) differs from proposal A.

Proposal A (toxic candy is not free software) preserves full freedom for
derived works, and also the freedom to inspect, study, reproduce, and modify
the original base model. Covering only derived works does not amount to
complete freedom.

Proposal B (toxic candy is free software) is similar to treating those base
models as blobs (such as firmware) that no free software community can really
handle (at the current stage).

I do not see how proposal A harms the ecosystem. It just prevents huge
binary blobs from entering Debian's main section of the archive. It does not
stop people from uploading the binary blobs to the non-free section.

General AI applications are not something to worry about even with proposal A.
DebGPT [https://tracker.debian.org/pkg/debgpt] itself incorporates the two
common ways in which existing AI applications work:

(1) by default, DebGPT behaves as a REST API client. It supports a wide range
    of existing service end points, including commercial and self-hosted ones.
(2) the built-in backend of DebGPT can pull a binary blob from the internet
    and provide the REST endpoint using that model.

I personally do not see how insisting on proposal A can harm the ecosystem.
While developers cannot put binary blobs into main, they can still trigger
the automatic download from software in main.

I still believe that putting a giant binary blob (base model) that nobody
other than upstream can reproduce into main is ridiculously funny. That said,
non-free is a place where such models can go.


My appreciation of software freedom is rooted in the equal sharing of
knowledge that benefits humanity in the long run. When I was young, looking
at the binary blobs of Microsoft Windows while being unable to easily learn
how computers work really disappointed me. Discovering Debian made me happy
with open source crap even if it falls behind the closed-source Ferrari.

Proposal A preserves the integrity of knowledge when anybody wants to study
the stuff in depth. Proposal B departs from my appreciation of software
freedom. I hope free software can still help people achieve their personal
revolution in knowledge and skill in a future that belongs to AI, just as it
has done for me.


Let's leave enough time for preparing the proposals. I'll focus on my
proposal A and incorporate others' suggestions from the list.
