Hi Stefano

On 2024/10/29 13:03, Stefano Zacchiroli wrote:
> On Mon, Oct 28, 2024 at 09:53:31PM +0200, Jonathan Carter wrote:
>> The companies [...] want to restrict what you can actually use it
>> for, and call it open source? And then OSI makes a definition that
>> seems carefully crafted to let these kind of licenses slip through?
>
> The licensing terms for the Meta Llama models are indeed horrific, but I
> don't understand your point here. In order to be OSAID compliant, Meta
> will precisely have to change those licensing terms and make them
> DFSG-compliant. That would be a *good* thing for the world and would fix
> the main thing you are upset about.

Unfortunately that's not the case. Meta won't have to make Llama3 DFSG-compliant in order to be OSAID compliant, since the OSAID is not as robust as the OSD.

The OSAID has no provision for explicit free (re)distribution like OSD #1 has, so Meta could continue to require license fees over a certain user count and still claim to be OSAID compliant. They could also keep the clause that makes Llama3's license non-transferable (contrary to OSD #7), which, as far as I understand, is there to prevent forks from happening, and this too would be OSAID compliant.

Llama3's license is particularly dodgy, but it's not unique in the AI space, and I can assure you that even if someone out there is convinced to do the minimum that the OSAID requires, the result might still be a far cry from DFSG-free. For that reason, we as Debian should absolutely not endorse it, imho.

> And Meta is not liking that idea. Meta is, right now, lobbying EU
> regulators to convince them that what should count as "open source AI"
> for the purposes of the EU AI Act is their (Meta's) version, rather than
> OSAID.
>
> I have personally fought (and lost) during the OSAID definition process
> to make access to training data mandatory in the definition. So while
> I'm certainly not against criticizing OSAID, we should do that for the
> right reasons.

What is the OSI's motivation for creating such an incredibly lax definition for open source AI? Meta is already calling their absolutely-not-open-source model Open Source and promoting it as such, without so much as a *peep* from the OSI condemning the abuse of the term. (Although, while doing a quick search to make sure that's true, I found this link from the OSI to an article that keeps insisting that Llama3 is open source: https://opensource.org/press-mentions/meta-inches-toward-open-source-ai-with-new-llama-3-1)

If they're not even going to defend the one definition that they're supposed to be the stewards of, what do you think will happen when they have an additional, significantly looser, much more lax definition that is open to many more kinds of abuse?

I don't need to fast-forward to the next episode or the next season to predict what's going to happen:

* It will be bad for users in terms of what they can do with what they consider to be their own devices
* It will be bad for software developers and people who implement software
* It will result in *more* non-DFSG models being released, not fewer (since the creators of these models can now fall back to licenses which are completely non-free but still squeeze by on the OSAID definition)

> PS To make Llama models OSAID-compliant Meta, in addition to (1)
>    changing the model license, will also have to: (2) provide "a listing
>    of all publicly available training data and where to obtain it", and
>    (3) release under DFSG-compatible terms their entire training
>    pipeline (currently unreleased). I don't think they will ever get
>    there. But if they do, these would also be good things for the world.
>    Not *as good* as having access to the entire training dataset, but
>    good nonetheless.
Again, the OSAID doesn't particularly care about DFSG compatibility, so I'm not sure where point (3) comes in here, but if there's something obvious I missed, I'm all ears.

-Jonathan
