Hi Stefano,
On 2024/10/29 13:03, Stefano Zacchiroli wrote:
> On Mon, Oct 28, 2024 at 09:53:31PM +0200, Jonathan Carter wrote:
>> The companies [...] want to restrict what you can actually use it
>> for, and call it open source? And then OSI makes a definition that
>> seems carefully crafted to let these kinds of licenses slip through?
> The licensing terms for the Meta Llama models are indeed horrific, but I
> don't understand your point here. In order to be OSAID compliant, Meta
> will precisely have to change those licensing terms and make them
> DFSG-compliant. That would be a *good* thing for the world and would fix
> the main thing you are upset about.
Unfortunately that's not the case. Meta won't have to make Llama3
DFSG-compliant in order to be OSAID-compliant, since the OSAID is not
as robust as the OSD.
The OSAID has no provision for free (re)distribution like OSD#1 has,
so Meta could continue to require license fees above a certain user
count and still claim to be OSAID-compliant. They could also keep
their clause making the Llama3 license non-transferable (contrary to
OSD#7), which, as far as I understand, is there to prevent forks from
happening, and this too would be OSAID-compliant.
Llama3's license is particularly dodgy, but it's not unique in the AI
space, and I can assure you that even if someone out there is
convinced to do the bare minimum that OSAID requires, the result might
still be a far cry from DFSG-free, and for that reason we as Debian
should absolutely not endorse it, imho.
> And Meta is not liking that idea. Meta is, right now, lobbying EU
> regulators to convince them that what should count as "open source AI"
> for the purposes of the EU AI Act is their (Meta's) version, rather than
> OSAID.
>
> I have personally fought (and lost) during the OSAID definition process
> to make access to training data mandatory in the definition. So while
> I'm certainly not against criticizing OSAID, we should do that for the
> right reasons.
What is the OSI's motivation for creating such an incredibly lax
definition of open source AI? Meta is already calling their
absolutely-not-open-source model Open Source and promoting it as such,
without so much as a *peep* from the OSI condemning the abuse of the
term. (Although, while doing a quick search to make sure that's true,
I found this link from the OSI to an article that keeps insisting that
Llama3 is open source:
https://opensource.org/press-mentions/meta-inches-toward-open-source-ai-with-new-llama-3-1)
If they're not even going to defend the one definition that they're
supposed to be the stewards of, what do you think will happen when
they have an additional, significantly laxer definition that is open
to many more kinds of abuse?
I don't need to fast-forward to the next episode or the next season to
predict what's going to happen:
* It will be bad for users in terms of what they can do with what they
consider to be their own devices
* It will be bad for software developers and people who implement software
* It will result in *more* non-DFSG models being released, not fewer
(since the creators of these models can now fall back to licenses that
are completely non-free but still squeeze by under the OSAID definition)
> PS To make Llama models OSAID-compliant Meta, in addition to (1)
> changing the model license, will also have to: (2) provide "a listing
> of all publicly available training data and where to obtain it", and
> (3) release under DFSG-compatible terms their entire training
> pipeline (currently unreleased). I don't think they will ever get
> there. But if they do, these would also be good things for the world.
> Not *as good* as having access to the entire training dataset, but
> good nonetheless.
Again, the OSAID doesn't particularly care about DFSG compatibility,
so I'm not sure where point (3) comes in here, but if there's
something obvious I missed, I'm all ears.
-Jonathan