While I'm still digesting the very impactful (for me) message by the other Sam (hartmans), a quick but important note on the following:
On Fri, Feb 07, 2025 at 01:35:00PM +0100, Sam Johnston wrote:
> "Large language models (LMs) have been shown to memorize parts of
> their training data, and when prompted appropriately, they will emit
> the memorized training data verbatim."

I don't think we should focus our conversation on LLMs much, if at all.
The reason is that, even if a completely free-as-in-freedom (including
in its training dataset), high-quality LLM were to materialize in the
future, its preferred form of modification (which includes the dataset)
would be practically impossible for Debian to distribute due to its
size.

So when we think of concrete examples, let's focus on what Debian could
reasonably distribute. This includes small(er) generative AI language
models, but also all sorts of *non-generative* AI models, e.g.,
classification models. The latter do not generate copyrightable
content, so most of the issues you pointed out do not apply to them.
Other issues still apply to them, including bias analyses (at a scale
which *is* manageable, addressing some of the issues pointed out by
hartmans) and ethical data sourcing.

Cheers
-- 
Stefano Zacchiroli . z...@upsilon.cc . https://upsilon.cc/zack
Full professor of Computer Science . Télécom Paris, Polytechnic Institute of Paris
Co-founder & CSO Software Heritage . Mastodon: https://mastodon.xyz/@zacchiro