But where do we draw the line? Are translation tools like DeepL allowed? I don't see much of a copyright issue for these.

I'd also like to jump in and play devil's advocate. There's a fair
chance that this is because I just got back from a
supercomputing/research conf where LLMs were the hot topic in every keynote.

As mentioned by Sam, this RFC is performative. Any users who are going
to abuse LLMs are going to do it _anyway_, regardless of the rules. We
already rely on common sense to filter these out; we're always going to
have BS/spam PRs and bugs - I don't think content generated by an LLM
is really any worse.

This doesn't mean that I think we should blanket-allow poor quality LLM
contributions. It's especially important that we take into account the
potential for bias, factual errors, and outright plagiarism when these
tools are used incorrectly. We already have methods for weeding out low
quality contributions and bad faith contributors - let's trust in those
and see what we can do to strengthen the tools and processes behind
them.

A bit closer to home for me, what about using an LLM as an assistive
technology / to reduce boilerplate? I'm recovering from RSI - I don't
know when (if...) I'll be able to type like I used to again. If a model
is able to infer some mostly salvageable boilerplate from its context
window, I'm going to use it and spend the effort I would have spent
typing on fixing something else; an outright ban on LLM use will reduce
my _ability_ to contribute to the project.
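
For concreteness, here's the sort of boilerplate I mean - a contrived
Python sketch (the script and its flags are invented for illustration).
Having typed the first argument definition by hand, a completion model
can infer the rest of the pattern from context, and I only need to
review it:

    import argparse

    # Invented example: repetitive argument definitions that a model
    # can reliably complete once the first one establishes the pattern.
    parser = argparse.ArgumentParser(description="sync package metadata")
    parser.add_argument("--repo", default="gentoo",
                        help="repository to sync")
    parser.add_argument("--jobs", type=int, default=4,
                        help="parallel fetch jobs")
    parser.add_argument("--verbose", action="store_true",
                        help="chatty output")
    args = parser.parse_args()

Nothing novel, nothing plausibly copyrightable - just keystrokes I
don't have to spend.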

What about using an LLM for code documentation? Some models can do a
passable job of writing decent quality function documentation and, in
production, I _have_ caught real issues in my logic this way. Why should
I type that out (and write what I think the code does rather than what
it actually does) if an LLM can get 'close enough' and I only need to do
light editing?
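
To make that concrete, a contrived sketch (the function is invented,
not real code from any project). Asked to document what the code
_actually_ does, a model wrote something like:

    def is_stable(keyword):
        """Return True unless `keyword` starts with '~'.

        Note: the empty string also returns True.
        """
        return not keyword.startswith("~")

I _intended_ 'stable' to mean a concrete keyword like 'amd64'; a
docstring describing the actual behaviour, rather than my intent, is
exactly what surfaced the empty-string case I'd missed.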

In line with the above, if the concern is about code quality / the
potential for plagiarised code, what about indirect use of LLMs? Imagine
a hypothetical situation where a contributor asks an LLM to summarise a
topic and uses that knowledge to implement a feature. Is this now
tainted / forbidden knowledge according to the Gentoo project?

As a final not-so-hypothetical, what about an LLM trained on Gentoo docs
and repos, or, more likely, trained exclusively on open-source
contributions and fine-tuned on Gentoo specifics? I'm in the process of
spinning up several models at work to get a handle on the tech / turn
more electricity into heat - this is a real possibility (if I can ever
find the time).

The cat is out of the bag when it comes to LLMs. In my real-world job I
talk to scientists and engineers using these things (for their
strengths) to quickly iterate on designs, to summarise experimental
results, and even to generate testable hypotheses. We're only going to
see increasing use of this technology going forward.

TL;DR: I think this is a bad idea. We already have effective mechanisms
for dealing with spam and bad faith contributions. Banning LLM use by
Gentoo contributors at this point is just throwing the baby out with the
bathwater.

As an alternative I'd be very happy with some guidelines for the use of
LLMs and other assistive technologies, like "Don't use LLM code snippets
unless you understand them", "Don't blindly copy and paste LLM output",
or, my personal favourite, "Don't be a jerk to our poor bug wranglers".

A blanket "No completely AI/LLM generated works" might be fine, too.

Let's see how the legal issues shake out before we start pre-emptively
banning useful tools. There's a lot of ongoing action in this space - at
the very least I'd like to see some thorough discussion of the legal
issues separately if we're making a case for banning an entire class of
technology.

A Gentoo LLM project formed of experts who could provide sound advice
and some actual guidelines for LLM use within the project (and engage
some real-world legal advice) might be a good starting point. Are there
any volunteers in the audience?

Thanks for listening to my TED talk,

Matt
