Hi Aaron,

On Tue, 2025-12-16 at 01:05 -0500, Aaron Merey wrote:
> On Mon, Dec 15, 2025 at 12:25 PM Mark Wielaard <[email protected]> wrote:
> > On Thu, 2025-12-11 at 23:35 -0500, Aaron Merey wrote:
> > > I'd like to propose an elfutils policy for contributions containing
> > > content generated by LLM or AI tools (AI-assisted contributions). A
> > > written policy will help clarify for contributors whether elfutils
> > > accepts AI-assisted contributions and whether any special procedures
> > > apply.
> >
> > I think it would be good to differentiate between LLM generated
> > contributions and [AI] tool assisted contributions. The first seems
> > easy to define and is about whether or not to accept such generated
> > patches. The latter seems a very broad topic that is mostly about
> > which tools a developer might use personally, most of which we don't
> > need a policy for.
>
> That's a fair distinction. The policy can be reworded so that it
> addresses contributions containing LLM-generated content (beyond
> accessibility aids) instead of AI tooling in general.
I think that would lead to the most concise guidance to contributors.

> > > There isn't a consensus across major open source projects on
> > > whether AI-assisted contributions should be allowed. For example,
> > > Binutils [1], Gentoo [2], and man-pages [3] have adopted policies
> > > rejecting most or all AI-assisted contributions.
> >
> > There have also been discussions by glibc and gcc about adopting a
> > similar policy on LLM Generated Content as binutils has.
> >
> > > Fedora [4] and the Linux Foundation [5] have policies permitting
> > > the use of AI-assisted contributions. Contributors are expected to
> > > disclose the use of any AI tools and take responsibility for the
> > > contribution's quality and license compatibility.
> >
> > The Fedora one is for a large part not about using AI for code
> > contributions. The Linux Foundation one lets each developer try to
> > figure out if there are (legal) issues or not. Both feel like they
> > are not really giving any real guidance, but let every individual
> > try to figure it out themselves.
>
> What stood out to me was that these policies do not unconditionally
> ban contributions containing LLM content. This content may be
> acceptable when there is disclosure, license compatibility, and
> absence of incompatible third party content.

And having the clear rights to sign off on the legal requirements of
the project. Which I think is why these guidelines are not very
practical. They push contributors to find some imaginary line where it
is "still" OK to just copy LLM generated content.

> > > In my opinion, elfutils should permit AI-assisted contributions.
> > > As for specific policies, I suggest the following.
> > >
> > > (1) AI-assisted contributions should include a disclosure that
> > > some or all of the contribution was generated using an AI tool.
> > > The git commit tag "Assisted-by:" has been adopted for this
> > > purpose by Fedora, for instance.
> >
> > I think this is too weak. The tag or comment should at least explain
> > how to replicate the generated content. Which isn't very practical
> > with the current generation of LLM chatbots. Or probably even
> > impossible. I do think it is appropriate for deterministic tooling
> > though, so as to have a recipe to replicate specific code changes.
>
> Reproduction steps for deterministic tools and prompts or conversation
> summaries for LLMs are fine with me.

The first are fine with me, the second not really.

> I want to note that reproducibility isn't always required when we
> accept a patch. Of course not all human-authored changes are based on
> a process that's reproducible in practice and I don't think we need
> to introduce this requirement just for LLM content.

I like the idea of an Assisted-by tag, but only for tools that users,
maintainers and reviewers can also actually use, and that come with
exact instructions or a script that can be used to replicate the
suggested changes. e.g. Assisted-by: emacs isn't very useful, but if
you have a specific elisp script then please provide it (maybe just
include it in the patch) so others can also use it. (There is a small
sketch of what I mean at the end of this mail.)

The issue with LLM generated content is that it is impractical, or
even impossible, to provide enough context for anyone else to
recreate it.

> > > (2) AI-assisted contributions should otherwise be treated like
> > > any other contribution.
> > > The contributor vouches for the quality of their contribution
> > > and verifies license compatibility with their DCO
> > > "Signed-off-by:" tag while reviewers evaluate the technical
> > > merits of the contribution.
> >
> > Yes, but I think this just says no such contributions can have a
> > Signed-off-by tag since, at least for LLM chatbot like generated
> > patches, these have unclear copyright status and so a contributor
> > cannot [...]
>
> ChatGPT, for example, includes the following statement in its terms
> of use [1]:
>
> "Ownership of content. As between you and OpenAI, and to the extent
> permitted by applicable law, you (a) retain your ownership rights in
> Input and (b) own the Output. We hereby assign to you all our right,
> title, and interest, if any, in and to Output. ... Our assignment
> above does not extend to other users’ output or any Third Party
> Output."

Right, that tells me they might not actually have any rights to grant
you, and they acknowledge there are other right holders who don't give
you any rights.

> If a contributor uses ChatGPT to help prepare a patch and takes
> reasonable care to avoid including third party content, I think the
> contributor can reasonably sign the DCO in this case. There is valid
> disagreement about this of course. Projects such as QEMU [2] have
> policies rejecting LLM content due to the uncertainty of DCO claims.
> On the other hand, Chris Wright and Richard Fontana [3] argue that
> the DCO can be compatible with LLM content.

I think the QEMU example is what we should follow.

I see the argument made in that Red Hat blog post, but I am not really
convinced by their arguments; it feels like they are handwaving away
valid concerns about attribution and legal (copy)rights.

But I do like and agree with:

  "None of this is to say that projects must allow AI-assisted
  contributions. Each project is entitled to make its own rules and
  set its own comfort level, and if a project decides to prohibit
  AI-assisted contributions for now, that decision deserves respect."

I think they do bring up an important point about establishing trust.
And there are trust issues not just legally or technically, but also
ethically. They claim you shouldn't stigmatize contributors that "try
to use AI responsibly". But there is a genuine question whether that
is even possible when there are legal/ethical issues around the
processing of training data, and when there are even LLMs that are
explicitly trained to act like white supremacists and attack
marginalized groups. Then there are the economic and energy costs and
the climate impact. There is a real trust issue here imho with the
current generation of LLMs. Maybe one day there will be something
like:

https://sfconservancy.org/activities/aspirational-statement-on-llm-generative-ai-for-programming.html

Then we can maybe reexamine the trust issue.

> > I would lean the other way and adopt a simple policy like the rest
> > of the core toolchain projects are adopting: reject LLM generated
> > contributions for which the provenance cannot be determined
> > (because the training corpus and/or algorithm is unknown).
>
> These provenance concerns are fair, but can they be accommodated by
> our existing practices?

I think they can. We should provide guidance that contributors should
not sign off on any (non-trivial) LLM generated code/docs. Just like
you wouldn't sign off on code/docs you "find" somewhere without clear
attribution, copyright and license terms. We can reuse some of the
guidance given by the binutils and/or qemu projects to make that
clear.
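To make the Assisted-by idea from point (1) concrete, here is roughly
what I would hope to see for a deterministic tool. Note that the tag
format, names and identifiers below are all made up for illustration;
elfutils has not adopted any of this. The commit message would carry
both trailers:

  libelf: Rename old_getdata to new_getdata

  Mechanical rename performed by the rename-foo.el script included
  in this patch.

  Assisted-by: GNU Emacs (rename-foo.el, included in this patch)
  Signed-off-by: Jane Hacker <[email protected]>

And the patch would ship the script itself, so that any reviewer can
rerun it and diff the result. A minimal sketch of such a script:

  ;; rename-foo.el - hypothetical helper shipped with the patch so
  ;; reviewers can replicate the mechanical change. Run on each file:
  ;;   emacs --batch FILE -l rename-foo.el -f rename-foo-and-save
  (defun rename-foo-and-save ()
    "Replace every use of old_getdata with new_getdata, then save."
    (goto-char (point-min))       ; start at the top of the file
    (while (search-forward "old_getdata" nil t)
      ;; t t = don't adjust case, insert the replacement literally
      (replace-match "new_getdata" t t))
    (save-buffer))

That is the kind of recipe that would make an Assisted-by tag actually
useful: anyone can replicate and audit the change, independent of who
or what suggested it first.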
Cheers,

Mark

> [1] https://openai.com/policies/row-terms-of-use/
> [2] https://www.qemu.org/docs/master/devel/code-provenance.html#use-of-ai-content-generators
> [3] https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
