Hi Aaron,

On Tue, 2025-12-16 at 01:05 -0500, Aaron Merey wrote:
> On Mon, Dec 15, 2025 at 12:25 PM Mark Wielaard <[email protected]> wrote:
> > On Thu, 2025-12-11 at 23:35 -0500, Aaron Merey wrote:
> > > I'd like to propose an elfutils policy for contributions containing
> > > content generated by LLM or AI tools (AI-assisted contributions). A
> > > written policy will help clarify for contributors whether elfutils
> > > accepts AI-assisted contributions and whether any special procedures
> > > apply.
> >
> > I think it would be good to differentiate between LLM generated
> > contributions and [AI] tool assisted contributions. The first seems
> > easy to define and is about whether or not to accept such generated
> > patches. The latter seems a very broad topic that is mostly about
> > which tools a developer might use personally, most of which we don't
> > need a policy for.
>
> That's a fair distinction. The policy can be reworded so that it
> addresses contributions containing LLM-generated content (beyond
> accessibility aids) instead of AI tooling in general.
I think that would lead to the most concise guidance to contributors.

> > > There isn't a consensus across major open source projects on
> > > whether AI-assisted contributions should be allowed. For example,
> > > Binutils [1], Gentoo [2], and man-pages [3] have adopted policies
> > > rejecting most or all AI-assisted contributions.
> >
> > There have also been discussions by glibc and gcc about adopting a
> > similar policy on LLM Generated Content as binutils has.
> >
> > > Fedora [4] and the Linux Foundation [5] have policies permitting
> > > the use of AI-assisted contributions. Contributors are expected to
> > > disclose the use of any AI tools and take responsibility for the
> > > contribution's quality and license compatibility.
> >
> > The Fedora one is for a large part not about using AI for code
> > contributions. The Linux Foundation one lets each developer try to
> > figure out if there are (legal) issues or not. Both feel like they
> > are not really giving any real guidance, but let every individual
> > try to figure it out themselves.
>
> What stood out to me was that these policies do not unconditionally
> ban contributions containing LLM content. This content may be
> acceptable when there is disclosure, license compatibility, and
> absence of incompatible third party content.

And having the clear rights to sign off on the legal requirements of
the project. Which I think is why these guidelines are not very
practical. They push contributors to find some imaginary line where it
is "still" OK to just copy LLM generated content.

> > > In my opinion, elfutils should permit AI-assisted contributions.
> > > As for specific policies, I suggest the following.
> > >
> > > (1) AI-assisted contributions should include a disclosure that
> > > some or all of the contribution was generated using an AI tool.
> > > The git commit tag "Assisted-by:" has been adopted for this
> > > purpose by Fedora, for instance.
> >
> > I think this is too weak. The tag or comment should at least explain
> > how to replicate the generated content. Which isn't very practical
> > with the current generation of LLM chatbots. Or probably even
> > impossible. I do think it is appropriate for deterministic tooling
> > though, so as to have a recipe to replicate specific code changes.
>
> Reproduction steps for deterministic tools and prompts or conversation
> summaries for LLMs are fine with me.

The first are fine with me, the second not really.

> I want to note that reproducibility isn't always required when we
> accept a patch. Of course not all human-authored changes are based on
> a process that's reproducible in practice and I don't think we need
> to introduce this requirement just for LLM content.

I like the idea of an Assisted-by tag, but only for tools that users,
maintainers and reviewers can also actually use, and that come with
exact instructions or a script that can be used to replicate the
suggested changes. e.g. Assisted-by: emacs isn't very useful, but if
you have a specific elisp script then please provide it (maybe just
include it in the patch) so others can also use it. (There is a small
sketch of what I mean at the end of this mail.)

The issue with LLM generated content is that it is impractical, or
even impossible, to provide enough context for anyone else to
recreate it.

> > > (2) AI-assisted contributions should otherwise be treated like
> > > any other contribution.
> > > The contributor vouches for the quality of their contribution
> > > and verifies license compatibility with their DCO
> > > "Signed-off-by:" tag while reviewers evaluate the technical
> > > merits of the contribution.
> >
> > Yes, but I think this just says no such contributions can have a
> > Signed-off-by tag since, at least for LLM chatbot like generated
> > patches, these have unclear copyright status and so a contributor
> > cannot [...]
>
> ChatGPT, for example, includes the following statement in its terms
> of use [1]:
>
> "Ownership of content. As between you and OpenAI, and to the extent
> permitted by applicable law, you (a) retain your ownership rights in
> Input and (b) own the Output. We hereby assign to you all our right,
> title, and interest, if any, in and to Output. ... Our assignment
> above does not extend to other users’ output or any Third Party
> Output."

Right, that tells me they might not actually have any rights to grant
you, and they acknowledge there are other right holders who don't give
you any rights.

> If a contributor uses ChatGPT to help prepare a patch and takes
> reasonable care to avoid including third party content, I think the
> contributor can reasonably sign the DCO in this case. There is valid
> disagreement about this of course. Projects such as QEMU [2] have
> policies rejecting LLM content due to the uncertainty of DCO claims.
> On the other hand, Chris Wright and Richard Fontana [3] argue that
> the DCO can be compatible with LLM content.

I think the QEMU example is what we should follow.

I see the argument made in that Red Hat blog post, but I am not really
convinced by their arguments; it feels like they are handwaving away
valid concerns about attribution and legal (copy)rights.

But I do like and agree with:

  "None of this is to say that projects must allow AI-assisted
  contributions. Each project is entitled to make its own rules and
  set its own comfort level, and if a project decides to prohibit
  AI-assisted contributions for now, that decision deserves respect."

I think they do bring up an important point about establishing trust.
And there are trust issues not just legally or technically, but also
ethically. They claim you shouldn't stigmatize contributors that "try
to use AI responsibly". But there is a genuine question whether that
is even possible when there are legal/ethical issues around the
processing of training data, and when there are even LLMs that are
explicitly trained to act like white supremacists and attack
marginalized groups. Then there are the economic and energy costs and
the climate impact. There is a real trust issue here imho with the
current generation of LLMs. Maybe one day there will be something
like:

https://sfconservancy.org/activities/aspirational-statement-on-llm-generative-ai-for-programming.html

Then we can maybe reexamine the trust issue.

> > I would lean the other way and adopt a simple policy like the rest
> > of the core toolchain projects are adopting: reject LLM generated
> > contributions for which the provenance cannot be determined
> > (because the training corpus and/or algorithm is unknown).
>
> These provenance concerns are fair, but can they be accommodated by
> our existing practices?

I think they can. We should provide guidance that contributors should
not sign off on any (non-trivial) LLM generated code/docs. Just like
you wouldn't sign off on code/docs you "find" somewhere without clear
attribution, copyright and license terms. We can reuse some of the
guidance given by the binutils and/or qemu projects to make that
clear.
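To make the Assisted-by idea from point (1) concrete, here is roughly
what I would hope to see for a deterministic tool. Note that the tag
format, names and identifiers below are all made up for illustration;
elfutils has not adopted any of this. The commit message would carry
both trailers:

  libelf: Rename old_getdata to new_getdata

  Mechanical rename performed by the rename-foo.el script included
  in this patch.

  Assisted-by: GNU Emacs (rename-foo.el, included in this patch)
  Signed-off-by: Jane Hacker <[email protected]>

And the patch would ship the script itself, so that any reviewer can
rerun it and diff the result. A minimal sketch of such a script:

  ;; rename-foo.el - hypothetical helper shipped with the patch so
  ;; reviewers can replicate the mechanical change. Run on each file:
  ;;   emacs --batch FILE -l rename-foo.el -f rename-foo-and-save
  (defun rename-foo-and-save ()
    "Replace every use of old_getdata with new_getdata, then save."
    (goto-char (point-min))       ; start at the top of the file
    (while (search-forward "old_getdata" nil t)
      ;; t t = don't adjust case, insert the replacement literally
      (replace-match "new_getdata" t t))
    (save-buffer))

That is the kind of recipe that would make an Assisted-by tag actually
useful: anyone can replicate and audit the change, independent of who
or what suggested it first.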
Cheers,

Mark

> [1] https://openai.com/policies/row-terms-of-use/
> [2] https://www.qemu.org/docs/master/devel/code-provenance.html#use-of-ai-content-generators
> [3] https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
