Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators

Markus Armbruster Wed, 04 Jun 2025 05:48:54 -0700

Philippe Mathieu-Daudé <phi...@linaro.org> writes:

> On 4/6/25 09:15, Daniel P. Berrangé wrote:
>> On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
>>> Stefan Hajnoczi <stefa...@gmail.com> writes:
>>>
>>>> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <arm...@redhat.com> wrote:
>>>>>
>>>>> From: Daniel P. Berrangé <berra...@redhat.com>
>>>>> +The increasing prevalence of AI code generators, most notably but not limited
>>>>
>>>> More detail is needed on what an "AI code generator" is. Coding
>>>> assistant tools range from autocompletion to linters to automatic code
>>>> generators. In addition there are other AI-related tools like ChatGPT
>>>> or Gemini as a chatbot that people can use like Stackoverflow or an
>>>> API documentation summarizer.
>>>>
>>>> I think the intent is to say: do not put code that comes from _any_ AI
>>>> tool into QEMU.
>>>>
>>>> It would be okay to use AI to research APIs, algorithms, brainstorm
>>>> ideas, debug the code, analyze the code, etc but the actual code
>>>> changes must not be generated by AI.
>> 
>> The scope of the policy is around contributions we receive as
>> patches with SoB. Researching / brainstorming / analysis etc
>> are not contribution activities, so not covered by the policy
>> IMHO.
>> 
>>>
>>> The existing text is about "AI code generators".  However, the "most
>>> notably LLMs" that follows it could lead readers to believe it's about
>>> more than just code generation, because LLMs are in fact used for more.
>>> I figure this is your concern.
>>>
>>> We could instead start wide, then narrow the focus to code generation.
>>> Here's my try:
>>>
>>>    The increasing prevalence of AI-assisted software development results
>>>    in a number of difficult legal questions and risks for software
>>>    projects, including QEMU.  Of particular concern is code generated by
>>>    `Large Language Models
>>>    <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
>> 
>> Documentation we maintain has the same concerns as code.
>> So I'd suggest to substitute 'code' with 'code / content'.
>
> Why couldn't we accept documentation patches improved using LLM?
>
> As a non-native English speaker being often stuck trying to describe
> function APIs, I'm very tempted to use an LLM to review my sentences
> and make them easier to understand.


I understand the temptation!  Unfortunately, the "legal questions and
risks" Daniel described apply to *any* kind of copyrightable material,
not just to code.

Quote:

    To satisfy the DCO, the patch contributor has to fully understand the
    copyright and license status of code they are contributing to QEMU. With AI
    code generators, the copyright and license status of the output is
    ill-defined with no generally accepted, settled legal foundation.

    Where the training material is known, it is common for it to include large
    volumes of material under restrictive licensing/copyright terms. Even where
    the training material is all known to be under open source licenses, it is
    likely to be under a variety of terms, not all of which will be compatible
    with QEMU's licensing requirements.

    How contributors could comply with DCO terms (b) or (c) for the output of AI
    code generators commonly available today is unclear.  The QEMU project is
    not willing or able to accept the legal risks of non-compliance.

[...]
