Re: [DISCUSSION] documenting guidelines for AI-generated contributions

Junwang Zhao Sun, 01 Feb 2026 19:08:34 -0800

Hi folks,

On Fri, Jan 30, 2026 at 9:47 AM Russell Spitzer
<[email protected]> wrote:
>
> Thanks for the link Anurag, that looks like a good baseline for us as well.
>
> On Thu, Jan 29, 2026 at 4:47 PM Anurag Mantripragada 
> <[email protected]> wrote:
>>
>> +1 to adding guidelines for AI-generated contributions.
>>
>> We have been discussing this topic with other project maintainers, such as 
>> those from DataFusion. Protecting reviewer time is a primary challenge, as 
>> it is a scarce resource. It is also helpful to see how other projects are 
>> addressing this. Thank you for sharing the existing links. I would like to 
>> add a relevant discussion from the KvRocks project [1] to the list, as I 
>> found it very useful.
>>
>> [1] - https://lists.apache.org/thread/h306y2s24626lsh0gxbh3w6rq8y4vmw6 
>>
>> ~ Anurag
>>
>> On Wed, Jan 28, 2026 at 9:32 PM Kevin Liu <[email protected]> wrote:
>>>
>>>
>>> Thanks, Junwang, for starting this discussion.
>>>
>>> I think it's great to have project-level recommendations and guidelines in 
>>> place regarding AI-generated contributions. Personally, I welcome 
>>> AI-generated code and reviews. When used correctly, I believe it can be a 
>>> force multiplier.
>>> However, I want to echo Gang's comment, as I've had similar experiences as 
>>> a reviewer. Since code and PRs are now easier to generate, the burden often 
>>> falls on the reviewer to read and comprehend them. I would like to be able 
>>> to link to the guidelines to more easily set expectations.
>>>
>>> I think a great place to host this would be 
>>> https://iceberg.apache.org/contribute/, and we can reference it in the 
>>> subprojects.
>>>
>>> Best,
>>> Kevin Liu
>>>
>>> On Wed, Jan 28, 2026 at 8:53 PM Alex Stephen via dev 
>>> <[email protected]> wrote:
>>>>
>>>> I’d like to +1 on the importance of having PR titles + descriptions match 
>>>> the existing human-written PRs.
>>>>
>>>> It can be challenging to detect AI-written code versus human-written code. 
>>>> It is often trivial to detect AI-written issues and PR descriptions. 
>>>> AI-generated text descriptions are often verbose, incorrect, and simply 
>>>> time-consuming to read.
>>>>
>>>> I think that ensuring that the issues and PR text remain easily consumable 
>>>> will help new contributors enter the project and help reviewers review PRs.
>>>>
>>>> On Wed, Jan 28, 2026 at 8:24 PM Manu Zhang <[email protected]> wrote:
>>>>>
>>>>> Thanks for starting this discussion. I'd suggest embracing the changes.
>>>>> If there are guidelines that AI tools need to follow, we'd better 
>>>>> formalize them in READMEs like skills[1] for code generation and review.
>>>>> I feel we are entering an age where a Claude Code agent submits a PR 
>>>>> and GitHub Copilot reviews it.
>>>>> BTW, I have been using Copilot to review PRs, and the results are pretty 
>>>>> good for a first pass.
>>>>>
>>>>>
>>>>> 1. https://code.claude.com/docs/en/skills
>>>>>
>>>>> Regards,
>>>>> Manu
>>>>>
>>>>>
>>>>> On Mon, Jan 26, 2026 at 8:10 PM Gang Wu <[email protected]> wrote:
>>>>>>
>>>>>> Thanks Junwang for raising this! I strongly agree with this proposal.
>>>>>>
>>>>>> This aligns perfectly with some common issues I've recently
>>>>>> encountered in different projects. We have indeed observed a trend
>>>>>> where individuals who lack a deep understanding of Iceberg are
>>>>>> starting to use AI to generate PRs. This AI-produced code often looks
>>>>>> correct on the surface but contains numerous hidden issues.
>>>>>>
>>>>>> For iceberg-cpp, which has limited reviewer resources, processing
>>>>>> these low-quality PRs consumes a significant amount of valuable time
>>>>>> and effort.
>>>>>>
>>>>>> Therefore, a clear guidance document is crucial. It would effectively
>>>>>> communicate the project's expectations regarding PR quality and
>>>>>> ownership to contributors. If a contributor simply dumps a low-effort
>>>>>> PR that lacks the author's deep understanding and debugging
>>>>>> capability, the document would set the expectation that it is unlikely
>>>>>> to be reviewed by maintainers, thus preventing unnecessary maintenance
>>>>>> burden.
>>>>>>
>>>>>> Best,
>>>>>> Gang
>>>>>>
>>>>>> On Mon, Jan 26, 2026 at 6:43 PM Junwang Zhao <[email protected]> wrote:
>>>>>> >
>>>>>> > Hi folks,
>>>>>> >
>>>>>> > I'd like to start a discussion on whether we should add a page to the
>>>>>> > Iceberg documentation describing expectations around AI-generated
>>>>>> > contributions.
>>>>>> >
>>>>>> > This topic has recently been discussed on the Arrow dev mailing
>>>>>> > list[1]. In addition, the iceberg-cpp project has already taken a step
>>>>>> > in this direction by introducing AI-related contribution
>>>>>> > guidelines[2]. After a brief discussion on the iceberg-cpp's PR with
>>>>>> > Fokko, Gang, and Kevin, we felt it would be worthwhile to raise this
>>>>>> > topic more broadly within the Iceberg community.
>>>>>> >
>>>>>> > The ASF already provides high-level guidance on the use of generative
>>>>>> > AI tools, primarily focused on licensing and IP considerations[3]. As
>>>>>> > AI-assisted development and so-called "vibe coding" become more
>>>>>> > common, thoughtful use of these tools can be beneficial; however, if
>>>>>> > the contributing author appears not to have engaged deeply with the
>>>>>> > code and/or cannot respond to review feedback, this can significantly
>>>>>> > increase maintainer burden and make the review process less
>>>>>> > collaborative.
>>>>>> >
>>>>>> > Having documented guidelines would give maintainers a clear reference
>>>>>> > point when evaluating such contributions (including when deciding to
>>>>>> > close a PR), and would also make it easier to assess whether a
>>>>>> > contributor has made a reasonable effort to meet project expectations.
>>>>>> >
>>>>>> > I've pulled together some guidelines from iceberg-cpp's PR and
>>>>>> > discussions on the Arrow dev ML, hoping to kick off a broader
>>>>>> > conversation about what should go into Iceberg's AI-generated
>>>>>> > contribution guidelines.
>>>>>> >
>>>>>> > -----
>>>>>> >
>>>>>> > We are not opposed to the use of AI tools in generating PRs, but we
>>>>>> > recommend that contributors adhere to the following principles:
>>>>>> >
>>>>>> > - The PR author should **understand the core ideas** behind the
>>>>>> > implementation **end-to-end**, and be able to justify the design and
>>>>>> > code during review.
>>>>>> > - **Call out unknowns and assumptions**. It's okay to not fully
>>>>>> > understand some bits of AI generated code. You should comment on these
>>>>>> > cases and point them out to reviewers so that they can use their
>>>>>> > knowledge of the codebase to clear up any concerns. For example, you
>>>>>> > might comment "calling this function here seems to work but I'm not
>>>>>> > familiar with how it works internally, I wonder if there's a race
>>>>>> > condition if it is called concurrently".
>>>>>> > - Only submit a PR if you are able to debug, explain, and take
>>>>>> > ownership of the changes.
>>>>>> > - Ensure the PR title and description match the style, level of
>>>>>> > detail, and tone of other Iceberg PRs.
>>>>>> > - Follow coding conventions used in the rest of the codebase.
>>>>>> > - Be upfront about AI usage, including a brief summary of which parts
>>>>>> > were AI-generated.
>>>>>> > - Reference any sources that guided your changes (e.g. "took a similar
>>>>>> > approach to #XXXX").
>>>>>> >
>>>>>> > -----
>>>>>> >
>>>>>> > Looking forward to hearing your thoughts.
>>>>>> >
>>>>>> > [1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
>>>>>> > [2] https://github.com/apache/iceberg-cpp/pull/531
>>>>>> > [3] https://www.apache.org/legal/generative-tooling.html
>>>>>> >
>>>>>> > --
>>>>>> > Regards
>>>>>> > Junwang Zhao


Thanks for all the feedback. I've opened a PR[1] to add a new
`Guidelines for AI-assisted Contributions` section to
https://iceberg.apache.org/contribute/ (as suggested by Kevin).

The draft references prior discussions on the iceberg-cpp PR [2] and
the Arrow dev ML [3], as well as the Kvrocks documentation on a similar
topic [4].

Please take a look and let me know whether this is adequate. Any
comments would be greatly appreciated.

[1] https://github.com/apache/iceberg/pull/15213
[2] https://github.com/apache/iceberg-cpp/pull/531
[3] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
[4] https://kvrocks.apache.org/community/contributing/#guidelines-for-ai-assisted-contributions

-- 
Regards
Junwang Zhao
