Thanks for the link, Anurag. That looks like a good baseline for us as well.
On Thu, Jan 29, 2026 at 4:47 PM Anurag Mantripragada <[email protected]> wrote:

> +1 to adding guidelines for AI-generated contributions.
>
> We have been discussing this topic with other project maintainers, such as
> those from DataFusion. Protecting reviewer time is a primary challenge, as
> it is a scarce resource. It is also helpful to see how other projects are
> addressing this. Thank you for sharing the existing links. I would like to
> add a relevant discussion from the KvRocks project [1] to the list, as I
> found it very useful.
>
> [1] - https://lists.apache.org/thread/h306y2s24626lsh0gxbh3w6rq8y4vmw6
>
> ~ Anurag
>
> On Wed, Jan 28, 2026 at 9:32 PM Kevin Liu <[email protected]> wrote:
>
>> Thanks, Junwang, for starting this discussion.
>>
>> I think it's great to have project-level recommendations and guidelines
>> in place regarding AI-generated contributions. Personally, I welcome
>> AI-generated code and reviews. When used correctly, I believe it can be a
>> force multiplier.
>> However, I want to echo Gang's comment, as I've had similar experiences
>> as a reviewer. Since code and PRs are now easier to generate, the burden
>> often falls on the reviewer to read and comprehend them. I would like to
>> be able to link to the guidelines to more easily set expectations.
>>
>> I think a great place to host this would be
>> https://iceberg.apache.org/contribute/, and we can reference it in the
>> subprojects.
>>
>> Best,
>> Kevin Liu
>>
>> On Wed, Jan 28, 2026 at 8:53 PM Alex Stephen via dev <[email protected]> wrote:
>>
>>> I'd like to +1 the importance of having PR titles and descriptions
>>> match those of existing human-written PRs.
>>>
>>> It can be challenging to detect AI-written code versus human-written
>>> code. It is often trivial to detect AI-written issues and PR descriptions.
>>> AI-generated text descriptions are often verbose, incorrect, and simply
>>> time-consuming to read.
>>>
>>> I think that ensuring that issues and PR text remain easily
>>> consumable will help new contributors enter the project and help reviewers
>>> review PRs.
>>>
>>> On Wed, Jan 28, 2026 at 8:24 PM Manu Zhang <[email protected]> wrote:
>>>
>>>> Thanks for starting this discussion. I'd suggest embracing the changes.
>>>> If there are guidelines that AI tools need to follow, we'd better
>>>> formalize them in READMEs, like skills[1], for code generation and review.
>>>> I feel we are entering the age where a Claude Code agent submits a PR
>>>> and GitHub Copilot reviews it.
>>>> BTW, I have been using Copilot to review PRs, and the results are pretty
>>>> good for a first pass over a PR.
>>>>
>>>> 1. https://code.claude.com/docs/en/skills
>>>>
>>>> Regards,
>>>> Manu
>>>>
>>>> On Mon, Jan 26, 2026 at 8:10 PM Gang Wu <[email protected]> wrote:
>>>>
>>>>> Thanks Junwang for raising this! I strongly agree with this proposal.
>>>>>
>>>>> This aligns perfectly with some common issues I've recently
>>>>> encountered in different projects. We have indeed observed a trend
>>>>> where individuals who lack a deep understanding of Iceberg are
>>>>> starting to use AI to generate PRs. This AI-produced code often looks
>>>>> correct on the surface but contains numerous hidden issues.
>>>>>
>>>>> For iceberg-cpp, which has limited reviewer resources, processing
>>>>> these low-quality PRs consumes a significant amount of valuable time
>>>>> and effort.
>>>>>
>>>>> Therefore, a clear guidance document is crucial. It would effectively
>>>>> communicate the project's expectations regarding PR quality and
>>>>> ownership to contributors. If a contributor simply dumps a low-effort
>>>>> PR that lacks the author's deep understanding and debugging
>>>>> capability, the document would set the expectation that it is unlikely
>>>>> to be reviewed by maintainers, thus preventing unnecessary maintenance
>>>>> burden.
>>>>>
>>>>> Best,
>>>>> Gang
>>>>>
>>>>> On Mon, Jan 26, 2026 at 6:43 PM Junwang Zhao <[email protected]> wrote:
>>>>> >
>>>>> > Hi folks,
>>>>> >
>>>>> > I'd like to start a discussion on whether we should add a page to the
>>>>> > Iceberg documentation describing expectations around AI-generated
>>>>> > contributions.
>>>>> >
>>>>> > This topic has recently been discussed on the Arrow dev mailing
>>>>> > list[1]. In addition, the iceberg-cpp project has already taken a step
>>>>> > in this direction by introducing AI-related contribution
>>>>> > guidelines[2]. After a brief discussion on the iceberg-cpp PR with
>>>>> > Fokko, Gang, and Kevin, we felt it would be worthwhile to raise this
>>>>> > topic more broadly within the Iceberg community.
>>>>> >
>>>>> > The ASF already provides high-level guidance on the use of generative
>>>>> > AI tools, primarily focused on licensing and IP considerations[3]. As
>>>>> > AI-assisted development and so-called "vibe coding" become more
>>>>> > common, thoughtful use of these tools can be beneficial; however, if
>>>>> > the contributing author appears not to have engaged deeply with the
>>>>> > code and/or cannot respond to review feedback, this can significantly
>>>>> > increase maintainer burden and make the review process less
>>>>> > collaborative.
>>>>> >
>>>>> > Having documented guidelines would give maintainers a clear reference
>>>>> > point when evaluating such contributions (including when deciding to
>>>>> > close a PR), and would also make it easier to assess whether a
>>>>> > contributor has made a reasonable effort to meet project expectations.
>>>>> >
>>>>> > I've pulled together some guidelines from iceberg-cpp's PR and
>>>>> > discussions on the Arrow dev ML, hoping to kick off a broader
>>>>> > conversation about what should go into Iceberg's AI-generated
>>>>> > contribution guidelines.
>>>>> >
>>>>> > -----
>>>>> >
>>>>> > We are not opposed to the use of AI tools in generating PRs, but we
>>>>> > recommend that contributors adhere to the following principles:
>>>>> >
>>>>> > - The PR author should **understand the core ideas** behind the
>>>>> > implementation **end-to-end**, and be able to justify the design and
>>>>> > code during review.
>>>>> > - **Call out unknowns and assumptions**. It's okay not to fully
>>>>> > understand some bits of AI-generated code. You should comment on these
>>>>> > cases and point them out to reviewers so that they can use their
>>>>> > knowledge of the codebase to clear up any concerns. For example, you
>>>>> > might comment "calling this function here seems to work, but I'm not
>>>>> > familiar with how it works internally; I wonder if there's a race
>>>>> > condition if it is called concurrently".
>>>>> > - Only submit a PR if you are able to debug, explain, and take
>>>>> > ownership of the changes.
>>>>> > - Ensure the PR title and description match the style, level of
>>>>> > detail, and tone of other Iceberg PRs.
>>>>> > - Follow the coding conventions used in the rest of the codebase.
>>>>> > - Be upfront about AI usage, including a brief summary of which parts
>>>>> > were AI-generated.
>>>>> > - Reference any sources that guided your changes (e.g. "took a similar
>>>>> > approach to #XXXX").
>>>>> >
>>>>> > -----
>>>>> >
>>>>> > Looking forward to hearing your thoughts.
>>>>> >
>>>>> > [1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
>>>>> > [2] https://github.com/apache/iceberg-cpp/pull/531
>>>>> > [3] https://www.apache.org/legal/generative-tooling.html
>>>>> >
>>>>> > --
>>>>> > Regards
>>>>> > Junwang Zhao
