Re: [DISCUSSION] documenting guidelines for AI-generated contributions

Kevin Liu Wed, 28 Jan 2026 21:32:00 -0800

Thanks, Junwang, for starting this discussion.

I think it's great to have project-level recommendations and guidelines in
place regarding AI-generated contributions. Personally, I welcome
AI-generated code and reviews. When used correctly, I believe it can be a
force multiplier.
However, I want to echo Gang's comment, as I've had similar experiences as
a reviewer. Since code and PRs are now easier to generate, the burden often
falls on the reviewer to read and comprehend them. I would like to be able
to link to the guidelines to more easily set expectations.


I think a great place to host this would be
https://iceberg.apache.org/contribute/, and we can reference it in the
subprojects.

Best,
Kevin Liu

On Wed, Jan 28, 2026 at 8:53 PM Alex Stephen via dev <[email protected]>
wrote:

> I’d like to +1 on the importance of having PR titles + descriptions match
> the existing human-written PRs.
>
> It can be challenging to detect AI-written code versus human-written code.
> It is often trivial to detect AI-written issues and PR descriptions.
> AI-generated text descriptions are often verbose, incorrect, and simply
> time-consuming to read.
>
> I think that ensuring that the issues and PR text remain easily consumable
> will help new contributors enter the project and help reviewers review PRs.
>
> On Wed, Jan 28, 2026 at 8:24 PM Manu Zhang <[email protected]>
> wrote:
>
>> Thanks for starting this discussion. I'd suggest embracing the changes.
>> If there are guidelines that AI tools need to follow, we'd better
>> formalize them in READMEs like skills[1] for code generation and review.
>> I feel we are entering an age where a Claude Code agent submits a PR
>> and GitHub Copilot reviews it.
>> BTW, I have been using Copilot to review PRs, and the results are pretty
>> good for a first pass.
>>
>>
>> 1. https://code.claude.com/docs/en/skills
>>
>> Regards,
>> Manu
>>
>>
>> On Mon, Jan 26, 2026 at 8:10 PM Gang Wu <[email protected]> wrote:
>>
>>> Thanks Junwang for raising this! I strongly agree with this proposal.
>>>
>>> This aligns perfectly with some common issues I've recently
>>> encountered in different projects. We have indeed observed a trend
>>> where individuals who lack a deep understanding of Iceberg are
>>> starting to use AI to generate PRs. This AI-produced code often looks
>>> correct on the surface but contains numerous hidden issues.
>>>
>>> For iceberg-cpp, which has limited reviewer resources, processing
>>> these low-quality PRs consumes a significant amount of valuable time
>>> and effort.
>>>
>>> Therefore, a clear guidance document is crucial. It would effectively
>>> communicate the project's expectations regarding PR quality and
>>> ownership to contributors. If a contributor simply dumps a low-effort
>>> PR that lacks the author's deep understanding and debugging
>>> capability, the document would set the expectation that it is unlikely
>>> to be reviewed by maintainers, thus preventing unnecessary maintenance
>>> burden.
>>>
>>> Best,
>>> Gang
>>>
>>> On Mon, Jan 26, 2026 at 6:43 PM Junwang Zhao <[email protected]> wrote:
>>> >
>>> > Hi folks,
>>> >
>>> > I'd like to start a discussion on whether we should add a page to the
>>> > Iceberg documentation describing expectations around AI-generated
>>> > contributions.
>>> >
>>> > This topic has recently been discussed on the Arrow dev mailing
>>> > list[1]. In addition, the iceberg-cpp project has already taken a step
>>> > in this direction by introducing AI-related contribution
>>> > guidelines[2]. After a brief discussion on the iceberg-cpp PR with
>>> > Fokko, Gang, and Kevin, we felt it would be worthwhile to raise this
>>> > topic more broadly within the Iceberg community.
>>> >
>>> > The ASF already provides high-level guidance on the use of generative
>>> > AI tools, primarily focused on licensing and IP considerations[3]. As
>>> > AI-assisted development and so-called "vibe coding" become more
>>> > common, thoughtful use of these tools can be beneficial; however, if
>>> > the contributing author appears not to have engaged deeply with the
>>> > code and/or cannot respond to review feedback, this can significantly
>>> > increase maintainer burden and make the review process less
>>> > collaborative.
>>> >
>>> > Having documented guidelines would give maintainers a clear reference
>>> > point when evaluating such contributions (including when deciding to
>>> > close a PR), and would also make it easier to assess whether a
>>> > contributor has made a reasonable effort to meet project expectations.
>>> >
>>> > I've pulled together some guidelines from iceberg-cpp's PR and
>>> > discussions on the Arrow dev ML, hoping to kick off a broader
>>> > conversation about what should go into Iceberg's AI-generated
>>> > contribution guidelines.
>>> >
>>> > -----
>>> >
>>> > We are not opposed to the use of AI tools in generating PRs, but we
>>> > recommend that contributors adhere to the following principles:
>>> >
>>> > - The PR author should **understand the core ideas** behind the
>>> > implementation **end-to-end**, and be able to justify the design and
>>> > code during review.
>>> > - **Call out unknowns and assumptions**. It's okay to not fully
>>> > understand some bits of AI-generated code. You should comment on these
>>> > cases and point them out to reviewers so that they can use their
>>> > knowledge of the codebase to clear up any concerns. For example, you
>>> > might comment "calling this function here seems to work but I'm not
>>> > familiar with how it works internally, I wonder if there's a race
>>> > condition if it is called concurrently".
>>> > - Only submit a PR if you are able to debug, explain, and take
>>> > ownership of the changes.
>>> > - Ensure the PR title and description match the style, level of
>>> > detail, and tone of other Iceberg PRs.
>>> > - Follow coding conventions used in the rest of the codebase.
>>> > - Be upfront about AI usage, including a brief summary of which parts
>>> > were AI-generated.
>>> > - Reference any sources that guided your changes (e.g. "took a similar
>>> > approach to #XXXX").
>>> >
>>> > -----
>>> >
>>> > Looking forward to hearing your thoughts.
>>> >
>>> > [1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
>>> > [2] https://github.com/apache/iceberg-cpp/pull/531
>>> > [3] https://www.apache.org/legal/generative-tooling.html
>>> >
>>> > --
>>> > Regards
>>> > Junwang Zhao
>>>
>>
