+1 to adding guidelines for AI-generated contributions. We have been discussing this topic with other project maintainers, such as those from DataFusion. Protecting reviewer time is a primary challenge, as it is a scarce resource. It is also helpful to see how other projects are addressing this. Thank you for sharing the existing links. I would like to add a relevant discussion from the KvRocks project [1] to the list, as I found it very useful.
[1] - https://lists.apache.org/thread/h306y2s24626lsh0gxbh3w6rq8y4vmw6

~ Anurag

On Wed, Jan 28, 2026 at 9:32 PM Kevin Liu <[email protected]> wrote:
>
> Thanks, Junwang, for starting this discussion.
>
> I think it's great to have project-level recommendations and guidelines in place regarding AI-generated contributions. Personally, I welcome AI-generated code and reviews. When used correctly, I believe it can be a force multiplier.
> However, I want to echo Gang's comment, as I've had similar experiences as a reviewer. Since code and PRs are now easier to generate, the burden often falls on the reviewer to read and comprehend them. I would like to be able to link to the guidelines to more easily set expectations.
>
> I think a great place to host this would be https://iceberg.apache.org/contribute/, and we can reference it in the subprojects.
>
> Best,
> Kevin Liu
>
> On Wed, Jan 28, 2026 at 8:53 PM Alex Stephen via dev <[email protected]> wrote:
>
>> I'd like to +1 the importance of having PR titles and descriptions match the existing human-written PRs.
>>
>> It can be challenging to detect AI-written code versus human-written code. It is often trivial to detect AI-written issues and PR descriptions. AI-generated text descriptions are often verbose, incorrect, and just time-consuming to read.
>>
>> I think that ensuring that the issues and PR text remain easily consumable will help new contributors enter the project and help reviewers review PRs.
>>
>> On Wed, Jan 28, 2026 at 8:24 PM Manu Zhang <[email protected]> wrote:
>>
>>> Thanks for starting this discussion. I'd suggest embracing the changes. If there are guidelines that AI tools need to follow, we'd better formalize them in READMEs or skill files[1] for code generation and review. I feel we are entering an age where a Claude Code agent submits a PR and GitHub Copilot reviews it.
>>> BTW, I have been using Copilot to review PRs, and the results are pretty good as a first pass.
>>>
>>> 1. https://code.claude.com/docs/en/skills
>>>
>>> Regards,
>>> Manu
>>>
>>> On Mon, Jan 26, 2026 at 8:10 PM Gang Wu <[email protected]> wrote:
>>>
>>>> Thanks Junwang for raising this! I strongly agree with this proposal.
>>>>
>>>> This aligns perfectly with some common issues I've recently encountered in different projects. We have indeed observed a trend where individuals who lack a deep understanding of Iceberg are starting to use AI to generate PRs. This AI-produced code often looks correct on the surface but contains numerous hidden issues.
>>>>
>>>> For iceberg-cpp, which has limited reviewer resources, processing these low-quality PRs consumes a significant amount of valuable time and effort.
>>>>
>>>> Therefore, a clear guidance document is crucial. It would effectively communicate the project's expectations regarding PR quality and ownership to contributors. If a contributor simply dumps a low-effort PR that lacks the author's deep understanding and debugging capability, the document would set the expectation that it is unlikely to be reviewed by maintainers, thus preventing unnecessary maintenance burden.
>>>>
>>>> Best,
>>>> Gang
>>>>
>>>> On Mon, Jan 26, 2026 at 6:43 PM Junwang Zhao <[email protected]> wrote:
>>>> >
>>>> > Hi folks,
>>>> >
>>>> > I'd like to start a discussion on whether we should add a page to the Iceberg documentation describing expectations around AI-generated contributions.
>>>> >
>>>> > This topic has recently been discussed on the Arrow dev mailing list[1]. In addition, the iceberg-cpp project has already taken a step in this direction by introducing AI-related contribution guidelines[2]. After a brief discussion on the iceberg-cpp PR with Fokko, Gang, and Kevin, we felt it would be worthwhile to raise this topic more broadly within the Iceberg community.
>>>> >
>>>> > The ASF already provides high-level guidance on the use of generative AI tools, primarily focused on licensing and IP considerations[3]. As AI-assisted development and so-called "vibe coding" become more common, thoughtful use of these tools can be beneficial; however, if the contributing author appears not to have engaged deeply with the code and/or cannot respond to review feedback, this can significantly increase maintainer burden and make the review process less collaborative.
>>>> >
>>>> > Having documented guidelines would give maintainers a clear reference point when evaluating such contributions (including when deciding to close a PR), and would also make it easier to assess whether a contributor has made a reasonable effort to meet project expectations.
>>>> >
>>>> > I've pulled together some guidelines from iceberg-cpp's PR and discussions on the Arrow dev ML, hoping to kick off a broader conversation about what should go into Iceberg's AI-generated contribution guidelines.
>>>> >
>>>> > -----
>>>> >
>>>> > We are not opposed to the use of AI tools in generating PRs, but we recommend that contributors adhere to the following principles:
>>>> >
>>>> > - The PR author should **understand the core ideas** behind the implementation **end-to-end**, and be able to justify the design and code during review.
>>>> > - **Call out unknowns and assumptions.** It's okay not to fully understand some bits of AI-generated code. You should comment on these cases and point them out to reviewers so that they can use their knowledge of the codebase to clear up any concerns. For example, you might comment "calling this function here seems to work, but I'm not familiar with how it works internally; I wonder if there's a race condition if it is called concurrently".
>>>> > - Only submit a PR if you are able to debug, explain, and take ownership of the changes.
>>>> > - Ensure the PR title and description match the style, level of detail, and tone of other Iceberg PRs.
>>>> > - Follow the coding conventions used in the rest of the codebase.
>>>> > - Be upfront about AI usage, including a brief summary of which parts were AI-generated.
>>>> > - Reference any sources that guided your changes (e.g. "took a similar approach to #XXXX").
>>>> >
>>>> > -----
>>>> >
>>>> > Looking forward to hearing your thoughts.
>>>> >
>>>> > [1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
>>>> > [2] https://github.com/apache/iceberg-cpp/pull/531
>>>> > [3] https://www.apache.org/legal/generative-tooling.html
>>>> >
>>>> > --
>>>> > Regards
>>>> > Junwang Zhao
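
For reference, here is a minimal sketch of what Manu's suggestion could look like as a repository-level skill file. The file path (.claude/skills/iceberg-ai-contributions/SKILL.md), the skill name, and the frontmatter fields are assumptions based on the Claude Code skills documentation linked above; the rules are paraphrased from Junwang's draft guidelines in this thread, and none of this is agreed Iceberg policy. The same content could just as well live in CONTRIBUTING.md or on the proposed https://iceberg.apache.org/contribute/ page; the point is only that agents can be pointed at a single, formalized set of expectations.

    ---
    # Hypothetical example: the path, skill name, and field names follow the
    # Claude Code skills docs; the rules below are taken from the draft
    # guidelines discussed in this thread.
    name: iceberg-ai-contributions
    description: Expectations for AI-assisted contributions to Apache Iceberg. Use when generating or reviewing Iceberg pull requests.
    ---

    When generating or reviewing an Apache Iceberg pull request:

    - Follow the coding conventions used in the rest of the codebase.
    - Keep the PR title and description in the same style, level of detail,
      and tone as existing human-written Iceberg PRs; avoid verbose,
      auto-generated summaries.
    - List any generated code whose behavior is not fully understood so the
      author can call it out to reviewers as an unknown or assumption.
    - Record which parts of the change were AI-generated so the author can
      disclose this in the PR description.
    - Reference prior PRs or issues that guided the approach
      (e.g. "took a similar approach to #XXXX").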
