Hi folks,

On Fri, Jan 30, 2026 at 9:47 AM Russell Spitzer <[email protected]> wrote:
>
> Thanks for the link Anurag, that looks like a good baseline for us as well.
>
> On Thu, Jan 29, 2026 at 4:47 PM Anurag Mantripragada
> <[email protected]> wrote:
>>
>> +1 to adding guidelines for AI-generated contributions.
>>
>> We have been discussing this topic with other project maintainers, such as
>> those from DataFusion. Protecting reviewer time is a primary challenge, as
>> it is a scarce resource. It is also helpful to see how other projects are
>> addressing this. Thank you for sharing the existing links. I would like to
>> add a relevant discussion from the KvRocks project [1] to the list, as I
>> found it very useful.
>>
>> [1] - https://lists.apache.org/thread/h306y2s24626lsh0gxbh3w6rq8y4vmw6
>> (https://lists.apache.org/thread/h306y2s24626lsh0gxbh3w6rq8y4vmw6)
>>
>> ~ Anurag
>>
>> On Wed, Jan 28, 2026 at 9:32 PM Kevin Liu <[email protected]> wrote:
>>>
>>>
>>> Thanks, Junwang, for starting this discussion.
>>>
>>> I think it's great to have project-level recommendations and guidelines in
>>> place regarding AI-generated contributions. Personally, I welcome
>>> AI-generated code and reviews. When used correctly, I believe it can be a
>>> force multiplier.
>>> However, I want to echo Gang's comment, as I've had similar experiences as
>>> a reviewer. Since code and PRs are now easier to generate, the burden often
>>> falls on the reviewer to read and comprehend them. I would like to be able
>>> to link to the guidelines to more easily set expectations.
>>>
>>> I think a great place to host this would be
>>> https://iceberg.apache.org/contribute/, and we can reference it in the
>>> subprojects.
>>>
>>> Best,
>>> Kevin Liu
>>>
>>> On Wed, Jan 28, 2026 at 8:53 PM Alex Stephen via dev
>>> <[email protected]> wrote:
>>>>
>>>> I’d like to +1 on the importance of having PR titles + descriptions match
>>>> the existing human-written PRs.
>>>>
>>>> It can be challenging to detect AI-written code versus human-written code.
>>>> It is often trivial to detect AI-written issues and PR descriptions.
>>>> AI-generated text descriptions are often verbose, incorrect, and just time
>>>> consuming to read.
>>>>
>>>> I think that ensuring that the issues and PR text remain easily consumable
>>>> will help new contributors enter the project and help reviewers review PRs.
>>>>
>>>> On Wed, Jan 28, 2026 at 8:24 PM Manu Zhang <[email protected]> wrote:
>>>>>
>>>>> Thanks for starting this discussion. I'd suggest embracing the changes.
>>>>> If there are guidelines that AI tools need to follow, we'd better
>>>>> formalize them in READMEs like skills[1] for code generation and review.
>>>>> I feel we are entering into the age where a Claude code agent submitting
>>>>> a PR and a GitHub copilot reviewing them.
>>>>> BTW, I have been using copilot to review PRs and the result is pretty
>>>>> good for the first pass of PR.
>>>>>
>>>>>
>>>>> 1. https://code.claude.com/docs/en/skills
>>>>>
>>>>> Regards,
>>>>> Manu
>>>>>
>>>>>
>>>>> On Mon, Jan 26, 2026 at 8:10 PM Gang Wu <[email protected]> wrote:
>>>>>>
>>>>>> Thanks Junwang for raising this! I strongly agree with this proposal.
>>>>>>
>>>>>> This aligns perfectly with some common issues I've recently
>>>>>> encountered in different projects. We have indeed observed a trend
>>>>>> where individuals, who lack a deep understanding of Iceberg, are
>>>>>> starting to use AI to generate PRs. This AI-produced code often looks
>>>>>> correct on the surface but contains numerous hidden issues.
>>>>>>
>>>>>> For iceberg-cpp, which has limited reviewer resources, processing
>>>>>> these low-quality PRs consumes a significant amount of valuable time
>>>>>> and effort.
>>>>>>
>>>>>> Therefore, a clear guidance document is crucial. It would effectively
>>>>>> communicate the project's expectations regarding PR quality and
>>>>>> ownership to contributors.
>>>>>> If a contributor simply dumps a low-effort
>>>>>> PR that lacks the author's deep understanding and debugging
>>>>>> capability, the document would set the expectation that it is unlikely
>>>>>> to be reviewed by maintainers, thus preventing unnecessary maintenance
>>>>>> burden.
>>>>>>
>>>>>> Best,
>>>>>> Gang
>>>>>>
>>>>>> On Mon, Jan 26, 2026 at 6:43 PM Junwang Zhao <[email protected]> wrote:
>>>>>> >
>>>>>> > Hi folks,
>>>>>> >
>>>>>> > I'd like to start a discussion on whether we should add a page to the
>>>>>> > Iceberg documentation describing expectations around AI-generated
>>>>>> > contributions.
>>>>>> >
>>>>>> > This topic has recently been discussed on the Arrow dev mailing
>>>>>> > list[1]. In addition, the iceberg-cpp project has already taken a step
>>>>>> > in this direction by introducing AI-related contribution
>>>>>> > guidelines[2]. After a brief discussion on the iceberg-cpp's PR with
>>>>>> > Fokko, Gang, and Kevin, we felt it would be worthwhile to raise this
>>>>>> > topic more broadly within the Iceberg community.
>>>>>> >
>>>>>> > The ASF already provides high-level guidance on the use of generative
>>>>>> > AI tools, primarily focused on licensing and IP considerations[3]. As
>>>>>> > AI-assisted development and so-called "vibe coding" become more
>>>>>> > common, thoughtful use of these tools can be beneficial; however, if
>>>>>> > the contributing author appears not to have engaged deeply with the
>>>>>> > code and/or cannot respond to review feedback, this can significantly
>>>>>> > increase maintainer burden and make the review process less
>>>>>> > collaborative.
>>>>>> >
>>>>>> > Having documented guidelines would give maintainers a clear reference
>>>>>> > point when evaluating such contributions (including when deciding to
>>>>>> > close a PR), and would also make it easier to assess whether a
>>>>>> > contributor has made a reasonable effort to meet project expectations.
>>>>>> >
>>>>>> > I've pulled together some guidelines from iceberg-cpp's PR and
>>>>>> > discussions on the Arrow dev ML, hoping to kick off a broader
>>>>>> > conversation about what should go into Iceberg's AI-generated
>>>>>> > contribution guidelines.
>>>>>> >
>>>>>> > -----
>>>>>> >
>>>>>> > We are not opposed to the use of AI tools in generating PRs, but we
>>>>>> > recommend that contributors adhere to the following principles:
>>>>>> >
>>>>>> > - The PR author should **understand the core ideas** behind the
>>>>>> > implementation **end-to-end**, and be able to justify the design and
>>>>>> > code during review.
>>>>>> > - **Calls out unknowns and assumptions**. It's okay to not fully
>>>>>> > understand some bits of AI generated code. You should comment on these
>>>>>> > cases and point them out to reviewers so that they can use their
>>>>>> > knowledge of the codebase to clear up any concerns. For example, you
>>>>>> > might comment "calling this function here seems to work but I'm not
>>>>>> > familiar with how it works internally, I wonder if there's a race
>>>>>> > condition if it is called concurrently".
>>>>>> > - Only submit a PR if you are able to debug, explain, and take
>>>>>> > ownership of the changes.
>>>>>> > - Ensure the PR title and description match the style, level of
>>>>>> > detail, and tone of other Iceberg PRs.
>>>>>> > - Follow coding conventions used in the rest of the codebase.
>>>>>> > - Be upfront about AI usage, including a brief summary of which parts
>>>>>> > were AI-generated.
>>>>>> > - Reference any sources that guided your changes (e.g. "took a similar
>>>>>> > approach to #XXXX").
>>>>>> >
>>>>>> > -----
>>>>>> >
>>>>>> > Looking forward to hearing your thoughts.
>>>>>> >
>>>>>> > [1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
>>>>>> > [2] https://github.com/apache/iceberg-cpp/pull/531
>>>>>> > [3] https://www.apache.org/legal/generative-tooling.html
>>>>>> >
>>>>>> > --
>>>>>> > Regards
>>>>>> > Junwang Zhao

Thanks for all the feedback.

I've opened a PR [1] to add a new `Guidelines for AI-assisted Contributions`
section to https://iceberg.apache.org/contribute/ (as suggested by Kevin).
The draft references prior discussions in the iceberg-cpp PR [2] and on the
Arrow dev ML [3], as well as the Kvrocks documentation on a similar topic [4].

Please take a look and let me know whether this is adequate. Any comments
would be greatly appreciated.

[1] https://github.com/apache/iceberg/pull/15213
[2] https://github.com/apache/iceberg-cpp/pull/531
[3] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
[4] https://kvrocks.apache.org/community/contributing/#guidelines-for-ai-assisted-contributions

--
Regards
Junwang Zhao
