Thanks for the link, Anurag. That looks like a good baseline for us as well.
On Thu, Jan 29, 2026 at 4:47 PM Anurag Mantripragada <[email protected]> wrote:

> +1 to adding guidelines for AI-generated contributions.
>
> We have been discussing this topic with other project maintainers, such as
> those from DataFusion. Protecting reviewer time is a primary challenge, as
> it is a scarce resource. It is also helpful to see how other projects are
> addressing this. Thank you for sharing the existing links. I would like to
> add a relevant discussion from the KvRocks project [1] to the list, as I
> found it very useful.
>
> [1] - https://lists.apache.org/thread/h306y2s24626lsh0gxbh3w6rq8y4vmw6
>
> ~ Anurag
>
> On Wed, Jan 28, 2026 at 9:32 PM Kevin Liu <[email protected]> wrote:
>
>> Thanks, Junwang, for starting this discussion.
>>
>> I think it's great to have project-level recommendations and guidelines
>> in place regarding AI-generated contributions. Personally, I welcome
>> AI-generated code and reviews. When used correctly, I believe it can be a
>> force multiplier.
>> However, I want to echo Gang's comment, as I've had similar experiences
>> as a reviewer. Since code and PRs are now easier to generate, the burden
>> often falls on the reviewer to read and comprehend them. I would like to
>> be able to link to the guidelines to more easily set expectations.
>>
>> I think a great place to host this would be
>> https://iceberg.apache.org/contribute/, and we can reference it in the
>> subprojects.
>>
>> Best,
>> Kevin Liu
>>
>> On Wed, Jan 28, 2026 at 8:53 PM Alex Stephen via dev <[email protected]> wrote:
>>
>>> I'd like to +1 the importance of having PR titles and descriptions
>>> match those of existing human-written PRs.
>>>
>>> It can be challenging to detect AI-written code versus human-written
>>> code. It is often trivial to detect AI-written issues and PR descriptions.
>>> AI-generated text descriptions are often verbose, incorrect, and simply
>>> time-consuming to read.
>>>
>>> I think that ensuring that issues and PR text remain easily
>>> consumable will help new contributors enter the project and help reviewers
>>> review PRs.
>>>
>>> On Wed, Jan 28, 2026 at 8:24 PM Manu Zhang <[email protected]> wrote:
>>>
>>>> Thanks for starting this discussion. I'd suggest embracing the changes.
>>>> If there are guidelines that AI tools need to follow, we'd better
>>>> formalize them in READMEs, like skills[1], for code generation and review.
>>>> I feel we are entering the age where a Claude Code agent submits a PR
>>>> and GitHub Copilot reviews it.
>>>> BTW, I have been using Copilot to review PRs, and the results are pretty
>>>> good for a first pass over a PR.
>>>>
>>>> 1. https://code.claude.com/docs/en/skills
>>>>
>>>> Regards,
>>>> Manu
>>>>
>>>> On Mon, Jan 26, 2026 at 8:10 PM Gang Wu <[email protected]> wrote:
>>>>
>>>>> Thanks Junwang for raising this! I strongly agree with this proposal.
>>>>>
>>>>> This aligns perfectly with some common issues I've recently
>>>>> encountered in different projects. We have indeed observed a trend
>>>>> where individuals who lack a deep understanding of Iceberg are
>>>>> starting to use AI to generate PRs. This AI-produced code often looks
>>>>> correct on the surface but contains numerous hidden issues.
>>>>>
>>>>> For iceberg-cpp, which has limited reviewer resources, processing
>>>>> these low-quality PRs consumes a significant amount of valuable time
>>>>> and effort.
>>>>>
>>>>> Therefore, a clear guidance document is crucial. It would effectively
>>>>> communicate the project's expectations regarding PR quality and
>>>>> ownership to contributors. If a contributor simply dumps a low-effort
>>>>> PR that lacks the author's deep understanding and debugging
>>>>> capability, the document would set the expectation that it is unlikely
>>>>> to be reviewed by maintainers, thus preventing unnecessary maintenance
>>>>> burden.
>>>>>
>>>>> Best,
>>>>> Gang
>>>>>
>>>>> On Mon, Jan 26, 2026 at 6:43 PM Junwang Zhao <[email protected]> wrote:
>>>>> >
>>>>> > Hi folks,
>>>>> >
>>>>> > I'd like to start a discussion on whether we should add a page to the
>>>>> > Iceberg documentation describing expectations around AI-generated
>>>>> > contributions.
>>>>> >
>>>>> > This topic has recently been discussed on the Arrow dev mailing
>>>>> > list[1]. In addition, the iceberg-cpp project has already taken a step
>>>>> > in this direction by introducing AI-related contribution
>>>>> > guidelines[2]. After a brief discussion on the iceberg-cpp PR with
>>>>> > Fokko, Gang, and Kevin, we felt it would be worthwhile to raise this
>>>>> > topic more broadly within the Iceberg community.
>>>>> >
>>>>> > The ASF already provides high-level guidance on the use of generative
>>>>> > AI tools, primarily focused on licensing and IP considerations[3]. As
>>>>> > AI-assisted development and so-called "vibe coding" become more
>>>>> > common, thoughtful use of these tools can be beneficial; however, if
>>>>> > the contributing author appears not to have engaged deeply with the
>>>>> > code and/or cannot respond to review feedback, this can significantly
>>>>> > increase maintainer burden and make the review process less
>>>>> > collaborative.
>>>>> >
>>>>> > Having documented guidelines would give maintainers a clear reference
>>>>> > point when evaluating such contributions (including when deciding to
>>>>> > close a PR), and would also make it easier to assess whether a
>>>>> > contributor has made a reasonable effort to meet project expectations.
>>>>> >
>>>>> > I've pulled together some guidelines from iceberg-cpp's PR and
>>>>> > discussions on the Arrow dev ML, hoping to kick off a broader
>>>>> > conversation about what should go into Iceberg's AI-generated
>>>>> > contribution guidelines.
>>>>> >
>>>>> > -----
>>>>> >
>>>>> > We are not opposed to the use of AI tools in generating PRs, but we
>>>>> > recommend that contributors adhere to the following principles:
>>>>> >
>>>>> > - The PR author should **understand the core ideas** behind the
>>>>> > implementation **end-to-end**, and be able to justify the design and
>>>>> > code during review.
>>>>> > - **Call out unknowns and assumptions**. It's okay not to fully
>>>>> > understand some bits of AI-generated code. You should comment on these
>>>>> > cases and point them out to reviewers so that they can use their
>>>>> > knowledge of the codebase to clear up any concerns. For example, you
>>>>> > might comment "calling this function here seems to work, but I'm not
>>>>> > familiar with how it works internally; I wonder if there's a race
>>>>> > condition if it is called concurrently".
>>>>> > - Only submit a PR if you are able to debug, explain, and take
>>>>> > ownership of the changes.
>>>>> > - Ensure the PR title and description match the style, level of
>>>>> > detail, and tone of other Iceberg PRs.
>>>>> > - Follow the coding conventions used in the rest of the codebase.
>>>>> > - Be upfront about AI usage, including a brief summary of which parts
>>>>> > were AI-generated.
>>>>> > - Reference any sources that guided your changes (e.g. "took a similar
>>>>> > approach to #XXXX").
>>>>> >
>>>>> > -----
>>>>> >
>>>>> > Looking forward to hearing your thoughts.
>>>>> >
>>>>> > [1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
>>>>> > [2] https://github.com/apache/iceberg-cpp/pull/531
>>>>> > [3] https://www.apache.org/legal/generative-tooling.html
>>>>> >
>>>>> > --
>>>>> > Regards
>>>>> > Junwang Zhao
