Thanks, Junwang, for starting this discussion. I think it's great to have project-level recommendations and guidelines in place regarding AI-generated contributions. Personally, I welcome AI-generated code and reviews. When used correctly, I believe these tools can be a force multiplier. However, I want to echo Gang's comment, as I've had similar experiences as a reviewer. Since code and PRs are now easier to generate, the burden of reading and comprehending them increasingly falls on the reviewer. I would like to be able to link to the guidelines to more easily set expectations.

I think a great place to host this would be https://iceberg.apache.org/contribute/, and we can reference it in the subprojects.

Best,
Kevin Liu

On Wed, Jan 28, 2026 at 8:53 PM Alex Stephen via dev <[email protected]> wrote:
> I’d like to +1 on the importance of having PR titles + descriptions match
> the existing human-written PRs.
>
> It can be challenging to detect AI-written code versus human-written code.
> It is often trivial to detect AI-written issues and PR descriptions.
> AI-generated text descriptions are often verbose, incorrect, and just time
> consuming to read.
>
> I think that ensuring that the issues and PR text remain easily consumable
> will help new contributors enter the project and help reviewers review PRs.
>
> On Wed, Jan 28, 2026 at 8:24 PM Manu Zhang <[email protected]>
> wrote:
>
>> Thanks for starting this discussion. I'd suggest embracing the changes.
>> If there are guidelines that AI tools need to follow, we'd better
>> formalize them in READMEs like skills[1] for code generation and review.
>> I feel we are entering into the age where a Claude code agent submitting
>> a PR and a GitHub copilot reviewing them.
>> BTW, I have been using copilot to review PRs and the result is pretty
>> good for the first pass of PR.
>>
>>
>> 1. https://code.claude.com/docs/en/skills
>>
>> Regards,
>> Manu
>>
>>
>> On Mon, Jan 26, 2026 at 8:10 PM Gang Wu <[email protected]> wrote:
>>
>>> Thanks Junwang for raising this! I strongly agree with this proposal.
>>>
>>> This aligns perfectly with some common issues I've recently
>>> encountered in different projects. We have indeed observed a trend
>>> where individuals, who lack a deep understanding of Iceberg, are
>>> starting to use AI to generate PRs. This AI-produced code often looks
>>> correct on the surface but contains numerous hidden issues.
>>>
>>> For iceberg-cpp, which has limited reviewer resources, processing
>>> these low-quality PRs consumes a significant amount of valuable time
>>> and effort.
>>>
>>> Therefore, a clear guidance document is crucial. It would effectively
>>> communicate the project's expectations regarding PR quality and
>>> ownership to contributors. If a contributor simply dumps a low-effort
>>> PR that lacks the author's deep understanding and debugging
>>> capability, the document would set the expectation that it is unlikely
>>> to be reviewed by maintainers, thus preventing unnecessary maintenance
>>> burden.
>>>
>>> Best,
>>> Gang
>>>
>>> On Mon, Jan 26, 2026 at 6:43 PM Junwang Zhao <[email protected]> wrote:
>>> >
>>> > Hi folks,
>>> >
>>> > I'd like to start a discussion on whether we should add a page to the
>>> > Iceberg documentation describing expectations around AI-generated
>>> > contributions.
>>> >
>>> > This topic has recently been discussed on the Arrow dev mailing
>>> > list[1]. In addition, the iceberg-cpp project has already taken a step
>>> > in this direction by introducing AI-related contribution
>>> > guidelines[2]. After a brief discussion on the iceberg-cpp's PR with
>>> > Fokko, Gang, and Kevin, we felt it would be worthwhile to raise this
>>> > topic more broadly within the Iceberg community.
>>> >
>>> > The ASF already provides high-level guidance on the use of generative
>>> > AI tools, primarily focused on licensing and IP considerations[3]. As
>>> > AI-assisted development and so-called "vibe coding" become more
>>> > common, thoughtful use of these tools can be beneficial; however, if
>>> > the contributing author appears not to have engaged deeply with the
>>> > code and/or cannot respond to review feedback, this can significantly
>>> > increase maintainer burden and make the review process less
>>> > collaborative.
>>> >
>>> > Having documented guidelines would give maintainers a clear reference
>>> > point when evaluating such contributions (including when deciding to
>>> > close a PR), and would also make it easier to assess whether a
>>> > contributor has made a reasonable effort to meet project expectations.
>>> >
>>> > I've pulled together some guidelines from iceberg-cpp's PR and
>>> > discussions on the Arrow dev ML, hoping to kick off a broader
>>> > conversation about what should go into Iceberg's AI-generated
>>> > contribution guidelines.
>>> >
>>> > -----
>>> >
>>> > We are not opposed to the use of AI tools in generating PRs, but we
>>> > recommend that contributors adhere to the following principles:
>>> >
>>> > - The PR author should **understand the core ideas** behind the
>>> > implementation **end-to-end**, and be able to justify the design and
>>> > code during review.
>>> > - **Calls out unknowns and assumptions**. It's okay to not fully
>>> > understand some bits of AI generated code. You should comment on these
>>> > cases and point them out to reviewers so that they can use their
>>> > knowledge of the codebase to clear up any concerns. For example, you
>>> > might comment "calling this function here seems to work but I'm not
>>> > familiar with how it works internally, I wonder if there's a race
>>> > condition if it is called concurrently".
>>> > - Only submit a PR if you are able to debug, explain, and take
>>> > ownership of the changes.
>>> > - Ensure the PR title and description match the style, level of
>>> > detail, and tone of other Iceberg PRs.
>>> > - Follow coding conventions used in the rest of the codebase.
>>> > - Be upfront about AI usage, including a brief summary of which parts
>>> > were AI-generated.
>>> > - Reference any sources that guided your changes (e.g. "took a similar
>>> > approach to #XXXX").
>>> >
>>> > -----
>>> >
>>> > Looking forward to hearing your thoughts.
>>> >
>>> > [1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
>>> > [2] https://github.com/apache/iceberg-cpp/pull/531
>>> > [3] https://www.apache.org/legal/generative-tooling.html
>>> >
>>> > --
>>> > Regards
>>> > Junwang Zhao
>>>
>>
