I'd like to +1 the importance of having PR titles and descriptions match existing human-written PRs.
It can be challenging to detect AI-written code versus human-written
code, but it is often trivial to detect AI-written issues and PR
descriptions. AI-generated descriptions are often verbose, incorrect,
and time-consuming to read. Ensuring that issue and PR text remains
easily consumable will help new contributors enter the project and help
reviewers review PRs.

On Wed, Jan 28, 2026 at 8:24 PM Manu Zhang <[email protected]> wrote:

> Thanks for starting this discussion. I'd suggest embracing the changes.
> If there are guidelines that AI tools need to follow, we'd better
> formalize them in READMEs, like skills[1] for code generation and review.
> I feel we are entering an age where a Claude Code agent submits a PR
> and GitHub Copilot reviews it.
> BTW, I have been using Copilot to review PRs, and the results are
> pretty good as a first pass over a PR.
>
> 1. https://code.claude.com/docs/en/skills
>
> Regards,
> Manu
>
> On Mon, Jan 26, 2026 at 8:10 PM Gang Wu <[email protected]> wrote:
>
>> Thanks Junwang for raising this! I strongly agree with this proposal.
>>
>> This aligns perfectly with some common issues I've recently
>> encountered in different projects. We have indeed observed a trend
>> where individuals who lack a deep understanding of Iceberg are
>> starting to use AI to generate PRs. This AI-produced code often looks
>> correct on the surface but contains numerous hidden issues.
>>
>> For iceberg-cpp, which has limited reviewer resources, processing
>> these low-quality PRs consumes a significant amount of valuable time
>> and effort.
>>
>> Therefore, a clear guidance document is crucial. It would effectively
>> communicate the project's expectations regarding PR quality and
>> ownership to contributors.
>> If a contributor simply dumps a low-effort
>> PR that lacks the author's deep understanding and debugging
>> capability, the document would set the expectation that it is unlikely
>> to be reviewed by maintainers, thus preventing unnecessary maintenance
>> burden.
>>
>> Best,
>> Gang
>>
>> On Mon, Jan 26, 2026 at 6:43 PM Junwang Zhao <[email protected]> wrote:
>> >
>> > Hi folks,
>> >
>> > I'd like to start a discussion on whether we should add a page to the
>> > Iceberg documentation describing expectations around AI-generated
>> > contributions.
>> >
>> > This topic has recently been discussed on the Arrow dev mailing
>> > list[1]. In addition, the iceberg-cpp project has already taken a step
>> > in this direction by introducing AI-related contribution
>> > guidelines[2]. After a brief discussion on the iceberg-cpp PR with
>> > Fokko, Gang, and Kevin, we felt it would be worthwhile to raise this
>> > topic more broadly within the Iceberg community.
>> >
>> > The ASF already provides high-level guidance on the use of generative
>> > AI tools, primarily focused on licensing and IP considerations[3]. As
>> > AI-assisted development and so-called "vibe coding" become more
>> > common, thoughtful use of these tools can be beneficial; however, if
>> > the contributing author appears not to have engaged deeply with the
>> > code and/or cannot respond to review feedback, this can significantly
>> > increase maintainer burden and make the review process less
>> > collaborative.
>> >
>> > Having documented guidelines would give maintainers a clear reference
>> > point when evaluating such contributions (including when deciding to
>> > close a PR), and would also make it easier to assess whether a
>> > contributor has made a reasonable effort to meet project expectations.
>> >
>> > I've pulled together some guidelines from iceberg-cpp's PR and from
>> > discussions on the Arrow dev ML, hoping to kick off a broader
>> > conversation about what should go into Iceberg's AI-generated
>> > contribution guidelines.
>> >
>> > -----
>> >
>> > We are not opposed to the use of AI tools in generating PRs, but we
>> > recommend that contributors adhere to the following principles:
>> >
>> > - The PR author should **understand the core ideas** behind the
>> > implementation **end-to-end**, and be able to justify the design and
>> > code during review.
>> > - **Call out unknowns and assumptions.** It's okay not to fully
>> > understand some bits of AI-generated code. You should comment on these
>> > cases and point them out to reviewers so that they can use their
>> > knowledge of the codebase to clear up any concerns. For example, you
>> > might comment: "Calling this function here seems to work, but I'm not
>> > familiar with how it works internally; I wonder if there's a race
>> > condition if it is called concurrently."
>> > - Only submit a PR if you are able to debug, explain, and take
>> > ownership of the changes.
>> > - Ensure the PR title and description match the style, level of
>> > detail, and tone of other Iceberg PRs.
>> > - Follow the coding conventions used in the rest of the codebase.
>> > - Be upfront about AI usage, including a brief summary of which parts
>> > were AI-generated.
>> > - Reference any sources that guided your changes (e.g. "took a similar
>> > approach to #XXXX").
>> >
>> > -----
>> >
>> > Looking forward to hearing your thoughts.
>> >
>> > [1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
>> > [2] https://github.com/apache/iceberg-cpp/pull/531
>> > [3] https://www.apache.org/legal/generative-tooling.html
>> >
>> > --
>> > Regards
>> > Junwang Zhao
>> >
