Re: [DISCUSSION] documenting guidelines for AI-generated contributions

Manu Zhang Wed, 28 Jan 2026 20:24:39 -0800

Thanks for starting this discussion. I'd suggest embracing the changes.
If there are guidelines that AI tools need to follow, we'd better formalize
them in READMEs like skills[1] for code generation and review.
I feel we are entering an age where a Claude Code agent submits a PR and
GitHub Copilot reviews it.
BTW, I have been using Copilot to review PRs, and the results are pretty
good for a first pass.



[1] https://code.claude.com/docs/en/skills

Regards,
Manu


On Mon, Jan 26, 2026 at 8:10 PM Gang Wu <[email protected]> wrote:

> Thanks Junwang for raising this! I strongly agree with this proposal.
>
> This aligns perfectly with some common issues I've recently
> encountered in different projects. We have indeed observed a trend
> where individuals who lack a deep understanding of Iceberg are
> starting to use AI to generate PRs. This AI-produced code often looks
> correct on the surface but contains numerous hidden issues.
>
> For iceberg-cpp, which has limited reviewer resources, processing
> these low-quality PRs consumes a significant amount of valuable time
> and effort.
>
> Therefore, a clear guidance document is crucial. It would effectively
> communicate the project's expectations regarding PR quality and
> ownership to contributors. If a contributor simply dumps a low-effort
> PR that lacks the author's deep understanding and debugging
> capability, the document would set the expectation that it is unlikely
> to be reviewed by maintainers, thus preventing unnecessary maintenance
> burden.
>
> Best,
> Gang
>
> On Mon, Jan 26, 2026 at 6:43 PM Junwang Zhao <[email protected]> wrote:
> >
> > Hi folks,
> >
> > I'd like to start a discussion on whether we should add a page to the
> > Iceberg documentation describing expectations around AI-generated
> > contributions.
> >
> > This topic has recently been discussed on the Arrow dev mailing
> > list[1]. In addition, the iceberg-cpp project has already taken a step
> > in this direction by introducing AI-related contribution
> > guidelines[2]. After a brief discussion on the iceberg-cpp PR with
> > Fokko, Gang, and Kevin, we felt it would be worthwhile to raise this
> > topic more broadly within the Iceberg community.
> >
> > The ASF already provides high-level guidance on the use of generative
> > AI tools, primarily focused on licensing and IP considerations[3]. As
> > AI-assisted development and so-called "vibe coding" become more
> > common, thoughtful use of these tools can be beneficial; however, if
> > the contributing author appears not to have engaged deeply with the
> > code and/or cannot respond to review feedback, this can significantly
> > increase maintainer burden and make the review process less
> > collaborative.
> >
> > Having documented guidelines would give maintainers a clear reference
> > point when evaluating such contributions (including when deciding to
> > close a PR), and would also make it easier to assess whether a
> > contributor has made a reasonable effort to meet project expectations.
> >
> > I've pulled together some guidelines from iceberg-cpp's PR and
> > discussions on the Arrow dev ML, hoping to kick off a broader
> > conversation about what should go into Iceberg's AI-generated
> > contribution guidelines.
> >
> > -----
> >
> > We are not opposed to the use of AI tools in generating PRs, but we
> > recommend that contributors adhere to the following principles:
> >
> > - The PR author should **understand the core ideas** behind the
> > implementation **end-to-end**, and be able to justify the design and
> > code during review.
> > - **Call out unknowns and assumptions**. It's okay to not fully
> > understand some bits of AI-generated code. You should comment on these
> > cases and point them out to reviewers so that they can use their
> > knowledge of the codebase to clear up any concerns. For example, you
> > might comment "calling this function here seems to work but I'm not
> > familiar with how it works internally, I wonder if there's a race
> > condition if it is called concurrently".
> > - Only submit a PR if you are able to debug, explain, and take
> > ownership of the changes.
> > - Ensure the PR title and description match the style, level of
> > detail, and tone of other Iceberg PRs.
> > - Follow coding conventions used in the rest of the codebase.
> > - Be upfront about AI usage, including a brief summary of which parts
> > were AI-generated.
> > - Reference any sources that guided your changes (e.g. "took a similar
> > approach to #XXXX").
> >
> > -----
> >
> > Looking forward to hearing your thoughts.
> >
> > [1] https://lists.apache.org/thread/fyn1r3hjd3cs48n2svxg7lj0zps52bvr
> > [2] https://github.com/apache/iceberg-cpp/pull/531
> > [3] https://www.apache.org/legal/generative-tooling.html
> >
> > --
> > Regards
> > Junwang Zhao
>
