Thank you for bringing this up. I also feel like I've interacted with a few of these PRs recently. My suspicion is that these PRs are created by an "openclaw"-like agent that automatically finds issues, creates PRs, and responds to reviews. This is slightly different from our previous conversation, which was centered on AI-generated PRs with a human in the loop. I've just pinged the author on one of the suspected PRs and linked to the guidelines.
I'm in favor of adding some more to the "Guidelines for AI-assisted Contributions" section [1]. I want to especially call out the burden on the reviewers and the limited reviewer resources.

A wild idea: if we add an AGENTS.md to the Iceberg repo, maybe the agent will respect it?

Best,
Kevin Liu

[1] https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions

On Mon, Mar 9, 2026 at 8:05 PM Alex Stephen via dev <[email protected]> wrote:

> One thing worth considering is a .github/PULL_REQUEST_TEMPLATE.md file.
>
> If somebody isn’t looking over their PR, they probably aren’t going to
> look over the guidelines around contributing, especially if they’re
> located over in a docs page.
>
> A pull request template forces them to see the community’s guidelines
> before they formally make the PR.
>
> On Mon, Mar 9, 2026 at 7:55 PM Sung Yun <[email protected]> wrote:
>
>> Thanks for raising this, Huaxin. I do think this is very much worth
>> discussing.
>>
>> I also want to acknowledge that we recently updated the contribution
>> guide here [1], so there is already some baseline guidance in place
>> around AI-assisted contributions.
>>
>> My instinct is that we should be careful not to make this too much
>> about AI itself, even though I agree that AI is what has made this
>> issue much more pronounced. It is now much easier to generate PRs
>> that look ready for review on the surface, even when the author has
>> not really gone through the content carefully themselves.
>>
>> Because of that, I think it may be more useful to frame any
>> additional guidance around the quality and readiness of the
>> contribution, rather than around AI use by itself. That feels like a
>> more durable way to set the standard, since it focuses on things we
>> can actually assess consistently in review, rather than trying to
>> determine how the content was produced.
>>
>> On that note, one practical place to start might be to have a more
>> formal guideline around when a PR should be marked draft versus
>> ready for review. I think a positive direction for the community
>> would be to strengthen contributor judgment around what it means for
>> a PR to actually be ready for reviewer attention, even if the change
>> looks substantial on the surface. We already have a fairly simple
>> mention of the draft PR process [2], and maybe that is a natural
>> place to clarify our standard for what should be labeled ready for
>> review.
>>
>> I also think that kind of guideline would be constructive for
>> someone who is misreading the readiness of generated code. It gives
>> them a clear way to adjust their behavior going forward, without
>> making the first response a punishing one. If we start from an
>> assumption of good intent, that seems like a better way to help
>> contributors build stronger judgment over time.
>>
>> If the same pattern keeps repeating after that, then I think it
>> makes sense to handle it as a contribution-process issue, regardless
>> of whether generative tooling was involved. That may also be worth
>> clarifying, and it aligns with your question about limiting
>> contributions from people who repeatedly ignore these guidelines,
>> although I hope clearer standards help avoid getting to that point.
>>
>> Cheers,
>> Sung
>>
>> [1] https://github.com/apache/iceberg/pull/15213
>> [2] https://iceberg.apache.org/contribute/#pull-request-process
>>
>> On 2026/03/10 00:52:43 huaxin gao wrote:
>> > Hi everyone,
>> >
>> > Some recent PRs look like they were made entirely by AI: finding
>> > issues, writing code, opening PRs, and replying to review
>> > comments, with no human review and no disclosure.
>> >
>> > Our guidelines already say contributors are expected to understand
>> > their code, verify AI output before submitting, and disclose AI
>> > usage. The problem is there's nothing about what happens when
>> > someone ignores them.
>> >
>> > Should we define consequences? For example:
>> >
>> > - Closing PRs that were clearly not reviewed by a human before
>> >   submitting
>> > - Limiting contributions from people who repeatedly ignore these
>> >   guidelines
>> >
>> > It's OK to use AI to help write code, but submitting AI output
>> > without looking at it and leaving it to maintainers to catch the
>> > problems is not OK.
>> >
>> > What do you all think?
>> >
>> > Thanks,
>> >
>> > Huaxin
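As a concrete sketch of the .github/PULL_REQUEST_TEMPLATE.md idea raised in this thread: the checklist wording below is illustrative only, not an agreed-upon community template.

```markdown
<!-- .github/PULL_REQUEST_TEMPLATE.md — illustrative sketch, not settled policy -->
## What changes are included in this PR?

<!-- Describe the change and link the related issue, if any. -->

## Contributor checklist
<!-- Hypothetical example items; the actual text would be decided by the community. -->
- [ ] I have read the contribution guidelines, including the
      "Guidelines for AI-assisted Contributions":
      https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
- [ ] I have personally reviewed and tested every change in this PR.
- [ ] If AI tooling helped produce this change, I have disclosed that in the description.
- [ ] This PR is ready for reviewer attention (otherwise it is opened as a draft).
```

GitHub renders this template automatically in the PR description box when placed at `.github/PULL_REQUEST_TEMPLATE.md`, so contributors see the checklist before they can open the PR.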
