Hi Huaxin, Junwang,

I've been following this thread and I feel the same pain. Reviewing "AI slop" is the fastest way to burn out a committer, and Junwang is right: manually closing these PRs is just extra work we don't need.
I've been working on a small utility called AIV (Automated Integrity Validation) to help with this exact problem at my day job. Instead of trying to "detect" AI, which is a losing battle, it focuses on Logic Density: essentially, it checks the ratio of real functional changes to boilerplate. If someone submits 300 lines of scaffolding but only 2 lines of actual logic, AIV flags it as "Low Substance." This directly addresses Sung's point about "readiness": it forces the author to show there's actual work in the PR before a human ever looks at it.

I've already put together a few Iceberg-specific design rules for testing. For example, it can catch when a PR tries to bypass the ExpireSnapshots API or ignores the new V4 metadata constraints, patterns that AI agents miss every time. It runs 100% locally or in a CI step, with no API keys needed.

If the community is interested, I'm happy to share the code. It's already under the Apache License, and we could look at a non-blocking trial to help triage the incoming queue.

Regards,
Viquar Khan

On Mon, 9 Mar 2026 at 22:13, Kevin Liu <[email protected]> wrote:

> Thank you for bringing this up. I also feel like I've interacted with a
> few of these PRs recently. My suspicion is that these PRs are created by an
> "openclaw"-like agent that is automatically finding issues, creating PRs,
> and responding to reviews. This is slightly different from our previous
> conversation, which was centered around AI-generated PRs with
> human-in-the-loop. I've just pinged the author in one of the suspected PRs
> and linked to the guidelines.
>
> I'm in favor of adding some more to the "Guidelines for AI-assisted
> Contributions" section [1]. I want to especially call out the burden on the
> reviewers and the limited reviewer resources.
>
> A wild idea: if we add an AGENTS.md to the Iceberg repo, maybe the agent
> will respect it?
>
> Best,
> Kevin Liu
>
> [1]
> https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
>
> On Mon, Mar 9, 2026 at 8:05 PM Alex Stephen via dev <
> [email protected]> wrote:
>
>> One thing worth considering is a .github/PULL_REQUEST_TEMPLATE.md file.
>>
>> If somebody isn't looking over their PR, they probably aren't going to
>> look over the guidelines around contributing. Especially if they're located
>> over in a docs page.
>>
>> A Pull Request Template forces them to see the community's guidelines
>> before they formally make the PR.
>>
>> On Mon, Mar 9, 2026 at 7:55 PM Sung Yun <[email protected]> wrote:
>>
>>> Thanks for raising this Huaxin. I do think this is very much worth
>>> discussing.
>>>
>>> I also want to acknowledge that we recently updated the contribution
>>> guide here [1], so there is already some baseline guidance in place around
>>> AI-assisted contributions.
>>>
>>> My instinct is that we should be careful not to make this too much about
>>> AI itself, even though I agree that AI is what has made this issue much
>>> more pronounced. It is now much easier to generate PRs that look ready for
>>> review on the surface, even when the author has not really gone through the
>>> content carefully themselves.
>>>
>>> Because of that, I think it may be more useful to frame any additional
>>> guidance around the quality and readiness of the contribution, rather than
>>> around AI use by itself. That feels like a more durable way to set the
>>> standard, since it focuses on things we can actually assess consistently in
>>> review, rather than trying to determine how the content was produced.
>>>
>>> On that note, one practical place to start might be to have a more
>>> formal guideline around when a PR should be marked draft versus ready for
>>> review.
>>> I think a positive direction for the community would be to
>>> strengthen contributor judgment around what it means for a PR to actually
>>> be ready for reviewer attention, even if the change looks substantial on
>>> the surface. We already have a fairly simple mention of the draft PR
>>> process [2], and maybe that is a natural place to clarify our standard for
>>> what should be labeled ready for review.
>>>
>>> I also think that kind of guideline would be constructive for someone
>>> who is misreading the readiness of generated code. It gives them a clear
>>> way to adjust their behavior going forward, without making the first
>>> response a punishing one. If we start from an assumption of good intent,
>>> that seems like a better way to help contributors build stronger judgment
>>> over time.
>>>
>>> If the same pattern keeps repeating after that, then I think it makes
>>> sense to handle it as a contribution-process issue, regardless of whether
>>> generative tooling was involved. That may also be worth clarifying, and it
>>> aligns with your question about limiting contributions from people who
>>> repeatedly ignore these guidelines, although I hope clearer standards help
>>> avoid getting to that point.
>>>
>>> Cheers,
>>> Sung
>>>
>>> [1] https://github.com/apache/iceberg/pull/15213
>>> [2] https://iceberg.apache.org/contribute/#pull-request-process
>>>
>>> On 2026/03/10 00:52:43 huaxin gao wrote:
>>> > Hi everyone,
>>> >
>>> > Some recent PRs look like they were made entirely by AI: finding issues,
>>> > writing code, opening PRs, and replying to review comments, with no human
>>> > review and no disclosure.
>>> >
>>> > Our guidelines already say contributors are expected to understand their
>>> > code, verify AI output before submitting, and disclose AI usage. The
>>> > problem is there's nothing about what happens when someone ignores them.
>>> >
>>> > Should we define consequences?
>>> > For example:
>>> >
>>> > - Closing PRs that were clearly not reviewed by a human before submitting
>>> > - Limiting contributions from people who repeatedly ignore these
>>> > guidelines
>>> >
>>> > It's OK to use AI to help write code, but submitting AI output without
>>> > looking at it and leaving it to maintainers to catch the problems is not
>>> > OK.
>>> >
>>> > What do you all think?
>>> >
>>> > Thanks,
>>> >
>>> > Huaxin
>>
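P.S. For anyone curious what the Logic Density check I mentioned looks like in practice, here is a rough, hypothetical sketch of the idea. AIV's actual rules are more involved; every name, pattern, and threshold below is illustrative, not the real implementation.

```python
# Hypothetical sketch of a "Logic Density" heuristic: what fraction of
# the lines added by a unified diff look like real logic rather than
# scaffolding? All patterns and thresholds here are illustrative.
import re

# Added lines matching any of these patterns count as boilerplate.
BOILERPLATE = [
    re.compile(r"^\s*$"),                        # blank lines
    re.compile(r"^\s*[{}()\[\];]*\s*$"),         # lone braces/brackets
    re.compile(r"^\s*(import|from|package)\b"),  # imports / package decls
    re.compile(r"^\s*(//|#|\*|/\*)"),            # comments
    re.compile(r"^\s*@\w+"),                     # annotations/decorators
]

def logic_density(diff_text: str) -> float:
    """Return the fraction of added lines that look like real logic."""
    added = [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    if not added:
        return 0.0
    substantive = [
        line for line in added
        if not any(p.match(line) for p in BOILERPLATE)
    ]
    return len(substantive) / len(added)

def flag_low_substance(diff_text: str, threshold: float = 0.2) -> bool:
    """Flag a PR whose logic-to-scaffolding ratio falls below threshold."""
    return logic_density(diff_text) < threshold
```

With a threshold like 0.2, a 300-line scaffold carrying 2 lines of actual logic scores far below the bar and gets flagged, while a small focused change passes untouched; the point is to force a conversation about substance, not to block anything automatically.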
