Hey all,

Just to confirm I understand properly: is our goal to reduce the number of low-quality PRs submitted? If so, our team has been thinking a lot about how we can help projects reduce the burden of low-quality PRs on maintainers. I have been testing different structures for various agent instruction docs (AGENTS.md, llms.txt, a net-new file, etc.) to see if we can identify the most effective way to summarize existing AI policies so that agents can best process and understand them. I'm happy to test some of our ideas here as well. If it works, I'll compile it into a PR and submit it so we can evaluate how well it reduces the burden on maintainers. If it doesn't work, I'll report back here so we know what didn't work and can go a different direction.
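As a rough illustration, here is the kind of structure I've been experimenting with. The headings and rules below are only a working sketch drawn from points raised in this thread (human review, draft vs. ready, disclosure, consequences), not a finished proposal:

```markdown
# AGENTS.md (working sketch, not a proposal)

## Before opening a PR
- A human must read, understand, and test every change before submission.
- Keep work-in-progress changes as draft PRs; only mark a PR "ready for
  review" after a genuine self-review.

## Disclosure
- State in the PR description whether and how AI tools were used.

## Scope
- Keep changes minimal and focused; do not rewrite unrelated code,
  logging, or formatting.

## Consequences
- PRs that were clearly not reviewed by a human may be closed.
```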
Is there a canonical set of 3–5 example PRs that represent the kinds of pull requests we often see and want to discourage? If not, I can go through the recently closed PRs and identify what I think are low-quality contributions for the testing.

Cheers,
Shane

On Mon, Mar 16, 2026 at 6:02 PM huaxin gao <[email protected]> wrote:

> Hi everyone,
>
> Thank you all for the discussion. There is broad agreement that we need
> clearer rules around contribution quality and what happens when those
> rules are not followed.
>
> A few key points that came up:
>
> - Focus on contribution quality and readiness, not on trying to detect
>   AI usage itself
> - Close PRs that the author clearly did not review before submitting
> - Add a PR template so contributors see the guidelines when they open
>   a PR
> - Add an AGENTS.md to set rules for AI tools to follow
> - Add clear consequences to the guidelines for contributors who
>   repeatedly ignore them
> - We should find ways to prevent fully automated agent PRs
>
> Kevin has added "Agent automated PRs" to the next Iceberg sync agenda so
> we can continue the discussion there.
>
> Thanks,
>
> Huaxin
>
> On Tue, Mar 10, 2026 at 5:33 AM Steve Loughran <[email protected]> wrote:
>
>> First, anyone who is an active committer on a project with >5k GitHub
>> stars gets 6 months of Claude Max free:
>> https://claude.com/contact-sales/claude-for-oss
>>
>> Which means many more ASF committers will be experiencing what it can
>> and can't do.
>>
>> I'm still learning what it can do, especially on any large body of code,
>> and am happy with the blocking of purely or overly AI-generated content,
>> as it will only create issues downstream. That's production code, tests,
>> etc. Documentation is an interesting one, though, as the tools are good
>> for tasks like "review all links and flag broken ones" as well as "read
>> the docs and highlight inconsistencies".
>>
>> One thing which may be good for any OSS project is to have official
>> CLAUDE.md, GEMINI.md and the Copilot equivalents to provide strict
>> instructions to the AI tooling which it doesn't auto-infer from the
>> simple /init commands (attached: those two for Iceberg).
>>
>> I'm thinking of extra style and process rules, but also instructions to
>> the AI to stop it getting over-enthusiastic:
>>
>> 1. always use slf4j logging (had a bad experience with Gemini replacing
>>    every log statement with System.out in my two-file project as it
>>    couldn't see the output to debug test setup)
>> 2. thread safety requirements
>> 3. tests to go with the code to explore all branches and failure
>>    conditions
>> 4. use no content outside this directory tree
>> 5. add a /* begin: AI */ and /* end: AI */ around changes of a given
>>    size (ASF policy after all)
>> 6. do not touch anything under /format
>>
>> + add the various .gemini/.copilot/.claude dirs with .gitignore set up
>> to ignore customisations.
>>
>> On Tue, 10 Mar 2026 at 03:32, vaquar khan <[email protected]> wrote:
>>
>>> Hi Huaxin, Junwang,
>>>
>>> I've been following this thread and I feel the same pain. Reviewing "AI
>>> slop" is the fastest way to burn out a committer, and Junwang is right:
>>> manual closing is just extra work we don't need.
>>>
>>> I've been working on a small utility called AIV (Automated Integrity
>>> Validation) to help with this exact problem at my day job. Instead of
>>> trying to "detect" AI, which is a losing battle, it focuses on Logic
>>> Density. Essentially, it checks the ratio of real functional changes to
>>> boilerplate. If someone submits 300 lines of scaffolding but only 2
>>> lines of actual logic, AIV flags it as "Low Substance." This directly
>>> addresses Sung's point about "readiness": it forces the author to prove
>>> there's actual work in the PR before a human ever looks at it.
>>>
>>> I've already put together a few Iceberg-specific Design Rules for
>>> testing.
>>> For example, it can catch when a PR tries to bypass the
>>> ExpireSnapshots API or ignores the new V4 metadata constraints,
>>> patterns that AI agents miss 100% of the time.
>>>
>>> It runs 100% locally or in a CI step, with no API keys needed. If the
>>> community is interested, I'm happy to share the code; it's already
>>> Apache-licensed, and we could look at a non-blocking trial to help
>>> triage the incoming queue.
>>>
>>> Regards,
>>> Viquar Khan
>>>
>>> On Mon, 9 Mar 2026 at 22:13, Kevin Liu <[email protected]> wrote:
>>>
>>>> Thank you for bringing this up. I also feel like I've interacted with
>>>> a few of these PRs recently. My suspicion is that these PRs are created
>>>> by an "openclaw"-like agent that is automatically finding issues,
>>>> creating PRs, and responding to reviews. This is slightly different
>>>> from our previous conversation, which was centered around AI-generated
>>>> PRs with a human in the loop. I've just pinged the author in one of the
>>>> suspected PRs and linked to the guidelines.
>>>>
>>>> I'm in favor of adding some more to the "Guidelines for AI-assisted
>>>> Contributions" section [1]. I want to especially call out the burden on
>>>> the reviewers and the limited reviewer resources.
>>>>
>>>> A wild idea: if we add an AGENTS.md to the Iceberg repo, maybe the
>>>> agent will respect it?
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>> [1] https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
>>>>
>>>> On Mon, Mar 9, 2026 at 8:05 PM Alex Stephen via dev <[email protected]> wrote:
>>>>
>>>>> One thing worth considering is a .github/PULL_REQUEST_TEMPLATE.md
>>>>> file.
>>>>>
>>>>> If somebody isn't looking over their PR, they probably aren't going
>>>>> to look over the guidelines around contributing, especially if they're
>>>>> located over on a docs page.
>>>>>
>>>>> A Pull Request Template forces them to see the community's guidelines
>>>>> before they formally make the PR.
>>>>>
>>>>> On Mon, Mar 9, 2026 at 7:55 PM Sung Yun <[email protected]> wrote:
>>>>>
>>>>>> Thanks for raising this, Huaxin. I do think this is very much worth
>>>>>> discussing.
>>>>>>
>>>>>> I also want to acknowledge that we recently updated the contribution
>>>>>> guide here [1], so there is already some baseline guidance in place
>>>>>> around AI-assisted contributions.
>>>>>>
>>>>>> My instinct is that we should be careful not to make this too much
>>>>>> about AI itself, even though I agree that AI is what has made this
>>>>>> issue much more pronounced. It is now much easier to generate PRs that
>>>>>> look ready for review on the surface, even when the author has not
>>>>>> really gone through the content carefully themselves.
>>>>>>
>>>>>> Because of that, I think it may be more useful to frame any
>>>>>> additional guidance around the quality and readiness of the
>>>>>> contribution, rather than around AI use by itself. That feels like a
>>>>>> more durable way to set the standard, since it focuses on things we
>>>>>> can actually assess consistently in review, rather than trying to
>>>>>> determine how the content was produced.
>>>>>>
>>>>>> On that note, one practical place to start might be a more formal
>>>>>> guideline around when a PR should be marked draft versus ready for
>>>>>> review. I think a positive direction for the community would be to
>>>>>> strengthen contributor judgment around what it means for a PR to
>>>>>> actually be ready for reviewer attention, even if the change looks
>>>>>> substantial on the surface. We already have a fairly simple mention of
>>>>>> the draft PR process [2], and maybe that is a natural place to clarify
>>>>>> our standard for what should be labeled ready for review.
>>>>>>
>>>>>> I also think that kind of guideline would be constructive for
>>>>>> someone who is misreading the readiness of generated code.
>>>>>> It gives them a clear way to adjust their behavior going forward,
>>>>>> without making the first response a punishing one. If we start from an
>>>>>> assumption of good intent, that seems like a better way to help
>>>>>> contributors build stronger judgment over time.
>>>>>>
>>>>>> If the same pattern keeps repeating after that, then I think it
>>>>>> makes sense to handle it as a contribution-process issue, regardless
>>>>>> of whether generative tooling was involved. That may also be worth
>>>>>> clarifying, and it aligns with your question about limiting
>>>>>> contributions from people who repeatedly ignore these guidelines,
>>>>>> although I hope clearer standards help avoid getting to that point.
>>>>>>
>>>>>> Cheers,
>>>>>> Sung
>>>>>>
>>>>>> [1] https://github.com/apache/iceberg/pull/15213
>>>>>> [2] https://iceberg.apache.org/contribute/#pull-request-process
>>>>>>
>>>>>> On 2026/03/10 00:52:43 huaxin gao wrote:
>>>>>> > Hi everyone,
>>>>>> >
>>>>>> > Some recent PRs look like they were made entirely by AI: finding
>>>>>> > issues, writing code, opening PRs, and replying to review comments,
>>>>>> > with no human review and no disclosure.
>>>>>> >
>>>>>> > Our guidelines already say contributors are expected to understand
>>>>>> > their code, verify AI output before submitting, and disclose AI
>>>>>> > usage. The problem is there's nothing about what happens when
>>>>>> > someone ignores them.
>>>>>> >
>>>>>> > Should we define consequences? For example:
>>>>>> >
>>>>>> > - Closing PRs that were clearly not reviewed by a human before
>>>>>> >   submitting
>>>>>> > - Limiting contributions from people who repeatedly ignore these
>>>>>> >   guidelines
>>>>>> >
>>>>>> > It's OK to use AI to help write code, but submitting AI output
>>>>>> > without looking at it and leaving it to maintainers to catch the
>>>>>> > problems is not OK.
>>>>>> >
>>>>>> > What do you all think?
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > Huaxin
>>>>>> >

--
Shane C. Glass | Lead, Open Source Strategy and Success | [email protected] | (206) 785-7697

"You'll be told in a hundred ways, some subtle and some not, to keep climbing, and never be satisfied with where you are, who you are, and what you're doing. There are a million ways to sell yourself out, and I guarantee you'll hear about them. To invent your own life's meaning is not easy, but it's still allowed, and I think you'll be happier for the trouble." -Bill Watterson
