Hi Huaxin, Junwang,

I've been following this thread and I feel the same pain. Reviewing "AI slop" is the fastest way to burn out a committer, and Junwang is right: manually closing these PRs is just extra work we don't need.
I've been working on a small utility called AIV (Automated Integrity Validation) to help with this exact problem at my day job. Instead of trying to "detect" AI, which is a losing battle, it focuses on Logic Density: essentially, it checks the ratio of real functional changes to boilerplate. If someone submits 300 lines of scaffolding but only 2 lines of actual logic, AIV flags it as "Low Substance." This directly addresses Sung's point about "readiness": it forces the author to show there's actual work in the PR before a human ever looks at it.

I've already put together a few Iceberg-specific design rules for testing. For example, it can catch when a PR tries to bypass the ExpireSnapshots API or ignores the new V4 metadata constraints, patterns that AI agents miss every time. It runs 100% locally or in a CI step, with no API keys needed.

If the community is interested, I'm happy to share the code. It's already under the Apache License, and we could look at a non-blocking trial to help triage the incoming queue.

Regards,
Viquar Khan

On Mon, 9 Mar 2026 at 22:13, Kevin Liu <[email protected]> wrote:

> Thank you for bringing this up. I also feel like I've interacted with a
> few of these PRs recently. My suspicion is that these PRs are created by an
> "openclaw"-like agent that is automatically finding issues, creating PRs,
> and responding to reviews. This is slightly different from our previous
> conversation, which was centered around AI-generated PRs with
> human-in-the-loop. I've just pinged the author in one of the suspected PRs
> and linked to the guidelines.
>
> I'm in favor of adding some more to the "Guidelines for AI-assisted
> Contributions" section [1]. I want to especially call out the burden on the
> reviewers and the limited reviewer resources.
>
> A wild idea: if we add an AGENTS.md to the Iceberg repo, maybe the agent
> will respect it?
>
> Best,
> Kevin Liu
>
> [1]
> https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
>
> On Mon, Mar 9, 2026 at 8:05 PM Alex Stephen via dev <
> [email protected]> wrote:
>
>> One thing worth considering is a .github/PULL_REQUEST_TEMPLATE.md file.
>>
>> If somebody isn't looking over their PR, they probably aren't going to
>> look over the guidelines around contributing. Especially if they're located
>> over in a docs page.
>>
>> A Pull Request Template forces them to see the community's guidelines
>> before they formally make the PR.
>>
>> On Mon, Mar 9, 2026 at 7:55 PM Sung Yun <[email protected]> wrote:
>>
>>> Thanks for raising this Huaxin. I do think this is very much worth
>>> discussing.
>>>
>>> I also want to acknowledge that we recently updated the contribution
>>> guide here [1], so there is already some baseline guidance in place around
>>> AI-assisted contributions.
>>>
>>> My instinct is that we should be careful not to make this too much about
>>> AI itself, even though I agree that AI is what has made this issue much
>>> more pronounced. It is now much easier to generate PRs that look ready for
>>> review on the surface, even when the author has not really gone through the
>>> content carefully themselves.
>>>
>>> Because of that, I think it may be more useful to frame any additional
>>> guidance around the quality and readiness of the contribution, rather than
>>> around AI use by itself. That feels like a more durable way to set the
>>> standard, since it focuses on things we can actually assess consistently in
>>> review, rather than trying to determine how the content was produced.
>>>
>>> On that note, one practical place to start might be to have a more
>>> formal guideline around when a PR should be marked draft versus ready for
>>> review.
>>> I think a positive direction for the community would be to
>>> strengthen contributor judgment around what it means for a PR to actually
>>> be ready for reviewer attention, even if the change looks substantial on
>>> the surface. We already have a fairly simple mention of the draft PR
>>> process [2], and maybe that is a natural place to clarify our standard for
>>> what should be labeled ready for review.
>>>
>>> I also think that kind of guideline would be constructive for someone
>>> who is misreading the readiness of generated code. It gives them a clear
>>> way to adjust their behavior going forward, without making the first
>>> response a punishing one. If we start from an assumption of good intent,
>>> that seems like a better way to help contributors build stronger judgment
>>> over time.
>>>
>>> If the same pattern keeps repeating after that, then I think it makes
>>> sense to handle it as a contribution-process issue, regardless of whether
>>> generative tooling was involved. That may also be worth clarifying, and it
>>> aligns with your question about limiting contributions from people who
>>> repeatedly ignore these guidelines, although I hope clearer standards help
>>> avoid getting to that point.
>>>
>>> Cheers,
>>> Sung
>>>
>>> [1] https://github.com/apache/iceberg/pull/15213
>>> [2] https://iceberg.apache.org/contribute/#pull-request-process
>>>
>>> On 2026/03/10 00:52:43 huaxin gao wrote:
>>> > Hi everyone,
>>> >
>>> > Some recent PRs look like they were made entirely by AI: finding issues,
>>> > writing code, opening PRs, and replying to review comments, with no human
>>> > review and no disclosure.
>>> >
>>> > Our guidelines already say contributors are expected to understand their
>>> > code, verify AI output before submitting, and disclose AI usage. The
>>> > problem is there's nothing about what happens when someone ignores them.
>>> >
>>> > Should we define consequences?
>>> > For example:
>>> >
>>> > - Closing PRs that were clearly not reviewed by a human before submitting
>>> > - Limiting contributions from people who repeatedly ignore these
>>> > guidelines
>>> >
>>> > It's OK to use AI to help write code, but submitting AI output without
>>> > looking at it and leaving it to maintainers to catch the problems is not
>>> > OK.
>>> >
>>> > What do you all think?
>>> >
>>> > Thanks,
>>> >
>>> > Huaxin
>>
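P.S. For anyone curious what the Logic Density check I mentioned looks like in practice, here is a rough, hypothetical sketch of the idea. AIV's actual rules are more involved; every name, pattern, and threshold below is illustrative, not the real implementation.

```python
# Hypothetical sketch of a "Logic Density" heuristic: what fraction of
# the lines added by a unified diff look like real logic rather than
# scaffolding? All patterns and thresholds here are illustrative.
import re

# Added lines matching any of these patterns count as boilerplate.
BOILERPLATE = [
    re.compile(r"^\s*$"),                        # blank lines
    re.compile(r"^\s*[{}()\[\];]*\s*$"),         # lone braces/brackets
    re.compile(r"^\s*(import|from|package)\b"),  # imports / package decls
    re.compile(r"^\s*(//|#|\*|/\*)"),            # comments
    re.compile(r"^\s*@\w+"),                     # annotations/decorators
]

def logic_density(diff_text: str) -> float:
    """Return the fraction of added lines that look like real logic."""
    added = [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    if not added:
        return 0.0
    substantive = [
        line for line in added
        if not any(p.match(line) for p in BOILERPLATE)
    ]
    return len(substantive) / len(added)

def flag_low_substance(diff_text: str, threshold: float = 0.2) -> bool:
    """Flag a PR whose logic-to-scaffolding ratio falls below threshold."""
    return logic_density(diff_text) < threshold
```

With a threshold like 0.2, a 300-line scaffold carrying 2 lines of actual logic scores far below the bar and gets flagged, while a small focused change passes untouched; the point is to force a conversation about substance, not to block anything automatically.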
