Re: [DISCUSS] AI-Assisted Contributions and AI Tooling Support in Apache Flink

Gustavo de Morais Thu, 16 Apr 2026 10:52:45 -0700

Hi Martijn,

Thanks for the discussion and the update, Martijn.


I think a valuable next step would be for contributors with knowledge of
specific modules to gradually start adding local AGENTS.md files for
their respective modules. I believe this incremental approach will be quite
helpful for providing context and improving knowledge sharing about those
modules in the long term.

Kind regards,

Gustavo

On Thu, 16 Apr 2026 at 17:29, Martijn Visser <[email protected]>
wrote:

> Hi all,
>
> I've opened up https://issues.apache.org/jira/browse/FLINK-39477 given that
> there's consensus on getting this in, thank you all for your feedback!
>
> Best regards,
>
> Martijn
>
> On Tue, Mar 24, 2026 at 5:52 AM Samrat Deb <[email protected]> wrote:
>
> > Hi Martijn,
> >
> > +1 for the initiative.
> >
> > I really liked the Iceberg-style guidelines [1]. AI-generated code must
> > face the same strict review standards as human code. The author must take
> > full ownership, explain the "why" behind the logic, and be able to debug
> > it.
> >
> > One word of caution regarding Leonard's idea of a support agent for the
> > user@flink list or Slack: let's tread very carefully here. The blast
> > radius of a hallucinated configuration (for example, mixing up
> > RocksDBStateBackend and HashMapStateBackend tuning) during a user's
> > production crisis is massive and could lead to data loss.
> > If we do build a support bot, it must be strictly constrained by our
> > official docs (maybe RAG-based initially, evolving from there) and must
> > carry the right disclaimer.
> >
> > Bests,
> > Samrat
> > [1] https://iceberg.apache.org/contribute/#how-are-proposals-adopted
> >
> > On Mon, Mar 23, 2026 at 7:59 PM Ramin Gharib <[email protected]>
> > wrote:
> >
> > > Hi Martijn,
> > >
> > > +1 from me.
> > >
> > > Thanks for bringing this up. It makes total sense to get ahead of this and
> > > set some clear guardrails as these tools become more popular.
> > >
> > > I really like the AGENTS.md approach. Explicitly laying out module-level
> > > context will definitely help reduce the noise from AI-generated PRs.
> > >
> > > Happy to see this move forward!
> > >
> > > Cheers,
> > >
> > > Ramin
> > >
> > > On Mon, Mar 23, 2026 at 2:59 PM Gustavo de Morais <[email protected]>
> > > wrote:
> > >
> > > > Hi Martijn,
> > > >
> > > > Thanks for driving this. I'm +1 for the initiative so we share knowledge
> > > > across the community. I'm also +1 to starting with only the root
> > > > AGENTS.md. Correct and thoroughly reviewed AGENTS.md files should be a
> > > > follow-up for each module. In my experience, a shorter and correct
> > > > context file is better than longer, incorrect or outdated files, which
> > > > create a bad experience when using agents.
> > > >
> > > > I've reviewed the PR for the things I'm aware of. It'd be nice to have
> > > > other eyes from people with different areas of expertise.
> > > >
> > > > Kind regards,
> > > >
> > > > Gustavo
> > > >
> > > >
> > > > On Mon, 23 Mar 2026 at 12:58, Martijn Visser <[email protected]>
> > > > wrote:
> > > >
> > > > > If there are no more comments, I'll start a vote later this week.
> > > > >
> > > > > On Mon, Mar 16, 2026 at 1:22 PM Martijn Visser <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Thanks for all the feedback and support. I've opened a draft PR [1]
> > > > > > that covers points 1 and 2 from the original proposal.
> > > > > >
> > > > > > What's in the PR:
> > > > > >
> > > > > > 1. The PR includes an AGENTS.md at the repository root with
> > > > > > prerequisites, build/test commands, repository structure, architecture
> > > > > > boundaries, common change patterns, coding standards, testing
> > > > > > standards, commit conventions, and boundaries. It also updates the PR
> > > > > > template with a dedicated AI disclosure section (checkbox +
> > > > > > Generated-by tag).
> > > > > > 2. Module-level AGENTS.md files (point 3) are not (yet) included and
> > > > > > can be added incrementally by module maintainers.
> > > > > >
> > > > > > I've used Claude to generate this PR, to show how these tools can also
> > > > > > help us with these things.
> > > > > >
> > > > > > Let me also respond to the individual points raised.
> > > > > >
> > > > > > @Leonard: Interesting idea about an AI agent for the users' mailing
> > > > > > list, but I'd think it would also be great if we could integrate it in
> > > > > > the Slack workspace itself for those that are more active there. I
> > > > > > think that's a separate discussion worth having, but out of scope for
> > > > > > this proposal. Would you like to start a dedicated thread for that?
> > > > > >
> > > > > > @Zakelly: Good point about architecture, performance, and code
> > > > > > reusability. The AGENTS.md includes an "Architecture Boundaries"
> > > > > > section and a "Common Change Patterns" section that maps change types
> > > > > > to the modules they affect, which should help steer AI agents in the
> > > > > > right direction. Regarding GitHub labels and bot reminders for
> > > > > > AI-generated PRs: I think that's a good idea but would be a separate
> > > > > > follow-up. I think we should get the baseline guidelines in place
> > > > > > first.
> > > > > >
> > > > > > @Vaquar: Thanks for sharing. I think AGENTS.md and the PR template
> > > > > > disclosure are the right starting point for Flink. Deterministic
> > > > > > build-system gates are an interesting idea, but I'd want to see how
> > > > > > the community's experience with AI contributions evolves before adding
> > > > > > that level of enforcement. If you'd like to propose something concrete
> > > > > > for Flink, a FLIP would be the right vehicle for that.
> > > > > >
> > > > > > Process question:
> > > > > >
> > > > > > Since these are contribution guidelines rather than API or
> > > > > > architecture changes, I think a vote on this thread would be
> > > > > > sufficient. But if the community feels this warrants a formal FLIP,
> > > > > > I'm happy to go that route. What do others think?
> > > > > >
> > > > > > Feedback on the PR is welcome.
> > > > > >
> > > > > > Thanks, Martijn
> > > > > >
> > > > > > [1] https://github.com/apache/flink/pull/27776
> > > > > >
> > > > > > On Sat, Mar 14, 2026 at 6:13 AM vaquar khan <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > >> Hi Martijn, Zakelly, and everyone,
> > > > > > >>
> > > > > > >> +1 to adding AGENTS.md. It's a great first step,
> > > > > > >> as other Apache projects follow the same approach.
> > > > > > >>
> > > > > > >> I saw this thread and thought I'd chime in because I'm actually
> > > > > > >> working on a draft KIP proposal on this exact topic right now.
> > > > > > >>
> > > > > > >> To Zakelly's point about AI falling short on architecture: AGENTS.md
> > > > > > >> is a great guide, but it’s ultimately a "soft control." In my
> > > > > > >> experience, LLMs probabilistically ignore markdown instructions when
> > > > > > >> their context windows fill up or prompts drift.
> > > > > > >>
> > > > > > >> To really stop the review fatigue, my KIP draft proposes adding a
> > > > > > >> deterministic "hard control" hooked directly into the build system.
> > > > > > >> It uses local AST parsing to automatically block PRs that are mostly
> > > > > > >> empty scaffolding/docstrings (low logic density) or violate core
> > > > > > >> architectural patterns. It catches the "AI slop" before a human ever
> > > > > > >> has to look at it.
> > > > > > >>
> > > > > > >> If the community is interested, I’d be happy to share my draft KIP.
> > > > > > >> It might be a helpful reference if we want to explore a similar
> > > > > > >> Maven-based gate for Flink.
> > > > > >>
> > > > > >> Regards,
> > > > > >>
> > > > > >> Vaquar Khan
> > > > > >>
> > > > > > >> On Thu, Mar 12, 2026 at 9:57 PM Zakelly Lan <[email protected]>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi Martijn,
> > > > > > >> >
> > > > > > >> > Thanks for bringing this up. I'd +1 on this proposal.
> > > > > > >> >
> > > > > > >> > In the guidelines, I'd like to emphasize that contributors and
> > > > > > >> > reviewers should pay particular attention to architecture,
> > > > > > >> > performance, and code reusability. Based on my experience working
> > > > > > >> > with AI, code agents often fall short in these areas.
> > > > > > >> >
> > > > > > >> > Furthermore, I suggest we introduce mechanisms to ensure a smooth
> > > > > > >> > review process for AI-generated code, such as adding GitHub labels
> > > > > > >> > and a special reminder for reviewers from Flink's GitHub bot.
> > > > > >> >
> > > > > >> >
> > > > > >> > Best,
> > > > > >> > Zakelly
> > > > > >> >
> > > > > >> >
> > > > > > >> > On Fri, Mar 13, 2026 at 10:09 AM Rion Williams <[email protected]>
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Hi Martijn,
> > > > > > >> > >
> > > > > > >> > > I think this is a great idea and definitely an effort worth
> > > > > > >> > > pursuing; it’s actually something I’ve been considering
> > > > > > >> > > experimenting with myself. A clear +1 from me, and I’d be happy
> > > > > > >> > > to help as the effort develops.
> > > > > > >> > >
> > > > > > >> > > On the reviewer side, we already have a pretty solid set of
> > > > > > >> > > guardrails and review processes in place, which is great. That
> > > > > > >> > > said, it’s still easy to become inundated by a large, random PR
> > > > > > >> > > with little or no context (sometimes clearly AI-driven).
> > > > > > >> > > Establishing some guidelines specifically around AI usage (both
> > > > > > >> > > for providing development context and for helping with the
> > > > > > >> > > review/audit process) would be fantastic, even if we start small
> > > > > > >> > > and gradually evolve things over time.
> > > > > > >> > >
> > > > > > >> > > Thanks for kicking this off. Looking forward to hearing what
> > > > > > >> > > others think.
> > > > > >> > >
> > > > > >> > > Cheers,
> > > > > >> > >
> > > > > >> > > Rion
> > > > > >> > >
> > > > > >> > >
> > > > > > >> > > > On Mar 12, 2026, at 8:50 PM, Leonard Xu <[email protected]> wrote:
> > > > > > >> > > >
> > > > > > >> > > > Hi Martijn,
> > > > > > >> > > >
> > > > > > >> > > > Thanks for kicking off this discussion. I've been thinking
> > > > > > >> > > > along similar lines recently, so you have a +1 from me on this
> > > > > > >> > > > proposal.
> > > > > > >> > > >
> > > > > > >> > > > I also have a suggestion regarding activity on the users'
> > > > > > >> > > > mailing list. Could we consider introducing an AI agent to help
> > > > > > >> > > > answer users' questions? I've noticed that many inquiries on
> > > > > > >> > > > user@flink currently go unanswered, yet most of them could be
> > > > > > >> > > > effectively addressed by an agent.
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > Best,
> > > > > > >> > > > Leonard
> > > > > > >> > > >
> > > > > > >> > > >> On Mar 13, 2026, at 05:03, Martijn Visser <[email protected]> wrote:
> > > > > >> > > >>
> > > > > >> > > >> Hi all,
> > > > > >> > > >>
> > > > > > >> > > >> I'd like to start a discussion about how the Flink community
> > > > > > >> > > >> should handle AI-assisted contributions and how we can make
> > > > > > >> > > >> the Flink codebase more accessible to AI tooling.
> > > > > > >> > > >>
> > > > > > >> > > >> The ASF has published guidance on generative AI tooling [1],
> > > > > > >> > > >> and several Apache projects have already adopted
> > > > > > >> > > >> project-specific guidelines on top of that. I think Flink
> > > > > > >> > > >> should too.
> > > > > > >> > > >>
> > > > > > >> > > >> The most comprehensive example I've seen is Apache Airflow.
> > > > > > >> > > >> They've added an AGENTS.md [2] with instructions for AI coding
> > > > > > >> > > >> agents, including PR templates with an AI disclosure checkbox,
> > > > > > >> > > >> a self-review checklist, and the Generated-by: commit message
> > > > > > >> > > >> token that the ASF guidance recommends. Apache Iceberg
> > > > > > >> > > >> recently adopted AI contribution guidelines [3] focused on
> > > > > > >> > > >> contributor accountability: you must be able to debug,
> > > > > > >> > > >> explain, and own the changes. Other projects like Paimon [4],
> > > > > > >> > > >> Mahout [5], and Ozone [6] have adopted similar policies.
> > > > > >> > > >>
> > > > > > >> > > >> I'd like to propose the following for Flink:
> > > > > > >> > > >>
> > > > > > >> > > >> 1. Adopt contribution guidelines for AI-assisted PRs.
> > > > > > >> > > >> Contributors must disclose when AI tooling was used (using
> > > > > > >> > > >> Generated-by: <Tool Name and Version> in the commit message),
> > > > > > >> > > >> and must be able to explain and take ownership of all changes.
> > > > > > >> > > >> AI-generated code is held to the same review standards as
> > > > > > >> > > >> human-written code.
> > > > > > >> > > >> 2. Add AGENTS.md files to the Flink repository. AGENTS.md [7]
> > > > > > >> > > >> is a convention for giving AI coding agents project-specific
> > > > > > >> > > >> context. It can contain information like build instructions,
> > > > > > >> > > >> test commands, coding conventions, commit message format. I
> > > > > > >> > > >> think we should add one at the root of apache/flink.
> > > > > > >> > > >> 3. Add module-level context for AI tooling. This is where I
> > > > > > >> > > >> think we can take a step forward. Each Flink module (e.g.
> > > > > > >> > > >> flink-streaming-java, flink-table-planner, flink-clients)
> > > > > > >> > > >> would benefit from its own AGENTS.md explaining the module's
> > > > > > >> > > >> role, key abstractions, testing patterns, and common pitfalls.
> > > > > > >> > > >> This also serves as architectural documentation that helps
> > > > > > >> > > >> human contributors.
> > > > > > >> > > >>
> > > > > > >> > > >> I'm looking forward to hearing what others think about this.
> > > > > >> > > >>
> > > > > >> > > >> Best regards,
> > > > > >> > > >>
> > > > > >> > > >> Martijn
> > > > > >> > > >>
> > > > > >> > > >> [1] https://www.apache.org/legal/generative-tooling.html
> > > > > >> > > >> [2]
> https://github.com/apache/airflow/blob/main/AGENTS.md
> > > > > >> > > >> [3]
> > > > > >> > > >>
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
> > > > > >> > > >> [4]
> > > > > >> > > >>
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/paimon/blob/master/.github/PULL_REQUEST_TEMPLATE.md?plain=1#L22
> > > > > >> > > >> [5]
> > > > > >> > > >>
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/mahout/blob/main/docs/community/pr-policy-and-review-guidelines.md
> > > > > >> > > >> [6]
> > > > > >> > > >>
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/ozone-site/blob/master/src/pages/release-notes/2.0.0.md?plain=1#L408
> > > > > >> > > >> [7] https://agents.md/
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
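
Illustration of point 1 in the original proposal (disclosing AI tooling with
a Generated-by: <Tool Name and Version> line in the commit message): the
sketch below shows one way such a line could be detected. It is a minimal
example under stated assumptions, not part of the PR or of any Flink tooling.
The function names, the sample commit message, the issue number in it, and
the tool name "ExampleTool 1.0" are all hypothetical.

```python
import re

# Assumed format, taken from the proposal text:
#   Generated-by: <Tool Name and Version>
# The token must start a line; only spaces/tabs may precede the tool name.
GENERATED_BY = re.compile(r"^Generated-by:[ \t]*(\S.*)$", re.MULTILINE)


def generated_by(commit_message: str) -> list[str]:
    """Return every tool declared on a Generated-by: line (hypothetical helper)."""
    return [m.group(1).strip() for m in GENERATED_BY.finditer(commit_message)]


def is_disclosed(commit_message: str) -> bool:
    """True if the message carries at least one non-empty Generated-by: line."""
    return bool(generated_by(commit_message))


# Hypothetical commit message; the issue number and tool name are made up.
example = (
    "[FLINK-00000][docs] Fix typo in README\n"
    "\n"
    "Generated-by: ExampleTool 1.0\n"
)

if __name__ == "__main__":
    print(generated_by(example))
    print(is_disclosed("Plain commit without disclosure"))
```

A check like this only detects that a disclosure line is present; it says
nothing about whether the author can explain and own the change, which the
proposal leaves to human review.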
