Re: [DISCUSS] Triage + review AI assistance - human- or not human- assisted triage / review ?

Pierre Jeambrun Wed, 29 Apr 2026 02:48:22 -0700

I think that for PRs that have been sitting for a while waiting for a
review, a fully automatic review (Copilot) is better than nothing and enables
the contributor to keep moving forward instead of remaining blocked. I
believe that's a good thing.

For later iterations (2nd, 3rd, 4th pass), I find Copilot reviews bad and
discouraging, and I would not advise them, mostly because they contain a lot
of comments, many of them nits or irrelevant (out of scope, plain wrong,
etc.). This means that after the initial fully automated review, a real
human needs to come in and do an actual review to help the contributor move
forward with their PR.

Personally I use a setup similar to Kaxil's, with my own skills that do a
review paying attention to what I usually pay attention to. (I had it learn
from all my reviews of the past year and created a skill based on that, with
rejection patterns, etc.) Then I review every comment and either post it
directly or rewrite it if it sounds too AI.

I'm really for AI-generated reviews, but presented in a form that looks
'human' and has some empathy. Real people might read them. I believe few
people use a fully AI-automated process to create a PR; at least for
double-checking that everything is correctly addressed, a real human will
read those comments. This is why I believe it's important to keep them
'human' in the form of communication, even if the content was AI-generated
(which is the point): human wording, comments verified as relevant,
hallucinations trimmed, etc.


And this 'human' form of communication really doesn't cost much, because
you still have all the comments AI-generated. If you ask your agent to
'speak' like you, it will get closer to an acceptable form of
communication, and then rewriting a couple of comments is really simple.
The part that takes some effort is reading and understanding all the
comments to make sure they are relevant. Posting irrelevant AI-generated
comments is super frustrating and discouraging for the contributor: it adds
noise, and since we have a 'resolve all comments' policy, we basically force
them to do the job for us by going through all of them, removing the
hallucinations, etc., and answering unverified messages from the AI.

My 2 cents.

On Tue, Apr 28, 2026 at 5:48 PM Jarek Potiuk <[email protected]> wrote:

> > Also that the price will keep on changing :D
>
> I hope not in the same way as those charts (though ... it's quite possible
> actually):
>
>
> https://github.blog/news-insights/company-news/an-update-on-github-availability/
>
> J.
>
>
> On Tue, Apr 28, 2026 at 5:17 PM Kaxil Naik <[email protected]> wrote:
>
> > Also that the price will keep on changing :D
> >
> > On Tue, 28 Apr 2026 at 16:15, Kaxil Naik <[email protected]> wrote:
> >
> > > 100% Agreed on it for sure.
> > >
> > > >Regarding Copilot billing, the lack of clarity on server-side costs is
> > > concerning.
> > >
> > > On Tue, 28 Apr 2026 at 16:13, Jarek Potiuk <[email protected]> wrote:
> > >
> > >> Hi Kaxil,
> > >>
> > >> Regarding Copilot billing, the lack of clarity on server-side costs is
> > >> concerning. Given the recent surge in PR volume, there is a risk of
> > >> uncontrollable expenses if hard caps aren't in place, similar to those
> > >> on ASF CI.
> > >>
> > >> I agree that automated and assisted reviews are not mutually exclusive.
> > >> I am interested to hear how others perceive this balance and what their
> > >> experiences have been.
> > >>
> > >> Best,
> > >> Jarek Potiuk
> > >>
> > >> On Tue, Apr 28, 2026 at 2:20 PM Kaxil Naik <[email protected]> wrote:
> > >>
> > >> > Ignore the last para about links -- it is the copy/paste of what I had
> > >> > sent to the new AI initiative ASF group/list where some of related
> > >> > discussions were happening.
> > >> >
> > >> > On Tue, 28 Apr 2026 at 13:18, Kaxil Naik <[email protected]> wrote:
> > >> >
> > >> > > The Copilot reviews as I had recently found out were paid for by our
> > >> > > Astronomer's GitHub enterprise (for me and other folks at Astronomer
> > >> > > from our quota).
> > >> > >
> > >> > > And with them moving to Usage-based model (which was bound to
> > >> > > happen), it will get expensive.
> > >> > >
> > >> > > Although, I think it is still valuable for mass-reviewing on the
> > >> > > server side since this happens on CI and is a complete opt-in by the
> > >> > > reviewer and there are no doubts.
> > >> > >
> > >> > > As I showed in the last dev call, I have my hand crafted review skill
> > >> > > that I use for detailed reviews from my laptop and that has been
> > >> > > vetted by me before posting. So I am fully responsible for all the
> > >> > > good and bad (false positive or hallucinations) things it catches
> > >> > > since I approve it. And have been using this skill for a good quarter
> > >> > > or half a year (time flies).
> > >> > >
> > >> > > And that is why I do not feel it is either / OR -- meaning it was
> > >> > > never Copilot review vs local review for me since I used both based
> > >> > > on the purpose. For reviewing 200 PRs as last time, I used Copilot
> > >> > > since a review is more helpful than no review and the PR getting
> > >> > > marked stale, and I have seen folks self-assign copilot review on
> > >> > > their PRs -- for those who have access to it. I have done the same
> > >> > > to have multiple layers (even though my local review skill already
> > >> > > does multi-modal reviews).
> > >> > >
> > >> > > And my philosophy around Review and a lot of workflow skills have
> > >> > > been: What I look for in a PR isn't necessarily what someone else
> > >> > > would look for. Or what is important to me, might be nit for someone.
> > >> > > So while as a project we should have standards which go in
> > >> > > AGENTS.md/Claude.md and or a high-level review skill that is
> > >> > > checked-in the project, the "what I look for" will remain on my
> > >> > > machine and is geared towards my preference, testing etc. Time and
> > >> > > again I'd use reviews from Ash and others to tune it -- like I would
> > >> > > do even without AI. As an example Ash would have caught something in
> > >> > > my PR or someone else's PR that I wouldn't have realized, and like a
> > >> > > human I'd learn for next time, so are my skills. But that is sort of
> > >> > > my new workflow adapting.
> > >> > >
> > >> > > re: fully automated Copilot reviews may discourage contributors who
> > >> > > expect human interaction
> > >> > >
> > >> > > At least to me, there is no difference between that and a review from
> > >> > > a human which completely sounds AI/robotic anyway since a human can
> > >> > > just run a skill and post the response as well --- from a purely PR
> > >> > > author reception point of view. It is just coming from a different
> > >> > > account, and the latter feels even worse as it is coming from a human
> > >> > > account. But at the end of the day, that is still a personal
> > >> > > preference. A review (from copilot, from human that sounds like AI, a
> > >> > > pure human review) that catches a bug is still better than no review,
> > >> > > PR going stale and being closed.
> > >> > >
> > >> > > Long story short: I do not think they are mutually exclusive -- or
> > >> > > never were mutually exclusive :)
> > >> > >
> > >> > > ------
> > >> > >
> > >> > > - https://github.com/apache/airflow/pull/63775#discussion_r3025383633
> > >> > > -- Copilot caught a Databricks provider importing
> > >> > > airflow.utils.timezone directly (which relies on Airflow's runtime
> > >> > > deprecation redirect and silences typing) and suggested switching to
> > >> > > airflow.providers.common.compat.sdk. That is our documented Airflow 2
> > >> > > / Airflow 3 cross-version pattern for providers. Copilot only knows
> > >> > > this because we wrote it down.
> > >> > > - https://github.com/apache/airflow/pull/62343#discussion_r3025380683
> > >> > > -- same cross-version import pattern, different provider. Author
> > >> > > accepted the suggestion.
> > >> > > - https://github.com/apache/airflow/pull/64568#discussion_r3025333917
> > >> > > -- Copilot flagged a fix for failure-callback context["exception"]
> > >> > > handling and asked for a regression test specifically against the
> > >> > > InProcessTestSupervisor / dag.test() path. That is the correct
> > >> > > execution path for that change, not a generic "please add a test"
> > >> > > remark.
> > >> > > - https://github.com/apache/airflow/pull/64576#pullrequestreview-4047787083
> > >> > > -- Copilot reviewed a fix I authored for xcom_pull() ignoring default
> > >> > > when map_indexes was not set, and raised a header-precedence
> > >> > > question I had not thought about.
> > >> > > - https://github.com/apache/airflow/pull/61878#pullrequestreview-3851732779
> > >> > > -- Dennis surfaced this one on the dev list as a real review he found
> > >> > > useful on a provider PR. He's at a different company, so worth
> > >> > > flagging as an independent signal.
> > >> > >
> > >> > > On Tue, 28 Apr 2026 at 11:02, Jarek Potiuk <[email protected]> wrote:
> > >> > >
> > >> > >> Hi Kaxil and team,
> > >> > >>
> > >> > >> I’d like to discuss our strategy for "copilot reviews" versus SKILL
> > >> > >> based assisted triage and maintainer review experiences.
> > >> > >>
> > >> > >> My recent experiments with skill-based auto-triage show a downward
> > >> > >> trend in stale pull requests by filtering out "drive-by"
> > >> > >> contributions, allowing maintainers to focus on high-impact work
> > >> > >> (more stats and trends soon). Unlike 3rd-party tools, this approach
> > >> > >> is agent-agnostic, easily fine-tuned via English prompts, and
> > >> > >> potentially reusable across other ASF projects.
> > >> > >>
> > >> > >> Regarding transparency, I’ve proposed specific attribution formats
> > >> > >> [1] to distinguish between purely automated comments and those
> > >> > >> reviewed by humans. This aligns with ongoing [email protected]
> > >> > >> conversations [2] about using "Assisted By" instead of
> > >> > >> "Generated-by" to maintain accountability and contributor
> > >> > >> motivation.
> > >> > >>
> > >> > >> I have concerns that fully automated Copilot reviews may discourage
> > >> > >> contributors who expect human interaction. My experimental
> > >> > >> /maintainer-review skill [3] allows maintainers to drive the process
> > >> > >> by reviewing AI-generated comments before posting. This reduces
> > >> > >> "token ping-pong," integrates with the CODEOWNERS discussion [4],
> > >> > >> and ensures we only spend tokens on relevant, triaged PRs. And keep
> > >> > >> maintainers ultimately responsible for comments they accept to send.
> > >> > >>
> > >> > >> With Copilot moving to usage-based billing on June 1 [5], using
> > >> > >> personal or open-source credits via the skill-based approach
> > >> > >> introduces some change (but I am still unclear what it means to the
> > >> > >> billing - who pays for the tokens).
> > >> > >>
> > >> > >> Conversely, for skill-based processes, it's entirely in the hands
> > >> > >> (and pocket) of those who run the skill locally. We are also working
> > >> > >> with ASF and Alpha-Omega to secure long-term resources for
> > >> > >> maintainers for that.
> > >> > >>
> > >> > >> I would love to hear your thoughts about it - especially Kaxil's
> > >> > >> experiences regarding the Copilot review experiment - my
> > >> > >> observations might be incomplete and biased :).
> > >> > >>
> > >> > >> J.
> > >> > >>
> > >> > >> [1] https://github.com/apache/airflow/pull/65965 - comment
> > >> > >> attribution
> > >> > >> [2] https://lists.apache.org/thread/qbdkky8ls6zybyy9o3pvqnpf68r089qp
> > >> > >> - legal discussion thread on attributions
> > >> > >> [3] https://github.com/apache/airflow/pull/65981 -
> > >> > >> /maintainer-review skill
> > >> > >> [4] https://lists.apache.org/thread/5ssp4ksyohdzclxqvj7ngz0hz5wy9j68
> > >> > >> - CODEOWNERS discussion
> > >> > >> [5] https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
> > >> > >> - change in billing for Copilot
> > >> > >>
> > >> > >> Best,
> > >> > >> Jarek Potiuk
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >
> >
>
