+1 for adding the automation considering that other projects are doing this
as well and the high amount of open PRs doesn't help. Thanks for
summarizing the current state in such detail.

X + Y should probably be larger than our usual release cycles. X=6m and
Y=3m might be alright in this regard.

Matthias

On Tue, Dec 17, 2024 at 10:40 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hey!
>
> Big +1
>
> I think this is a great initiative and we should follow the path laid down
> by other projects.
> Starting with X=1 year and Y=3 months is very forgiving, I personally
> wouldn't mind immediately going more aggressive to clean up the backlog.
>
> So for me X = 6m Y = 3m is also perfectly fine.
>
> Cheers,
> Gyula
>
> On Tue, Dec 17, 2024 at 9:06 PM Jim Hughes <jhug...@confluent.io.invalid>
> wrote:
>
> > Hi Tom,
> >
> > I like the idea of starting with X=1 year and Y=3 months.  That plus
> having
> > CHI review recent PRs would be helpful.
> >
> > Your suggestion of having a bot which "lints" naming conventions could be
> > useful.  Maybe there could be a CHI-bot to automate what you and others
> are
> > doing manually. :)
> > I would not suggest that this bot close PRs, but rather it could suggest
> > how to improve them.  Or if CI is green and other conditions are met, it
> > could apply a "ready for review" label to draw attention.
> >
> > Cheers,
> >
> > Jim
> >
> >
> >
> > On Tue, Dec 17, 2024 at 11:21 AM Tom Cooper <c...@tomcooper.dev> wrote:
> >
> > > TL;DR
> > >
> > > - We have over 1.2k open PRs, this is an issue as it makes new
> > > contributors think twice about committing and looks like a problem that
> > is
> > > too-big-to-solve for committers.
> > > - There have been various attempts, over the last 6 years, to enable
> the
> > > Stale PR bot/action to prompt authors to refresh old PRs and auto-close
> > > them if no action is taken.
> > > - These were rejected as some committers felt this was punishing
> > > contributors for the committers not reviewing/closing PRs fast enough.
> > > - Others felt that, rather than "sweeping the problem under the rug",
> > > using the Stale PR functionality would actually reveal the true scale
> of
> > > the issue. Allowing committers to see what were truly active PRs.
> > > - Other Apache projects such as Kafka, Beam, Spark, Airflow and many
> > > others have enabled the stale PR GitHub action.
> > > - Despite this, Kafka still has 1k open PRs. However, these PRs have
> all
> > > been updated/commented on in the last 3-4 months, so can be considered
> > > active.
> > > - For Flink, only 12% of the open PRs have been updated in the last 3
> > > months and only 41% in the last year.
> > > - I propose we enable the Stale PR Github action to clear the backlog
> and
> > > reduce the PRs down to those that are active and relevant.
> > > - We can start with PRs that haven't been active in the last year and
> > give
> > > authors 3 months to refresh them. These thresholds could then be
> reduced
> > > over time, towards the norm for other Apache projects, of 3 months
> > > inactivity and 1 month to refresh.
> > >
> > > The Problem
> > >
> > > Currently, we have 1245 open PRs in the main upstream Flink GitHub
> > > repository. The oldest of which was created over seven and half years
> > ago.
> > > Many of these PRs haven't been commented on or interacted with in
> years.
> > >
> > > I am definitely not here to cast blame. Flink is a huge project, the
> > > committers are volunteers and only have so much time. Also, Flink is
> > > certainly not the only open source project to face this issue. However,
> > the
> > > large number of open PRs is a drag on the community, it makes new
> > > contributors think twice about opening PRs and I am sure it is
> > demoralising
> > > for committers to see the mountain keep growing.
> > >
> > > Dealing with this was a big part of why the [Community Health
> Initiative
> > > (CHI)](
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Community+Health+Initiative+%28CHI%29+workgroup
> > )
> > > working group was set up. We are [making progress](
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332500010
> > )
> > > on reviewing and triaging the top of the PR stack. However, the bottom
> of
> > > the stack is also an issue.
> > >
> > > Background
> > >
> > > It seems reasonable that a PR that is the better part of a decade old
> and
> > > hasn't been commented on in years, is probably not relevant and could
> be
> > > closed. Indeed, this very point has been brought up before, first in
> > [2018](
> > > https://lists.apache.org/thread/qpxkm7s0t6qy571bh14lr26jmos6plfp)
> > [where](
> > > https://lists.apache.org/thread/k11krmyr5rbclw7jhcmshp828fov3w0n) it
> was
> > > commented that:
> > >
> > > > The current situation with 350 open PRs may send a signal to
> > > contributors that it may actually be too much hassle to get a change
> > > committed in Flink.
> > >
> > > At that time, there was some [push-back](
> > > https://lists.apache.org/thread/6qlytq81zrctbv9kbk4z30vkngxvw9o1) to
> the
> > > proposal of using the stale PR bot. Mostly around auto-closing the PRs
> > > being perceived as harsh, given that the issue was mostly due to lack
> of
> > > committer review. The Beam community went ahead and enabled it, but the
> > > discussion on the Flink side seems to have then died out.
> > >
> > > The stale PR bot was raised again in [2019](
> > > https://lists.apache.org/thread/9gz363224zn48n03yffgsczm8gk3ohro) and
> > had
> > > a lot of support, including several examples of other Apache projects
> > using
> > > it to good effect. However, this was again [pushed back against](
> > > https://lists.apache.org/thread/z5r5sy5j98yb8p02l243gc2qlo267d2w) as
> > > hiding the symptoms of the underlying problem, namely committers not
> > > engaging actively enough to close PRs that were no longer relevant or
> had
> > > no hope of being merged. The c[ounter argument](
> > > https://lists.apache.org/thread/rz6vbzs16jbrs4smhqs940g3cg6godl0) to
> > this
> > > was that the PR closing bot was only one part of a solution, not the
> > whole
> > > solution and that far from hiding the problem, the stale labelling
> would
> > > highlight the scale of the issue.
> > >
> > > The Stale PR closing issue was raised further in [2022](
> > > https://lists.apache.org/thread/2z51fxm201z1tnvm2jklmtrhox4t9olw) and
> > > [2023](
> https://lists.apache.org/thread/pml95msx21sdc539404xs9tk209sdd55)
> > > with similar arguments. Including from CHI's own [David Radley](
> > > https://lists.apache.org/thread/0poktkjxhdpv15s0xgj7oksr7209w737):
> > >
> > > > We have over 1000 open prs. This is a lot of technical debt. I came
> > > across a 6 month old pr recently that had not been merged.
> > > > A second Jira issue was raised for the same problem and a second pr
> > > fixed the issue (identically). The first pr was still on the backlog
> > until
> > > we noticed it.
> > >
> > > What other Apache projects are doing
> > >
> > > The [Stale PR/Issue GitHub action](https://github.com/actions/stale)
> is
> > > used by many Apache projects including [Beam](
> > > https://github.com/apache/beam/blob/master/.github/workflows/stale.yml
> ),
> > > [Kafka](
> > > https://github.com/apache/kafka/blob/trunk/.github/workflows/stale.yml
> ),
> > > [Spark](
> > >
> https://github.com/apache/spark/blob/master/.github/workflows/stale.yml)
> > > and [Airflow](
> > >
> https://github.com/apache/airflow/blob/main/.github/workflows/stale.yml)
> > > to name a few.
> > >
> > > Apache Kafka uses a 90 day (3 months) limit to define a stale PR and
> then
> > > allows a further 30 days (1 month) for the author to refresh the PR
> > before
> > > it is auto-closed. Even with the Stale PR action enabled, Kafka still
> has
> > > over 1000 open PRs. However, all of these PRs were updated, commented
> on
> > or
> > > otherwise interacted with in the last 3-4 months. This gives a much
> > better
> > > overview of the scale of the open PR base.
> > >
> > > For comparison, I did some [basic analysis](
> > > https://gist.github.com/tomncooper/d92985065e948fb8fc6284a5438ff195)
> of
> > > Flink's open PRs. 55% were updated in the last 2 years, 41% in the last
> > > year, 12% in the last 3 months and only 8% in the last month. It is
> > > reasonable to conclude that over half these PRs are probably not
> relevant
> > > anymore or need significant updates to be compatible.
> > >
> > > What should we do?
> > >
> > > My personal take on this is that, while I agree that the issue is
> mostly
> > > one of committer capacity to review these PRs, the upstream PR count is
> > > currently too high. It is discouraging engagement. But, it is also not
> > fair
> > > to blame committers for not wanting to spend time on PRs that are years
> > out
> > > of date and clearly not relevant anymore.
> > >
> > > So I think we should declare PR bankruptcy and attempt to clear away
> the
> > > bulk of the old PRs. I don't use the word "bankruptcy" flippantly or to
> > > provoke, just to acknowledge that the scale of the issue has gotten too
> > > large to be dealt with through the hard work of committers alone. Once
> we
> > > get the PR backlog to a manageable size, we can then focus on using
> > > initiatives like CHI and other other workflow improvements to keep the
> PR
> > > count low.
> > >
> > > Proposal
> > >
> > > Enable the [stale PR GitHub action](https://github.com/actions/stale).
> > > This action would:
> > >
> > > - Identify any PR that has not been interacted with in the last `X`
> > months
> > > as `Stale`:
> > >
> > > - Apply a `Stale` label to the PR
> > > - Comment on the PR that it is considered `Stale` and what to do to
> > > refresh it and how to engage further with the community. This will also
> > > allow committers to easily get a list of stale PRs to review and
> > > refresh/close.
> > > - Identify any `Stale` PR that hasn't been refreshed (commented on or
> > > otherwise updated) after a further `Y` months as closeable.
> > >
> > > - Close the PR.
> > > - Leave a closing comment highlighting that it can be reopened at any
> > > point with pointers to how to engage the community.
> > >
> > > The values of the stale (`X`) and close (`Y`) thresholds is up for
> > > discussion. At least initially, given the shear number of old PRs, we
> may
> > > want to be more lenient. For example X = 1 year, Y = 3 months, would
> > limit
> > > the initial number of stale PRs and allow committers more time to
> review
> > > the stale PR list. Once the PR list has been reduced sufficiently we
> may
> > > want to reduce these values in increments until, for example, X = 3
> > months
> > > and Y = 1 month which seems to be the values other Apache projects have
> > > settled on.
> > >
> > > Obviously, I am a relative newcomer to the community. I would really
> like
> > > to hear what others, especially committers, think of the above proposal
> > and
> > > hear any other ideas people have for taming the PR count.
> > >
> > > Alternatives
> > >
> > > Looking through the history of discussion on the subject, on several
> > > occasions people have suggested doing more fine grained checks before
> > > closing PRs, [such as](
> > > https://lists.apache.org/thread/j8s4khdm4wdhvr2px6qkq0f0kwlpc2vg):
> > >
> > > > closing up PRs after X days which:
> > > > a) Don't have a CI that has passed
> > > > b) Don't follow the code contribution guide (like commit naming
> > > conventions)
> > > > c) Have changes requested but aren't being followed-up by the
> > contributor
> > >
> > > This is of course an option, but would probably require updating
> > FlinkBot.
> > > There is no reason we couldn't enable both the Stale PR GitHub Action
> and
> > > update Flinkbot to enforce rules like those above.
> > >
> > > Tom Cooper
> > > [@tomncooper](https://twitter.com/tomncooper) | [tomcooper.dev](
> > > https://tomcooper.dev/)
> >
>

Reply via email to