Huge +1 to this. 

I think X=6m and Y=3m makes sense, as something as simple as adding a comment 
refreshes it. As such, slightly longer than a normal Flink release cycle is a 
good line in the sand (as Matthias has said)

Side note : Just thinking of a typical PR review process, PRs either fall into 
Proposer do something, or Reviewer do something. Stale PR bot puts the onus on 
the Proposer to do something. I wonder if we should create a label like “Needs 
Reviewer” to help filter PRs that are currently waiting for Reviewer, so 
Reviewers can easily filter PRs where we do something!


Hong

> On 7 Jan 2025, at 23:21, Tom Cooper <c...@tomcooper.dev> wrote:
> 
> 
> Hi Rui,
> 
> Good point about recording the proposal. I am not sure it fits the FLIP 
> process so I made a page [1] under the CHI workgroup.
> That page has a summary of the proposal and links to the discussion and vote 
> threads.
> 
> Thanks,
> 
> Tom Cooper
> @tomncooper | tomcooper.dev
> 
> [1] https://cwiki.apache.org/confluence/display/FLINK/Stale+PR+Cleanup
> 
>> On Tuesday, 7 January 2025 at 06:13, Rui Fan <1996fan...@gmail.com> wrote:
>> 
>> Thanks Tom for driving this discussion!
>> 
>> It sounds make sense to me, could you help create a FLIP or doc
>> to record it? It's helpful to let other developers check the stale strategy
>> and background in the future, thank you.
>> 
>> Best,
>> Rui
>> 
>>> On Wed, Dec 18, 2024 at 5:16 PM Matthias Pohl map...@apache.org wrote:
>>> 
>>> +1 for adding the automation considering that other projects are doing this
>>> as well and the high amount of open PRs doesn't help. Thanks for
>>> summarizing the current state in such detail.
>>> 
>>> X + Y should probably be larger than our usual release cycles. X=6m and
>>> Y=3m might be alright in this regard.
>>> 
>>> Matthias
>>> 
>>>> On Tue, Dec 17, 2024 at 10:40 PM Gyula Fóra gyula.f...@gmail.com wrote:
>>> 
>>>> Hey!
>>>> 
>>>> Big +1
>>>> 
>>>> I think this is a great initiative and we should follow the path laid
>>>> down
>>>> by other projects.
>>>> Starting with X=1 year and Y=3 months is very forgiving, I personally
>>>> wouldn't mind immediately going more aggressive to clean up the backlog.
>>>> 
>>>> So for me X = 6m Y = 3m is also perfectly fine.
>>>> 
>>>> Cheers,
>>>> Gyula
>>>> 
>>>> On Tue, Dec 17, 2024 at 9:06 PM Jim Hughes <jhug...@confluent.io.invalid
>>>> 
>>>> wrote:
>>>> 
>>>>> Hi Tom,
>>>>> 
>>>>> I like the idea of starting with X=1 year and Y=3 months. That plus
>>>>> having
>>>>> CHI review recent PRs would be helpful.
>>>>> 
>>>>> Your suggestion of having a bot which "lints" naming conventions could
>>>>> be
>>>>> useful. Maybe there could be a CHI-bot to automate what you and others
>>>>> are
>>>>> doing manually. :)
>>>>> I would not suggest that this bot close PRs, but rather it could
>>>>> suggest
>>>>> how to improve them. Or if CI is green and other conditions are met,
>>>>> it
>>>>> could apply a "ready for review" label to draw attention.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Jim
>>>>> 
>>>>> On Tue, Dec 17, 2024 at 11:21 AM Tom Cooper c...@tomcooper.dev
>>>>> wrote:
>>>>> 
>>>>>> TL;DR
>>>>>> 
>>>>>> - We have over 1.2k open PRs, this is an issue as it makes new
>>>>>> contributors think twice about committing and looks like a problem
>>>>>> that
>>>>>> is
>>>>>> too-big-to-solve for committers.
>>>>>> - There have been various attempts, over the last 6 years, to enable
>>>>>> the
>>>>>> Stale PR bot/action to prompt authors to refresh old PRs and
>>>>>> auto-close
>>>>>> them if no action is taken.
>>>>>> - These were rejected as some committers felt this was punishing
>>>>>> contributors for the committers not reviewing/closing PRs fast
>>>>>> enough.
>>>>>> - Others felt that, rather than "sweeping the problem under the rug",
>>>>>> using the Stale PR functionality would actually reveal the true scale
>>>>>> of
>>>>>> the issue. Allowing committers to see what were truly active PRs.
>>>>>> - Other Apache projects such as Kafka, Beam, Spark, Airflow and many
>>>>>> others have enabled the stale PR GitHub action.
>>>>>> - Despite this, Kafka still has 1k open PRs. However, these PRs have
>>>>>> all
>>>>>> been updated/commented on in the last 3-4 months, so can be
>>>>>> considered
>>>>>> active.
>>>>>> - For Flink, only 12% of the open PRs have been updated in the last 3
>>>>>> months and only 41% in the last year.
>>>>>> - I propose we enable the Stale PR Github action to clear the backlog
>>>>>> and
>>>>>> reduce the PRs down to those that are active and relevant.
>>>>>> - We can start with PRs that haven't been active in the last year and
>>>>>> give
>>>>>> authors 3 months to refresh them. These thresholds could then be
>>>>>> reduced
>>>>>> over time, towards the norm for other Apache projects, of 3 months
>>>>>> inactivity and 1 month to refresh.
>>>>>> 
>>>>>> The Problem
>>>>>> 
>>>>>> Currently, we have 1245 open PRs in the main upstream Flink GitHub
>>>>>> repository. The oldest of which was created over seven and half years
>>>>>> ago.
>>>>>> Many of these PRs haven't been commented on or interacted with in
>>>>>> years.
>>>>>> 
>>>>>> I am definitely not here to cast blame. Flink is a huge project, the
>>>>>> committers are volunteers and only have so much time. Also, Flink is
>>>>>> certainly not the only open source project to face this issue.
>>>>>> However,
>>>>>> the
>>>>>> large number of open PRs is a drag on the community, it makes new
>>>>>> contributors think twice about opening PRs and I am sure it is
>>>>>> demoralising
>>>>>> for committers to see the mountain keep growing.
>>>>>> 
>>>>>> Dealing with this was a big part of why the [Community Health
>>>>>> Initiative
>>>>>> (CHI)](
>>> 
>>> https://cwiki.apache.org/confluence/display/FLINK/Community+Health+Initiative+(CHI)+workgroup
>>> 
>>>>> )
>>>>> 
>>>>>> working group was set up. We are [making progress](
>>> 
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332500010
>>> 
>>>>> )
>>>>> 
>>>>>> on reviewing and triaging the top of the PR stack. However, the
>>>>>> bottom
>>>>>> of
>>>>>> the stack is also an issue.
>>>>>> 
>>>>>> Background
>>>>>> 
>>>>>> It seems reasonable that a PR that is the better part of a decade old
>>>>>> and
>>>>>> hasn't been commented on in years, is probably not relevant and could
>>>>>> be
>>>>>> closed. Indeed, this very point has been brought up before, first in
>>>>>> 2018
>>>>>> where it
>>>>>> was
>>>>>> commented that:
>>>>>> 
>>>>>>> The current situation with 350 open PRs may send a signal to
>>>>>>> contributors that it may actually be too much hassle to get a change
>>>>>>> committed in Flink.
>>>>>> 
>>>>>> At that time, there was some push-back to
>>>>>> the
>>>>>> proposal of using the stale PR bot. Mostly around auto-closing the
>>>>>> PRs
>>>>>> being perceived as harsh, given that the issue was mostly due to lack
>>>>>> of
>>>>>> committer review. The Beam community went ahead and enabled it, but
>>>>>> the
>>>>>> discussion on the Flink side seems to have then died out.
>>>>>> 
>>>>>> The stale PR bot was raised again in 2019
>>>>>> and
>>>>>> had
>>>>>> a lot of support, including several examples of other Apache projects
>>>>>> using
>>>>>> it to good effect. However, this was again pushed back against as
>>>>>> hiding the symptoms of the underlying problem, namely committers not
>>>>>> engaging actively enough to close PRs that were no longer relevant or
>>>>>> had
>>>>>> no hope of being merged. The counter argument to
>>>>>> this
>>>>>> was that the PR closing bot was only one part of a solution, not the
>>>>>> whole
>>>>>> solution and that far from hiding the problem, the stale labelling
>>>>>> would
>>>>>> highlight the scale of the issue.
>>>>>> 
>>>>>> The Stale PR closing issue was raised further in 2022
>>>>>> and
>>>>>> 2023
>>>>>> with similar arguments. Including from CHI's own David Radley:
>>>>>> 
>>>>>>> We have over 1000 open prs. This is a lot of technical debt. I came
>>>>>>> across a 6 month old pr recently that had not been merged.
>>>>>>> A second Jira issue was raised for the same problem and a second pr
>>>>>>> fixed the issue (identically). The first pr was still on the backlog
>>>>>>> until
>>>>>>> we noticed it.
>>>>>> 
>>>>>> What other Apache projects are doing
>>>>>> 
>>>>>> The Stale PR/Issue GitHub action
>>>>>> is
>>>>>> used by many Apache projects including [Beam](
>>> 
>>> https://github.com/apache/beam/blob/master/.github/workflows/stale.yml
>>> 
>>>> ),
>>>> 
>>>>>> [Kafka](
>>> 
>>> https://github.com/apache/kafka/blob/trunk/.github/workflows/stale.yml
>>> 
>>>> ),
>>>> 
>>>>>> [Spark](
>>>> 
>>>> https://github.com/apache/spark/blob/master/.github/workflows/stale.yml)
>>>> 
>>>>>> and [Airflow](
>>>> 
>>>> https://github.com/apache/airflow/blob/main/.github/workflows/stale.yml)
>>>> 
>>>>>> to name a few.
>>>>>> 
>>>>>> Apache Kafka uses a 90 day (3 months) limit to define a stale PR and
>>>>>> then
>>>>>> allows a further 30 days (1 month) for the author to refresh the PR
>>>>>> before
>>>>>> it is auto-closed. Even with the Stale PR action enabled, Kafka still
>>>>>> has
>>>>>> over 1000 open PRs. However, all of these PRs were updated, commented
>>>>>> on
>>>>>> or
>>>>>> otherwise interacted with in the last 3-4 months. This gives a much
>>>>>> better
>>>>>> overview of the scale of the open PR base.
>>>>>> 
>>>>>> For comparison, I did some basic analysis
>>>>>> of
>>>>>> Flink's open PRs. 55% were updated in the last 2 years, 41% in the
>>>>>> last
>>>>>> year, 12% in the last 3 months and only 8% in the last month. It is
>>>>>> reasonable to conclude that over half these PRs are probably not
>>>>>> relevant
>>>>>> anymore or need significant updates to be compatible.
>>>>>> 
>>>>>> What should we do?
>>>>>> 
>>>>>> My personal take on this is that, while I agree that the issue is
>>>>>> mostly
>>>>>> one of committer capacity to review these PRs, the upstream PR count
>>>>>> is
>>>>>> currently too high. It is discouraging engagement. But, it is also
>>>>>> not
>>>>>> fair
>>>>>> to blame committers for not wanting to spend time on PRs that are
>>>>>> years
>>>>>> out
>>>>>> of date and clearly not relevant anymore.
>>>>>> 
>>>>>> So I think we should declare PR bankruptcy and attempt to clear away
>>>>>> the
>>>>>> bulk of the old PRs. I don't use the word "bankruptcy" flippantly or
>>>>>> to
>>>>>> provoke, just to acknowledge that the scale of the issue has gotten
>>>>>> too
>>>>>> large to be dealt with through the hard work of committers alone.
>>>>>> Once
>>>>>> we
>>>>>> get the PR backlog to a manageable size, we can then focus on using
>>>>>> initiatives like CHI and other other workflow improvements to keep
>>>>>> the
>>>>>> PR
>>>>>> count low.
>>>>>> 
>>>>>> Proposal
>>>>>> 
>>>>>> Enable the stale PR GitHub action.
>>>>>> This action would:
>>>>>> 
>>>>>> - Identify any PR that has not been interacted with in the last `X`
>>>>>> months
>>>>>> as `Stale`:
>>>>>> 
>>>>>> - Apply a `Stale` label to the PR
>>>>>> - Comment on the PR that it is considered `Stale` and what to do to
>>>>>> refresh it and how to engage further with the community. This will
>>>>>> also
>>>>>> allow committers to easily get a list of stale PRs to review and
>>>>>> refresh/close.
>>>>>> - Identify any `Stale` PR that hasn't been refreshed (commented on or
>>>>>> otherwise updated) after a further `Y` months as closeable.
>>>>>> 
>>>>>> - Close the PR.
>>>>>> - Leave a closing comment highlighting that it can be reopened at any
>>>>>> point with pointers to how to engage the community.
>>>>>> 
>>>>>> The values of the stale (`X`) and close (`Y`) thresholds is up for
>>>>>> discussion. At least initially, given the shear number of old PRs, we
>>>>>> may
>>>>>> want to be more lenient. For example X = 1 year, Y = 3 months, would
>>>>>> limit
>>>>>> the initial number of stale PRs and allow committers more time to
>>>>>> review
>>>>>> the stale PR list. Once the PR list has been reduced sufficiently we
>>>>>> may
>>>>>> want to reduce these values in increments until, for example, X = 3
>>>>>> months
>>>>>> and Y = 1 month which seems to be the values other Apache projects
>>>>>> have
>>>>>> settled on.
>>>>>> 
>>>>>> Obviously, I am a relative newcomer to the community. I would really
>>>>>> like
>>>>>> to hear what others, especially committers, think of the above
>>>>>> proposal
>>>>>> and
>>>>>> hear any other ideas people have for taming the PR count.
>>>>>> 
>>>>>> Alternatives
>>>>>> 
>>>>>> Looking through the history of discussion on the subject, on several
>>>>>> occasions people have suggested doing more fine grained checks before
>>>>>> closing PRs, such as:
>>>>>> 
>>>>>>> closing up PRs after X days which:
>>>>>>> a) Don't have a CI that has passed
>>>>>>> b) Don't follow the code contribution guide (like commit naming
>>>>>>> conventions)
>>>>>>> c) Have changes requested but aren't being followed-up by the
>>>>>>> contributor
>>>>>> 
>>>>>> This is of course an option, but would probably require updating
>>>>>> FlinkBot.
>>>>>> There is no reason we couldn't enable both the Stale PR GitHub Action
>>>>>> and
>>>>>> update Flinkbot to enforce rules like those above.
>>>>>> 
>>>>>> Tom Cooper
>>>>>> @tomncooper | tomcooper.dev

Reply via email to