Huge +1 to this. I think X=6m and Y=3m makes sense, as something as simple as adding a comment refreshes it. As such, slightly longer than a normal Flink release cycle is a good line in the sand (as Matthias has said)
Side note : Just thinking of a typical PR review process, PRs either fall into Proposer do something, or Reviewer do something. Stale PR bot puts the onus on the Proposer to do something. I wonder if we should create a label like “Needs Reviewer” to help filter PRs that are currently waiting for Reviewer, so Reviewers can easily filter PRs where we do something! Hong > On 7 Jan 2025, at 23:21, Tom Cooper <c...@tomcooper.dev> wrote: > > > Hi Rui, > > Good point about recording the proposal. I am not sure it fits the FLIP > process so I made a page [1] under the CHI workgroup. > That page has a summary of the proposal and links to the discussion and vote > threads. > > Thanks, > > Tom Cooper > @tomncooper | tomcooper.dev > > [1] https://cwiki.apache.org/confluence/display/FLINK/Stale+PR+Cleanup > >> On Tuesday, 7 January 2025 at 06:13, Rui Fan <1996fan...@gmail.com> wrote: >> >> Thanks Tom for driving this discussion! >> >> It sounds make sense to me, could you help create a FLIP or doc >> to record it? It's helpful to let other developers check the stale strategy >> and background in the future, thank you. >> >> Best, >> Rui >> >>> On Wed, Dec 18, 2024 at 5:16 PM Matthias Pohl map...@apache.org wrote: >>> >>> +1 for adding the automation considering that other projects are doing this >>> as well and the high amount of open PRs doesn't help. Thanks for >>> summarizing the current state in such detail. >>> >>> X + Y should probably be larger than our usual release cycles. X=6m and >>> Y=3m might be alright in this regard. >>> >>> Matthias >>> >>>> On Tue, Dec 17, 2024 at 10:40 PM Gyula Fóra gyula.f...@gmail.com wrote: >>> >>>> Hey! >>>> >>>> Big +1 >>>> >>>> I think this is a great initiative and we should follow the path laid >>>> down >>>> by other projects. >>>> Starting with X=1 year and Y=3 months is very forgiving, I personally >>>> wouldn't mind immediately going more aggressive to clean up the backlog. >>>> >>>> So for me X = 6m Y = 3m is also perfectly fine. >>>> >>>> Cheers, >>>> Gyula >>>> >>>> On Tue, Dec 17, 2024 at 9:06 PM Jim Hughes <jhug...@confluent.io.invalid >>>> >>>> wrote: >>>> >>>>> Hi Tom, >>>>> >>>>> I like the idea of starting with X=1 year and Y=3 months. That plus >>>>> having >>>>> CHI review recent PRs would be helpful. >>>>> >>>>> Your suggestion of having a bot which "lints" naming conventions could >>>>> be >>>>> useful. Maybe there could be a CHI-bot to automate what you and others >>>>> are >>>>> doing manually. :) >>>>> I would not suggest that this bot close PRs, but rather it could >>>>> suggest >>>>> how to improve them. Or if CI is green and other conditions are met, >>>>> it >>>>> could apply a "ready for review" label to draw attention. >>>>> >>>>> Cheers, >>>>> >>>>> Jim >>>>> >>>>> On Tue, Dec 17, 2024 at 11:21 AM Tom Cooper c...@tomcooper.dev >>>>> wrote: >>>>> >>>>>> TL;DR >>>>>> >>>>>> - We have over 1.2k open PRs, this is an issue as it makes new >>>>>> contributors think twice about committing and looks like a problem >>>>>> that >>>>>> is >>>>>> too-big-to-solve for committers. >>>>>> - There have been various attempts, over the last 6 years, to enable >>>>>> the >>>>>> Stale PR bot/action to prompt authors to refresh old PRs and >>>>>> auto-close >>>>>> them if no action is taken. >>>>>> - These were rejected as some committers felt this was punishing >>>>>> contributors for the committers not reviewing/closing PRs fast >>>>>> enough. >>>>>> - Others felt that, rather than "sweeping the problem under the rug", >>>>>> using the Stale PR functionality would actually reveal the true scale >>>>>> of >>>>>> the issue. Allowing committers to see what were truly active PRs. >>>>>> - Other Apache projects such as Kafka, Beam, Spark, Airflow and many >>>>>> others have enabled the stale PR GitHub action. >>>>>> - Despite this, Kafka still has 1k open PRs. However, these PRs have >>>>>> all >>>>>> been updated/commented on in the last 3-4 months, so can be >>>>>> considered >>>>>> active. >>>>>> - For Flink, only 12% of the open PRs have been updated in the last 3 >>>>>> months and only 41% in the last year. >>>>>> - I propose we enable the Stale PR Github action to clear the backlog >>>>>> and >>>>>> reduce the PRs down to those that are active and relevant. >>>>>> - We can start with PRs that haven't been active in the last year and >>>>>> give >>>>>> authors 3 months to refresh them. These thresholds could then be >>>>>> reduced >>>>>> over time, towards the norm for other Apache projects, of 3 months >>>>>> inactivity and 1 month to refresh. >>>>>> >>>>>> The Problem >>>>>> >>>>>> Currently, we have 1245 open PRs in the main upstream Flink GitHub >>>>>> repository. The oldest of which was created over seven and half years >>>>>> ago. >>>>>> Many of these PRs haven't been commented on or interacted with in >>>>>> years. >>>>>> >>>>>> I am definitely not here to cast blame. Flink is a huge project, the >>>>>> committers are volunteers and only have so much time. Also, Flink is >>>>>> certainly not the only open source project to face this issue. >>>>>> However, >>>>>> the >>>>>> large number of open PRs is a drag on the community, it makes new >>>>>> contributors think twice about opening PRs and I am sure it is >>>>>> demoralising >>>>>> for committers to see the mountain keep growing. >>>>>> >>>>>> Dealing with this was a big part of why the [Community Health >>>>>> Initiative >>>>>> (CHI)]( >>> >>> https://cwiki.apache.org/confluence/display/FLINK/Community+Health+Initiative+(CHI)+workgroup >>> >>>>> ) >>>>> >>>>>> working group was set up. We are [making progress]( >>> >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332500010 >>> >>>>> ) >>>>> >>>>>> on reviewing and triaging the top of the PR stack. However, the >>>>>> bottom >>>>>> of >>>>>> the stack is also an issue. >>>>>> >>>>>> Background >>>>>> >>>>>> It seems reasonable that a PR that is the better part of a decade old >>>>>> and >>>>>> hasn't been commented on in years, is probably not relevant and could >>>>>> be >>>>>> closed. Indeed, this very point has been brought up before, first in >>>>>> 2018 >>>>>> where it >>>>>> was >>>>>> commented that: >>>>>> >>>>>>> The current situation with 350 open PRs may send a signal to >>>>>>> contributors that it may actually be too much hassle to get a change >>>>>>> committed in Flink. >>>>>> >>>>>> At that time, there was some push-back to >>>>>> the >>>>>> proposal of using the stale PR bot. Mostly around auto-closing the >>>>>> PRs >>>>>> being perceived as harsh, given that the issue was mostly due to lack >>>>>> of >>>>>> committer review. The Beam community went ahead and enabled it, but >>>>>> the >>>>>> discussion on the Flink side seems to have then died out. >>>>>> >>>>>> The stale PR bot was raised again in 2019 >>>>>> and >>>>>> had >>>>>> a lot of support, including several examples of other Apache projects >>>>>> using >>>>>> it to good effect. However, this was again pushed back against as >>>>>> hiding the symptoms of the underlying problem, namely committers not >>>>>> engaging actively enough to close PRs that were no longer relevant or >>>>>> had >>>>>> no hope of being merged. The counter argument to >>>>>> this >>>>>> was that the PR closing bot was only one part of a solution, not the >>>>>> whole >>>>>> solution and that far from hiding the problem, the stale labelling >>>>>> would >>>>>> highlight the scale of the issue. >>>>>> >>>>>> The Stale PR closing issue was raised further in 2022 >>>>>> and >>>>>> 2023 >>>>>> with similar arguments. Including from CHI's own David Radley: >>>>>> >>>>>>> We have over 1000 open prs. This is a lot of technical debt. I came >>>>>>> across a 6 month old pr recently that had not been merged. >>>>>>> A second Jira issue was raised for the same problem and a second pr >>>>>>> fixed the issue (identically). The first pr was still on the backlog >>>>>>> until >>>>>>> we noticed it. >>>>>> >>>>>> What other Apache projects are doing >>>>>> >>>>>> The Stale PR/Issue GitHub action >>>>>> is >>>>>> used by many Apache projects including [Beam]( >>> >>> https://github.com/apache/beam/blob/master/.github/workflows/stale.yml >>> >>>> ), >>>> >>>>>> [Kafka]( >>> >>> https://github.com/apache/kafka/blob/trunk/.github/workflows/stale.yml >>> >>>> ), >>>> >>>>>> [Spark]( >>>> >>>> https://github.com/apache/spark/blob/master/.github/workflows/stale.yml) >>>> >>>>>> and [Airflow]( >>>> >>>> https://github.com/apache/airflow/blob/main/.github/workflows/stale.yml) >>>> >>>>>> to name a few. >>>>>> >>>>>> Apache Kafka uses a 90 day (3 months) limit to define a stale PR and >>>>>> then >>>>>> allows a further 30 days (1 month) for the author to refresh the PR >>>>>> before >>>>>> it is auto-closed. Even with the Stale PR action enabled, Kafka still >>>>>> has >>>>>> over 1000 open PRs. However, all of these PRs were updated, commented >>>>>> on >>>>>> or >>>>>> otherwise interacted with in the last 3-4 months. This gives a much >>>>>> better >>>>>> overview of the scale of the open PR base. >>>>>> >>>>>> For comparison, I did some basic analysis >>>>>> of >>>>>> Flink's open PRs. 55% were updated in the last 2 years, 41% in the >>>>>> last >>>>>> year, 12% in the last 3 months and only 8% in the last month. It is >>>>>> reasonable to conclude that over half these PRs are probably not >>>>>> relevant >>>>>> anymore or need significant updates to be compatible. >>>>>> >>>>>> What should we do? >>>>>> >>>>>> My personal take on this is that, while I agree that the issue is >>>>>> mostly >>>>>> one of committer capacity to review these PRs, the upstream PR count >>>>>> is >>>>>> currently too high. It is discouraging engagement. But, it is also >>>>>> not >>>>>> fair >>>>>> to blame committers for not wanting to spend time on PRs that are >>>>>> years >>>>>> out >>>>>> of date and clearly not relevant anymore. >>>>>> >>>>>> So I think we should declare PR bankruptcy and attempt to clear away >>>>>> the >>>>>> bulk of the old PRs. I don't use the word "bankruptcy" flippantly or >>>>>> to >>>>>> provoke, just to acknowledge that the scale of the issue has gotten >>>>>> too >>>>>> large to be dealt with through the hard work of committers alone. >>>>>> Once >>>>>> we >>>>>> get the PR backlog to a manageable size, we can then focus on using >>>>>> initiatives like CHI and other other workflow improvements to keep >>>>>> the >>>>>> PR >>>>>> count low. >>>>>> >>>>>> Proposal >>>>>> >>>>>> Enable the stale PR GitHub action. >>>>>> This action would: >>>>>> >>>>>> - Identify any PR that has not been interacted with in the last `X` >>>>>> months >>>>>> as `Stale`: >>>>>> >>>>>> - Apply a `Stale` label to the PR >>>>>> - Comment on the PR that it is considered `Stale` and what to do to >>>>>> refresh it and how to engage further with the community. This will >>>>>> also >>>>>> allow committers to easily get a list of stale PRs to review and >>>>>> refresh/close. >>>>>> - Identify any `Stale` PR that hasn't been refreshed (commented on or >>>>>> otherwise updated) after a further `Y` months as closeable. >>>>>> >>>>>> - Close the PR. >>>>>> - Leave a closing comment highlighting that it can be reopened at any >>>>>> point with pointers to how to engage the community. >>>>>> >>>>>> The values of the stale (`X`) and close (`Y`) thresholds is up for >>>>>> discussion. At least initially, given the shear number of old PRs, we >>>>>> may >>>>>> want to be more lenient. For example X = 1 year, Y = 3 months, would >>>>>> limit >>>>>> the initial number of stale PRs and allow committers more time to >>>>>> review >>>>>> the stale PR list. Once the PR list has been reduced sufficiently we >>>>>> may >>>>>> want to reduce these values in increments until, for example, X = 3 >>>>>> months >>>>>> and Y = 1 month which seems to be the values other Apache projects >>>>>> have >>>>>> settled on. >>>>>> >>>>>> Obviously, I am a relative newcomer to the community. I would really >>>>>> like >>>>>> to hear what others, especially committers, think of the above >>>>>> proposal >>>>>> and >>>>>> hear any other ideas people have for taming the PR count. >>>>>> >>>>>> Alternatives >>>>>> >>>>>> Looking through the history of discussion on the subject, on several >>>>>> occasions people have suggested doing more fine grained checks before >>>>>> closing PRs, such as: >>>>>> >>>>>>> closing up PRs after X days which: >>>>>>> a) Don't have a CI that has passed >>>>>>> b) Don't follow the code contribution guide (like commit naming >>>>>>> conventions) >>>>>>> c) Have changes requested but aren't being followed-up by the >>>>>>> contributor >>>>>> >>>>>> This is of course an option, but would probably require updating >>>>>> FlinkBot. >>>>>> There is no reason we couldn't enable both the Stale PR GitHub Action >>>>>> and >>>>>> update Flinkbot to enforce rules like those above. >>>>>> >>>>>> Tom Cooper >>>>>> @tomncooper | tomcooper.dev