Hi everyone, I'm overall +1 on Ryan's comment. When we're talking about component ownership, I've started a discussion on the Infra mailing list in the beginning of the year on it. In principle, the "codeowners" idea goes against ASF principles.
Let's summarize things: 1. From a project perspective, we can have a discussion about closing PRs automatically that a) are not followed-up within X number of days after a review and/or b) PRs that don't have a passing build and/or don't follow contribution guidelines and/or C) need to be rebased 2. In order to help understand which PRs are OK to get reviewed, we could consider automatically adding a label "Ready for review" in case 1b (passing build/contribution guidelines met) is the case. 3. In order to help contributors, we could consider automatically adding a label in case their PR isn't mergeable for the situations that are displayed in situation 1 When that's done, we can see what the effect is on the PRs queue. Best regards, Martijn On Wed, Oct 4, 2023 at 5:13 PM David Radley <david_rad...@uk.ibm.com> wrote: > > Hi Ryan, > > I agree that good communication is key to determining what can be worked on. > > In terms of metrics , we can use the gh cli to list prs and we can export > issues from Jira. A view across them, you could join on the Flink issue (at > the start of the pr comment and the flink issue itself – you could then see > which prs have an assigned Jira would be expected to be reviewed. There is no > explicit reviewer field in the Jira issue; I am not sure if we can easily get > this info without having a custom field (which others have tried). > > In terms of what prs a committer could / should review – I would think that > component ownership helps scope the subset of prs to review / merge. > > Kind regards, David. > > > From: Ryan Skraba <ryan.skr...@aiven.io.INVALID> > Date: Wednesday, 4 October 2023 at 15:09 > To: dev@flink.apache.org <dev@flink.apache.org> > Subject: [EXTERNAL] Re: FW: RE: Close orphaned/stale PRs > Hey, this has been an interesting discussion -- this is something that > has been on my mind as an open source contributor and committer (I'm > not a Flink committer). > > A large number of open PRs doesn't _necessarily_ mean a project is > unhealthy or has technical debt. If it's fun and easy to get your > contribution accepted and committed, even for a small fix, you're more > likely to raise another PR, and another. I wouldn't be surprised if > there's a natural equilibrium where adding capacity to smoothly review > and manage more PRs cause more PRs to be submitted. Everyone wins! > > I don't think there's a measure for the "average PR lifetime", or > "time to first comment", but those would be more interesting things to > know and those are the worrisome ones. > > As a contributor, I'm pretty willing to wait as long as necessary (and > rebase and fix merge conflicts) if there's good communication in > place. I'm pretty patient, especially if I knew that the PR would be > looked at and merged for a specific fix version (for example). I'd > expect simple and obvious fixes with limited scope to take less time > than a more complex, far-reaching change. I'd probably appreciate > that the boring-cyborg welcomes me on my first PR, but I'd be pretty > irritated if any PR were closed without any human interaction. > > As a reviewer or committer, it's just overwhelming to see the big > GitHub list, and sometimes it feels random just "picking one near the > top" to look at. In projects where I have the committer role, I > sometimes feel more badly about work I'm *not* doing than the work I'm > getting done! This isn't sustainable either. A lot of people on the > project are volunteering after hours, and grooming, reviewing and > commenting PRs shouldn't be a thankless, unending job to feel bad > about. > > As a contributor, one "magic" solution that I'd love to see is a > better UI that could show (for example) tentative "review dates", like > the number at a butcher shop, and proposed reviewers. > > If I was committing to reviewing a PR every day, it would be great if > I could know which ones were the best "next" candidates to review: the > one waiting longest, or a new, critical fix in my domain. As it > stands, there's next to no chance that the PRs in the middle of the > list are going to get any attention, but closing them stand to lose > valuable work or (worse) turn off a potential contributor forever. > > Taking a look at some open PRs that I authored or interacted with: I > found one that should have been closed, one that was waiting for MY > attention for a merge-squash-rebase (oops), another where I made some > requested changes and it's back in review limbo. Unfortunately, I > don't think any of these would have been brought to my attention by a > nag-bot. I don't think I'm alone; automated emails get far less > attention with sometime not giving automated emails much attention. > > OK, one more thing to think about: some underrepresented groups in > tech can find it difficult to demand attention, through constant > pinging and commenting and reminding... So finding ways to make sure > that non-squeaky wheels also get some love is a really fair goal. > > There's some pretty good ideas in this conversation, and I'm really > glad to hear it being brought up! I'd love to hear any other > brainstorming for ideas, and get the virtual circle that David > mentioned! > > All my best, Ryan > > > > > > > > On Wed, Oct 4, 2023 at 12:03 PM David Radley <david_rad...@uk.ibm.com> wrote: > > > > Hi, > > To add I agree with Martijn’s insights; I think we are saying similar > > things. To progress agreed upon work, and not blanket close all stale prs, > > Kind regards, David. > > > > From: David Radley <david_rad...@uk.ibm.com> > > Date: Wednesday, 4 October 2023 at 10:59 > > To: dev@flink.apache.org <dev@flink.apache.org> > > Subject: [EXTERNAL] RE: Close orphaned/stale PRs > > Hi , > > I agree Venkata this issue is bigger than closing out stale prs. > > > > We can see that issues are being raised at a rate way above the resolution > > time. > > https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12315522&periodName=daily&daysprevious=90&cumulative=true&versionLabels=major&selectedProjectId=12315522&reportKey=com.atlassian.jira.jira-core-reports-plugin%3Acreatedvsresolved-report&atl_token=A5KQ-2QAV-T4JA-FDED_19ff17decb93662bafa09e4b3ffb3a385c202015_lin&Next=Next > > Gaining over 500 issues to the backlog every 3 months. > > > > We have over 1000 open prs. This is a lot of technical debt. I came across > > a 6 month old pr recently that had not been merged. A second Jira issue was > > raised for the same problem and a second pr fixed the issue (identically). > > The first pr was still on the backlog until we noticed it. > > > > I am looking to contribute to the community to be able to identify issues I > > can work on and then be reasonably certain they will be reviewed and merged > > so I can build on contributions. I have worked as a maintainer and > > committer in other communities and managed to spend some of the week > > addressing incoming work; I am happy to do this in some capacity with the > > support of committer(s) for Flink. It seems to me it is virtuous circle to > > enable more contributions, to get more committers , builds those committers > > that can help merge and review the backlog. > > > > Some thoughts ( I am new to this – so apologise if I have misunderstood > > something or am unaware of other existing mechanisms) : > > > > 1. If there is an issue that a committer has assigned to a contributor > > as per the > > process<https://flink.apache.org/how-to-contribute/contribute-code/ > , > > and there is a pr then it should be with the committer to review the pr, or > > return it to the work queue. I do not know how many prs are like this. It > > seems to me that if a committer assigns an issue, they are indicating they > > will review, unassign themselves or merge. I do not think these prs should > > be closed as stale. > > 2. Could we have a Git action to notify committers (tagged in the pr?) > > if a pr (that has an assigned Jira) has not been reviewed in a certain > > period (7 days?) then subsequent nags if there has been no response . In > > this way busy committers can see that a pr needs looking at. > > 3. Other prs have been raised without a committer saying that they will > > fix it. In this case there is likely to be value, but the merging and > > review work has not been taken on by anyone. I notice spelling mistake prs > > that have not been merged (there are 8 with this query > > https://github.com/apache/flink/pulls?q=is%3Apr+is%3Aopen+spelling ) , > > these are typical newbee prs as they are simple but useful improvements.; > > it would be great if these simpler ones could just be merged – maybe they > > should be marked as a [hotfix] to indicate they should be merged. If > > simpler prs are not merged – it is very difficult for new contributors to > > gain eminence to get towards being a committer. > > 4. There are also issues that have been raised by people who do not want > > to fix them. It seems to me that we need a “triaged” state to indicate the > > issue looks valid and reasonable, so could be picked up by someone – at > > which time they would need to agree with a committer to get the associated > > pr reviewed and merged. This triaged state would be a pool of issues that > > new contributors to choose from > > > > > > > > I am happy to help to improve – once we have consensus, > > > > > > > > Kind regards, David. > > > > > > > > > > From: Venkatakrishnan Sowrirajan <vsowr...@asu.edu> > > Date: Wednesday, 4 October 2023 at 00:36 > > To: dev@flink.apache.org <dev@flink.apache.org> > > Subject: [EXTERNAL] Re: Close orphaned/stale PRs > > Gentle ping to surface this up for more discussions. > > > > Regards > > Venkata krishnan > > > > > > On Tue, Sep 26, 2023 at 4:59 PM Venkatakrishnan Sowrirajan > > <vsowr...@asu.edu> > > wrote: > > > > > Hi Martijn, > > > > > > Agree with your point that closing a PR without any review feedback even > > > after 'X' days is discouraging to a new contributor. I understand that > > > this > > > is a capacity problem. Capacity problem cannot be solved by this proposal > > > and it is beyond the scope of this proposal. > > > > > > Regarding your earlier question, > > > > What's the added value of > > > closing these PRs > > > > > > - Having lots of inactive PRs lingering around shows the project is > > > less active. I am not saying this is the only way to determine how > > > active a > > > project is, but this is one of the key factors. > > > - A large number of PRs open can be discouraging for (new) > > > contributors but on the other hand I agree closing an inactive PR > > > without > > > any reviews can also drive contributors away. > > > > > > Having said all of that, I agree closing PRs that don't have any reviews > > > to start with should be avoided from the final proposal. > > > > > > > I'm +1 for (automatically) closing up PRs after X days which: > > > a) Don't have a CI that has passed > > > b) Don't follow the code contribution guide (like commit naming > > > conventions) > > > c) Have changes requested but aren't being followed-up by the contributor > > > > > > In general, I'm largely +1 on your above proposal except for the > > > implementation feasibility. > > > > > > Also, I have picked a few other popular projects that have implemented the > > > Github's actions stale rule to see if we can borrow some ideas. Below > > > projects are listed in the order of the most invasive (for lack of a > > > better > > > word) to the least invasive actions taken wrt PR without any updates for a > > > long period of time. > > > > > > 1. Trino > > > > > > TL;DR - No updates in the PR for the last 21 days, tag other maintainers > > > for review. If there are no updates for 21 days after that, close the PR > > > with this message - "*Closing this pull request, as it has been stale for > > > six weeks. Feel free to re-open at any time.*" > > > Trino's stale PR Github action rule (stale.yaml) > > > <https://github.com/trinodb/trino/blob/master/.github/workflows/stale.yml > > > > > > > > > > > > > 2. Apache Spark > > > > > > TL;DR - No updates in the PR in the last 100 days, closing the PR with > > > this message - "*We're closing this PR because it hasn't been updated in > > > a while. This isn't a judgement on the merit of the PR in any way. It's > > > just a way of keeping the PR queue manageable. If you'd like to revive > > > this > > > PR, please reopen it and ask a committer to remove the Stale tag!*" > > > Spark's discussion in their mailing list > > > <https://lists.apache.org/thread/yg3ggtvpt2dbjpnb2q0yblq30sc1g2yx > on > > > closing stale PRs. Spark's stale PR github action rule (stale.yaml > > > <https://github.com/apache/spark/blob/master/.github/workflows/stale.yml > > > > > > > ). > > > > > > 3. Python > > > > > > TL;DR - No updates in the PR for the last 30 days, then tag the PR as > > > stale. Note: Python project *doesn't* close the stale PRs. > > > > > > Python discussion > > > <https://discuss.python.org/t/decision-needed-should-we-close-stale-prs-and-how-many-lapsed-days-are-prs-considered-stale/4637 > > > > > > > in the mailing list to close stale PRs. Python's stale PR github action > > > rule (stale.yaml > > > <https://github.com/python/cpython/blob/main/.github/workflows/stale.yml > > > >) > > > > > > Few others Apache Beam > > > <https://github.com/apache/beam/blob/master/.github/workflows/stale.yml > > > > (closes > > > inactive PRs after 60+ days), Apache Airflow > > > <https://github.com/apache/airflow/blob/main/.github/workflows/stale.yml > > > > (closes > > > inactive PRs after 50 days) > > > > > > Let me know what you think. Looking forward to hearing from others in the > > > community and their experiences. > > > > > > [1] Github Action - Close Stale Issues - > > > https://github.com/marketplace/actions/close-stale-issues > > > > > > Regards > > > Venkata krishnan > > > > > > > > > On Thu, Sep 21, 2023 at 6:03 AM Martijn Visser <martijnvis...@apache.org> > > > wrote: > > > > > >> Hi all, > > >> > > >> I really believe that the problem of the number of open PRs is just > > >> that there aren't enough reviewers/resources available to review them. > > >> > > >> > Stale PRs can clutter the repository, and closing them helps keep it > > >> organized and ensures that only relevant and up-to-date PRs are present. > > >> > > >> Sure, but what's the indicator that the PR is stale? The fact that > > >> there has been no reviewer yet to review it, doesn't mean that the PR > > >> is stale. For me, a stale PR is a PR that has been reviewed, changes > > >> have been requested and the contributor isn't participating in the > > >> discussion anymore. But that's a different story compared to closing > > >> PRs where there has been no review done at all. > > >> > > >> > It mainly helps the project maintainers/reviewers to focus on only the > > >> actively updated trimmed list of PRs that are ready for review. > > >> > > >> I disagree that closing PRs helps with this. If you want to help > > >> maintainers/reviewers, we should have a situation where it's obvious > > >> that a PR is really ready (meaning, CI has passed, PR contents/commit > > >> message etc are following the code contribution guidelines). > > >> > > >> > It helps Flink users who are waiting on a PR that enhances an existing > > >> feature or fixes an issue a clear indication on whether the PR will be > > >> continually worked on and eventually get a closure or not and therefore > > >> will be closed. > > >> > > >> Having other PRs being closed doesn't increase the guarantee that > > >> other PRs will be reviewed. It's still a capacity problem. > > >> > > >> > It would be demotivating for any contributor when there is no feedback > > >> for a PR within a sufficient period of time anyway. > > >> > > >> Definitely. But I think it would be even worse if someone makes a > > >> contribution, there is no response but after X days they get a message > > >> that their PR was closed automatically. > > >> > > >> I'm +1 for (automatically) closing up PRs after X days which: > > >> a) Don't have a CI that has passed > > >> b) Don't follow the code contribution guide (like commit naming > > >> conventions) > > >> c) Have changes requested but aren't being followed-up by the contributor > > >> > > >> I'm -1 for automatically closing PRs where no maintainers have taken a > > >> review for the reasons I've listed above. > > >> > > >> Best regards, > > >> > > >> Martijn > > >> > > >> On Wed, Sep 20, 2023 at 7:41 AM Venkatakrishnan Sowrirajan > > >> <vsowr...@asu.edu> wrote: > > >> > > > >> > Thanks for your response, Martijn. > > >> > > > >> > > What's the added value of > > >> > closing these PRs > > >> > > > >> > It mainly helps the project maintainers/reviewers to focus on only the > > >> > actively updated trimmed list of PRs that are ready for review. > > >> > > > >> > It helps Flink users who are waiting on a PR that enhances an existing > > >> > feature or fixes an issue a clear indication on whether the PR will be > > >> > continually worked on and eventually get a closure or not and therefore > > >> > will be closed. > > >> > > > >> > Btw, I am open to other suggestions or enhancements on top of the > > >> proposal > > >> > as well. > > >> > > > >> > > it would > > >> > just close PRs where maintainers haven't been able to perform a > > >> > review, but getting a PR closed without any feedback is also > > >> > demotivating for a (potential new) contributor > > >> > > > >> > It would be demotivating for any contributor when there is no feedback > > >> for > > >> > a PR within a sufficient period of time anyway. I don't see closing the > > >> PR > > >> > which is inactive after a sufficient period of time (say 60 to 90 days) > > >> > would be any more discouraging than not getting any feedback. The > > >> problem > > >> > of not getting feedback due to not enough maintainer's bandwidth has to > > >> be > > >> > solved through other mechanisms. > > >> > > > >> > > I think the important > > >> > thing is that we get into a cycle where maintainers can see which PRs > > >> > are ready for review, and also a way to divide the bulk of the work. > > >> > > > >> > Yes, exactly my point as well. It helps the maintainers to see a > > >> > trimmed > > >> > list which is ready to be reviewed. > > >> > > > >> > +1 for the other automation to nudge/help the contributor to fix the PR > > >> > that follows the contribution guide, CI checks passed etc. > > >> > > > >> > > IIRC we can't really fix that until we can > > >> > finally move to dedicated Github Action Runners instead of the current > > >> > setup with Azure, but that's primarily blocked by ASF Infra. > > >> > > > >> > Curious, if you can share the JIRA or prior discussion on this topic. I > > >> > would like to learn more about why Github Actions cannot be used for > > >> Apache > > >> > Flink. > > >> > > > >> > Regards > > >> > Venkata krishnan > > >> > > > >> > > > >> > On Tue, Sep 19, 2023 at 2:00 PM Martijn Visser < > > >> martijnvis...@apache.org> > > >> > wrote: > > >> > > > >> > > Hi Venkata, > > >> > > > > >> > > Thanks for opening the discussion, I've been thinking about it quite > > >> > > a > > >> > > bit but I'm not sure what's the right approach. > > >> > > > > >> > > From your proposal, the question would be "What's the added value of > > >> > > closing these PRs"? I don't see an immediate value of that: it would > > >> > > just close PRs where maintainers haven't been able to perform a > > >> > > review, but getting a PR closed without any feedback is also > > >> > > demotivating for a (potential new) contributor. I think the important > > >> > > thing is that we get into a cycle where maintainers can see which PRs > > >> > > are ready for review, and also a way to divide the bulk of the work. > > >> > > Because doing proper reviews requires time, and these resources are > > >> > > scarce. > > >> > > > > >> > > I do think that we can make lives a bit easier with some automation: > > >> > > * There are a lot of PRs which don't follow the contribution guide > > >> > > (No > > >> > > Jira ticket, no correct commit message etc). For the externalized > > >> > > connector repositories, we've been trying Boring Cyborg to provide > > >> > > information back to contributors if their PRs are as expected. If the > > >> > > PR doesn't follow the contribution guide, I'm included to give such a > > >> > > PR less attention review. That's primarily because there are other > > >> > > PRs > > >> > > out there that do follow these guides. > > >> > > * There are even more PRs where the CI has failed: in those cases, a > > >> > > review also makes less sense, given that the PR can't be merged as > > >> > > is. > > >> > > I do see that contributors sometimes don't know where to look for the > > >> > > status of the CI, but IIRC we can't really fix that until we can > > >> > > finally move to dedicated Github Action Runners instead of the > > >> > > current > > >> > > setup with Azure, but that's primarily blocked by ASF Infra. > > >> > > > > >> > > I'm curious what others in the community think. > > >> > > > > >> > > Best regards, > > >> > > > > >> > > Martijn > > >> > > > > >> > > On Tue, Sep 19, 2023 at 10:33 PM Venkatakrishnan Sowrirajan > > >> > > <vsowr...@asu.edu> wrote: > > >> > > > > > >> > > > Hi Flink devs, > > >> > > > > > >> > > > There are currently over 1,000 open pull requests > > >> > > > < > > >> > > > > >> https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc > > >> <https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc > > >> > > > >> > > > > > >> > > > (PRs) in the Apache Flink repository, with only 162 having been > > >> updated > > >> > > in > > >> > > > the last two months > > >> > > > < > > >> > > > > >> https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc+updated > > >> > > >> :>2023-07-19<https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc+updated > > >> :>2023-07-19> > > >> > > >. > > >> > > > This means that more than 85% of the PRs are stale and have not > > >> > > > been > > >> > > > touched. > > >> > > > > > >> > > > I suggest setting up Github actions to monitor these stale PRs, and > > >> > > > automatically closing them if they have not been updated in the > > >> last 'x' > > >> > > > days. Authors would still be able to reopen the closed PRs if they > > >> > > continue > > >> > > > with their work. This would help to keep the PR queue manageable. > > >> > > > > > >> > > > Not sure if this has been discussed in the Apache Flink community > > >> before. > > >> > > > > > >> > > > Thoughts? > > >> > > > > > >> > > > Regards > > >> > > > Venkata krishnan > > >> > > > > >> > > > > > > > Unless otherwise stated above: > > > > IBM United Kingdom Limited > > Registered in England and Wales with number 741598 > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU > > > > Unless otherwise stated above: > > > > IBM United Kingdom Limited > > Registered in England and Wales with number 741598 > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU > > Unless otherwise stated above: > > IBM United Kingdom Limited > Registered in England and Wales with number 741598 > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU