Cool. So it seems like we have an agreement on the concept. Let's review the details/specific concerns in the PR: https://github.com/apache/airflow/pull/29554
On Mon, Feb 13, 2023 at 12:27 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> Fine for me to start this way :)
>
> On Mon, Feb 13, 2023 at 10:56 AM Elad Kalif <elad...@gmail.com> wrote:
> >
> > 1) The committer/PMC/Triage member will remove the needs-triage label. This is not really an additional step.
> > We are already relabeling when we triage an issue. The removal of the label doesn't have to happen on the first touchdown.
> > Sometimes the triager doesn't have the full knowledge, so they tag another member of the community or need to ask follow-up questions.
> > From my perspective, triage is done once the issue is understood, reproducible and just waiting for someone to pick it up (usually this is also the stage where we add the good first issue / area labels).
> >
> > 2) Procedures take time to be fully adopted. From past experience, eventually everyone is aligned with new policies.
> > Even if we get it wrong in specific places it's very easy to correct it. A dashboard can be really nice.
> >
> > 3) There is not much we can do. The next step after triage is to open a PR. This depends on someone who will pick up the issue.
> > We can measure time since creation/last action but also break it down by reported version.
> >
> >
> > On Mon, Feb 13, 2023 at 11:38 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >> Yes. I agree it is a good first step. Let's just not stop at that. Once we have it, I think starting to measure "responsiveness" is crucial.
> >>
> >> Also - even if it is the first step, it has to be well defined. Adding such labels should be accompanied with some way of explaining and educating those who would use it how to deal with it. Because setting it is one thing and what is important is what happens next. A few questions:
> >>
> >> 1) Who should remove it, and when?
> >> I believe it adds extra responsibility on those who look at the issue and respond to the user, to remove it when the issue has been "triaged" already - is that the idea - to do it always as an extra manual step when we respond to such an issue (sounds like an extra small but regular burden). Maybe the bot could automatically remove the label when a maintainer has responded to the issue? We could do this later, but I am curious what you think there.
> >>
> >> 2) Do you think we should have some "dashboard" showing the issues to be triaged? Or do you think just the "needs-triage" label will be enough and every one of the maintainers will know and should simply look at those issues that "need triage"?
> >>
> >> 3) What should we do about issues that have been triaged and the user "responded" (and no-one follows up - this happens)? Are we going to track them too or is it something to tackle next?
> >>
> >> J.
> >>
> >>
> >>
> >> On Mon, Feb 13, 2023 at 10:02 AM Elad Kalif <elad...@apache.org> wrote:
> >> >
> >> > > Setting the label does not mean that someone will have eyes on it.
> >> >
> >> > True, but that is just about creating a work queue so when someone does spend time on triage the issues can be found easily.
> >> > This will also address your other points of needing data. By having the label we can measure several metrics regarding waiting-for-triage time (a script that checks open issues with the label (daily?) and possibly pushes a notification to the issue-triage channel in Slack or some other channel?)
> >> >
> >> > There are many further improvements we can do. For example, set up the https://github.com/google/triage-party tool.
> >> >
> >> > On Sun, Feb 12, 2023 at 10:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >> >>
> >> >> Maybe I am not getting the full extent of the proposal and maybe I "hijacked" it a bit, but my comment was really related to (4) - new issues. My mistake.
> >> >> Let me comment on those.
> >> >>
> >> >> 1) 2) 3) -> This is good as a cleanup. We can do this as a manually run bulk process to add such comments now for all such old issues and run them periodically (every few months) if needed. That will be simpler. I am fine with that. We can even initially run it manually and if we find it useful we can turn it into a bot. But I think we should not have to do it again and this is what I mostly commented on.
> >> >>
> >> >> > In item (4) I suggested adding a needs-triage label to make sure that any issue we get will have at least 1 committer/PMC/Triage member eyes on it.
> >> >>
> >> >> Setting the label does not mean that someone will have eyes on it. It's a good starting point though. I think measuring and incentivising responsiveness for new issues is a key. And if we make sure that we respond to issues in a timely manner, the current stale-bot is enough. It will be closing only issues/PRs which are "pending response". All the other issues will be either acknowledged by a maintainer and at least vaguely planned to work on, or converted to a discussion, or fixed or marked as "good first issue" for anyone to pick up.
> >> >>
> >> >> J.
> >> >>
> >> >>
> >> >>
> >> >> On Sun, Feb 12, 2023 at 8:37 PM Elad Kalif <elad...@apache.org> wrote:
> >> >> >
> >> >> > I'm not sure if the scenario you are worried about can happen?
> >> >> >
> >> >> > In item (4) I suggested adding a needs-triage label to make sure that any issue we get will have at least 1 committer/PMC/Triage member eyes on it.
> >> >> > In this step the issue can be accepted and then this label is replaced by (reported_version) or rejected and be closed/converted to discussion.
> >> >> > If a year passed and no one did anything with the issue the automation will simply ask the user to let us know if the issue is still happening on newer Airflow version.
> >> >> > The issue may have already been solved and we just didn't notice. If the user does not comment within a defined time frame, the automation will close the issue (if someone in the future says we did wrong, we can always reopen).
> >> >> > This is basically what happens today, just as a manual process.
> >> >> >
> >> >> > - What happens if the user replies that it's reproducible?
> >> >> > We will replace the previous reported_version with a new one (for example: reported_version:2.0 -> reported_version:2.5). This will bump the issue to the latest bug lists.
> >> >> >
> >> >> >
> >> >> > On Sun, Feb 12, 2023 at 5:48 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >> >> >>
> >> >> >> TL;DR; I think we should first solve the issue of improving our "responsiveness" as committers. I believe once we solve it, the stale bot closing issues will be a useful and non-offensive tool.
> >> >> >>
> >> >> >> (Sorry for a loong email, but I have been thinking a lot about it and I had many observations that came from responding and triaging our issues and PRs and discussions - this took likely 30%/40% of my time over the last few months).
> >> >> >>
> >> >> >> Yes. I also have some serious doubts about closing issues "blindly" by one criterion only: time of inactivity. I think this is just wrong and we should not do it.
> >> >> >>
> >> >> >> I agree with Ash that it is really infuriating when I open an issue, 3 months pass, and it is closed due to inactivity. This is simply offensive - no matter if it's a bug or issue or PR. But I think the problem is that we have to deal better with the issues as maintainers, not that the stale bot is a problem.
> >> >> >>
> >> >> >> But this is only one side of the story - if an issue is marked stale first and then closed while it is "pending response" from the user - I have absolutely no problem with that. If the user who opened it is asked for extra information and has not found the time to provide it - there is absolutely no reason it should take the mental space of the maintainers and we should close it automatically. We can always re-open it if the user comes back with more information.
> >> >> >>
> >> >> >>
> >> >> >> But coming back to closing issues without reaction from anyone. If the kind of closing that we should be ashamed of happens, that means something else. That means that we as maintainers have done a bad job in triaging this issue. This is really an indication that no-one - neither regular contributors (which happens) nor maintainers (who should look at it if no contributor does) found the time to read, analyse and respond to an issue. No matter what response it will be. ANY response from a maintainer (won't fix, asking for more information, asking others in the community to provide more evidence if the issue is impossible to diagnose, converting to a discussion) is better than silence. Way, way better.
> >> >> >>
> >> >> >> From my point of view, I think the real problem we have is that we often have issues open for weeks or months without ANY interaction - or there is no interaction after the user provided some kind of response, additional information etc. Every now and then I do a "streak" where I try to provide A response to EVERY issue and PR opened and not responded to for the last few weeks.
> >> >> >> And there are a number of those for issues or PRs that are even 3-4 weeks without any answer.
> >> >> >>
> >> >> >> And I am as guilty as everyone else here, but I have a feeling that we collectively as maintainers should spend quite a good chunk of our time triaging and responding to issues in due time. I think if we end up with a situation where a user raises an issue or PR or provides feedback/a new iteration etc. and there is absolutely no response for more than a week - this is an indication we have a huge problem. And the worst types of those are where someone "requests changes", those changes are applied and the user pings the reviewer and there are weeks of no response (even to multiple pings). Those happen rarely in our project and I think they are even a bit disrespectful to the users who had "done their part".
> >> >> >>
> >> >> >> And I believe (I have no stats, just gut feeling) that we have that to some extent - for features, bugs, PRs, discussions.
> >> >> >>
> >> >> >> If this happens a lot, then this is I think equally offensive (or even more) as closing stale issues. I think out of many stats, "average response time" to an issue is absolutely the most important to see how good the community is in handling issues and PRs. This should apply to both new issues and PRs, but also to issues that have been opened and not responded to after the user provided a response back.
> >> >> >>
> >> >> >> Now - many of those are not "intentional" and there is absolutely no "bad will" - and it is mostly because we do not realise that we have a problem.
> >> >> >>
> >> >> >> We are all humans and have our daily issues and jobs and a lot of what we do for our issues is done in our free time.
> >> >> >> But maybe we can automate and improve that part - which in turn will make our stale bot far "nicer" as it will only have to deal with the case where the "user" has not provided necessary input and the maintainers looked and responded to it.
> >> >> >>
> >> >> >> I do not have a very concrete proposal, but some vague ideas how this could be improved:
> >> >> >>
> >> >> >> * Maybe we should start with building some simple stats and seeing our responsiveness and find out if we really have a problem there. I am sure there must be some tools for that and we might write ours if needed - I remember we discussed similar issues in the past
> >> >> >>
> >> >> >> * Then maybe we can figure out a way to share the burden of reviews between more committers somehow. For example identify issues and PRs that have not been responded to or followed up on for some time and make some way to incentivise and involve committers to provide feedback to those
> >> >> >>
> >> >> >> * The stats could help us to understand if we are falling behind and maybe we could have some weekly summary of stats that would help us with understanding if we should do something and step up our efforts in triaging
> >> >> >>
> >> >> >> I think - if we do that then the only thing that the stale bot will be doing is closing issues and PRs which had not received an input from the user. Which is perfectly fine IMHO.
> >> >> >>
> >> >> >> On Sun, Feb 12, 2023 at 10:56 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> >> >> >> >
> >> >> >> > Got it, yes that makes sense to me!
> >> >> >> >
> >> >> >> > On 12 February 2023 09:36:44 GMT, Elad Kalif <elad...@apache.org> wrote:
> >> >> >> >>
> >> >> >> >> Thanks for the comments. My replies are in blue for all points raised.
> >> >> >> >>
> >> >> >> >> > We have currently more than 700 issues and many of them have had no activity since a year. What will we do with those issues?
> >> >> >> >>
> >> >> >> >> Half of the open issues are feature requests and thus will not be impacted. The thing I'm trying to resolve here is to know if the old issue is still reproducible on main/latest version. If so, the issue will be tagged appropriately and will be kept open. If the author does not respond, we can assume the issue is no longer relevant and close it.
> >> >> >> >>
> >> >> >> >> > Why close only stale issues not stale PR's?
> >> >> >> >>
> >> >> >> >> We already have that. The stale bot works for PRs (excluding ones with the pinned label).
> >> >> >> >>
> >> >> >> >> > There is nothing I find more infuriating and demoralising when dealing with an open source project (and big ones like Kubernetes are the worst offenders at this) where I find a bug or feature request is closed simply due to lack of traction.
> >> >> >> >>
> >> >> >> >> I understand and share your concerns. First, this suggestion is just about bugs, not about features. The automation calls for action from the author to recheck the issue.
> >> >> >> >> This is something I'm doing today manually by going issue by issue and commenting the exact same thing: "Does this issue happen in the latest Airflow version?" The auto-close part is something that happens today when we add the pending-response label. My goal is to make sure that the list of open bugs we have is relevant. I'm not against larger intervals should we decide on them. To clarify, I'm not suggesting to close bug reports because of lack of traction; I'm suggesting to close reports that are not on recent versions of Airflow. In practice I don't see people trying to reproduce bugs reported on 2.0 in latest main - this simply doesn't happen, so by having this process we are asking the author to recheck their report.
> >> >> >> >> If the issue is still reproducible, then by letting us know that and by having the proper labels it might get more traction.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Sun, Feb 12, 2023 at 10:15 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> >> >> >> >>>
> >> >> >> >>> I feel very strongly against automated closing of _issues_.
> >> >> >> >>>
> >> >> >> >>> There is nothing I find more infuriating and demoralising when dealing with an open source project (and big ones like Kubernetes are the worst offenders at this) where I find a bug or feature request is closed simply due to lack of traction.
> >> >> >> >>>
> >> >> >> >>> I might be okay with a very long time (such as stale after 1 year and close another year after that.)
> >> >> >> >>>
> >> >> >> >>> Ash
> >> >> >> >>>
> >> >> >> >>> On 12 February 2023 02:00:00 GMT, Pankaj Singh <ags.pankaj1...@gmail.com> wrote:
> >> >> >> >>>>
> >> >> >> >>>> Hi Elad,
> >> >> >> >>>>
> >> >> >> >>>> Thanks for bringing up this topic.
> >> >> >> >>>>
> >> >> >> >>>> I also feel we should have some automation to close stale issues.
> >> >> >> >>>>
> >> >> >> >>>> A few questions I have:
> >> >> >> >>>> - We have currently more than 700 issues and many of them have had no activity since a year. What will we do with those issues?
> >> >> >> >>>> - Why close only stale issues not stale PR's?
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> On Sun, Feb 12, 2023 at 1:23 AM Elad Kalif <elad...@apache.org> wrote:
> >> >> >> >>>>>
> >> >> >> >>>>> Hi everyone,
> >> >> >> >>>>>
> >> >> >> >>>>> It's been a while since we talked about the issue triage process. Currently our process involves a lot of manual work of pinging issue authors and I'm looking to automate some of it.
> >> >> >> >>>>>
> >> >> >> >>>>> Here are my suggestions:
> >> >> >> >>>>>
> >> >> >> >>>>> 1. add a new bot automation to detect core bug issues (kind:bug, area:code) that are over 1 year old without any activity.
> >> >> >> >>>>> The bot will add a comment asking the user to check the issue against the latest Airflow version and assign a "pending-response" label. If the user does not respond, the issue will be marked stale and will be closed by our current stale bot automation. I suggest 1 year here because in 1 year we usually have 3 feature releases + many bug fix releases which contain a lot of fixes. We don't normally go back to check bugs on older versions unless they are reported as reproducible on the latest version. There can be 2 outcomes of this:
> >> >> >> >>>>>
> >> >> >> >>>>> The author will comment and say it is reproducible. In that case we will assign the updated affected_version label and the issue will be bumped up.
> >> >> >> >>>>> The author will not comment. In that case we can assume the problem is fixed/not relevant and the issue will be closed.
> >> >> >> >>>>>
> >> >> >> >>>>> 2. similar to (1) for providers with labels (kind:bug, area:provider) and with a shortened time period of 6 months as providers release frequently.
> >> >> >> >>>>>
> >> >> >> >>>>> 3. similar to (1) for airflow-client-python and airflow-client-go with no labels and a period of 6 months as well.
> >> >> >> >>>>>
> >> >> >> >>>>> 4. On another front, we sometimes miss the triage of new issues. My suggestion is that any new issue opened will automatically have a needs-triage label (this is a practice several other projects use). That way we can easily filter the list of issues that need first review. When triaging the issue we will remove the label and assign proper ones (good first issue, area, kind, etc.)
> >> >> >> >>>>>
> >> >> >> >>>>> What do others think?
> >> >> >> >>>>>
> >> >> >> >>>>> Elad
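
For reference, the selection rule in items 1 to 3 of the original proposal could be sketched roughly as below. This is only an illustration under stated assumptions, not the bot from the PR: the `Issue` class and the helper names are hypothetical, "6 months" is approximated as 182 days, the label names are copied verbatim from the proposal, and fetching issues from GitHub is left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Issue:
    """Hypothetical, minimal view of an issue (not a GitHub API object)."""
    number: int
    repo: str
    labels: set[str]
    last_activity: datetime


# Label sets copied verbatim from the proposal text.
CORE_BUG_LABELS = {"kind:bug", "area:code"}
PROVIDER_BUG_LABELS = {"kind:bug", "area:provider"}
CLIENT_REPOS = {"airflow-client-python", "airflow-client-go"}

ONE_YEAR = timedelta(days=365)
SIX_MONTHS = timedelta(days=182)  # assumption: "6 months" taken as 182 days


def inactivity_limit(issue: Issue) -> Optional[timedelta]:
    """Return the inactivity period after which the author is asked to recheck,
    or None if the issue is out of scope (for example, a feature request)."""
    if issue.repo in CLIENT_REPOS:
        return SIX_MONTHS  # item 3: client repos, no label requirement
    if PROVIDER_BUG_LABELS <= issue.labels:
        return SIX_MONTHS  # item 2: provider bugs
    if CORE_BUG_LABELS <= issue.labels:
        return ONE_YEAR  # item 1: core bugs
    return None


def needs_recheck(issue: Issue, now: datetime) -> bool:
    """True if the bot should comment and add the pending-response label."""
    limit = inactivity_limit(issue)
    if limit is None or "pending-response" in issue.labels:
        return False
    return now - issue.last_activity > limit
```

Once an issue carries pending-response, the existing stale bot behaviour described in the thread takes over, so this sketch deliberately does nothing about closing.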