Re: [Discussion] Set further policies for triaging issues

Jarek Potiuk Mon, 13 Feb 2023 02:27:25 -0800

Fine for me to start this way :)


On Mon, Feb 13, 2023 at 10:56 AM Elad Kalif <elad...@gmail.com> wrote:
>
> 1) The committer/PMC/Triage member will remove the needs-triage label. This 
> is not really an additional step.
> We are already relabeling when we triage an issue. The removal of the label 
> doesn't have to happen on the first touchdown.
> Sometimes the triager doesn't have the full knowledge, so they tag another 
> member of the community or need to ask follow-up questions.
> In my perspective triage is done once the issue is understood, reproducible and 
> just waiting for someone to pick it up (usually this is also the stage where 
> we add the good first issue / area labels).
>
> 2) Procedures take time to be fully adopted. From past experience eventually 
> everyone is aligned with new policies.
> Even if we get it wrong in specific places it's very easy to correct it. 
> Dashboard can be really nice.
>
> 3) There is not much we can do. The next step after triage is to open a PR. 
> This depends on someone who will pick up the issue.
> We can measure time since creation/last action but also break it down by 
> reported version.
>
>
> On Mon, Feb 13, 2023 at 11:38 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> Yes. I agree it is a good first step. Let's just not stop on that.
>> Once we have it, I think starting measuring "responsiveness" is
>> crucial.
>>
>> Also - even if it is the first step, it has to be well defined.
>> Adding such labels should be accompanied with some way of explaining
>> and educating those who would use it how to deal with it. Because
>> setting it is one thing, and what matters is what happens next. A few
>> questions:
>>
>> 1) Who should remove it, and when? I believe it adds extra
>> responsibility on those who look at the issue and respond to the user,
>> to remove it when the issue has been "triaged" already. Is that the
>> idea - to always do it as an extra manual step when we respond to such
>> an issue (sounds like an extra small but regular burden)? Maybe the bot
>> could automatically remove the label when a maintainer has responded to
>> the issue? We could do this later, but I am curious what you think
>> there.
>>
>> 2) Do you think we should have some "dashboard" showing the issues to
>> be triaged? Or do you think just the "needs-triage" label will be enough and
>> every one of the maintainers will know and should simply look at those
>> issues that "need triage"?
>>
>> 3) What should we do about issues that have been triaged, where the user
>> "responded" and no one followed up (this happens)? Are we going
>> to track them too, or is it something to tackle next?
>>
>> J.
>>
>>
>>
>> On Mon, Feb 13, 2023 at 10:02 AM Elad Kalif <elad...@apache.org> wrote:
>> >
>> > > Setting the label does not mean that someone will have eyes on it.
>> >
>> > True, but that is just about creating a work queue so that when someone does 
>> > spend time on triage the issues can be found easily.
>> > This will also address your other points about needing data. By having the 
>> > label we can measure several metrics regarding time spent waiting for triage 
>> > (a script that checks open issues with the label (daily?) and possibly pushes 
>> > a notification to the issue-triage channel in Slack or some other channel?)
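
A minimal sketch of the kind of daily script described above, assuming the open issues have already been fetched into plain dictionaries. The dictionary keys (`state`, `labels`, `created_at`) and the function names are illustrative assumptions, not an existing Airflow tool; the fetching and the Slack notification are left out.

```python
from datetime import datetime, timezone
from statistics import median


def triage_wait_days(issues, now, label="needs-triage"):
    """Return how many whole days each open issue carrying `label` has waited."""
    waits = []
    for issue in issues:
        # Only open issues that still carry the triage label count as waiting.
        if issue["state"] == "open" and label in issue["labels"]:
            waits.append((now - issue["created_at"]).days)
    return waits


def summarize(waits):
    """Collapse the per-issue waits into a few numbers for a daily report."""
    if not waits:
        return {"count": 0, "median_days": None, "max_days": None}
    return {
        "count": len(waits),
        "median_days": median(waits),
        "max_days": max(waits),
    }


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real issue listing.
    now = datetime(2023, 2, 13, tzinfo=timezone.utc)
    sample = [
        {"state": "open", "labels": ["needs-triage"],
         "created_at": datetime(2023, 2, 10, tzinfo=timezone.utc)},
        {"state": "open", "labels": ["kind:bug"],
         "created_at": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    ]
    print(summarize(triage_wait_days(sample, now)))
```

The summary dictionary is what such a script might post to a channel; which numbers are actually useful is an open question in this thread.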
>> >
>> > There are many further improvements we can make. For example, set up the 
>> > https://github.com/google/triage-party tool
>> >
>> > On Sun, Feb 12, 2023 at 10:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> >>
>> >> Maybe I am not getting the full extent of the proposal and maybe I
>> >> "hijacked" it a bit, but my comment was really related to (4) - new
>> >> issues. My mistake. Let me comment on those.
>> >>
>> >> 1) 2) 3)  -> This is good as a cleanup. We can do this as a manually
>> >> run bulk process to add such comments now for all such old issues and
>> >> run them periodically (every few months) if needed. That will be
>> >> simpler. I am fine with that. We can even initially run it manually
>> >> and if we find it useful we can turn it into a bot. But I think we
>> >> should not have to do it again and this is what I mostly commented on.
>> >>
>> >> > In item (4) I suggested adding a needs-triage label to make sure that 
>> >> > any issue we get will have at least 1 committer/PMC/Triage member eyes 
>> >> > on it.
>> >>
>> >> Setting the label does not mean that someone will have eyes on it.
>> >> It's a good starting point though. I think measuring and incentivising
>> >> responsiveness for new issues is key. And if we make sure that we
>> >> respond to issues in a timely manner, the current stale-bot is enough.
>> >> It will be closing only issues/PRs which are "pending response". All
>> >> the other issues will be either acknowledged by a maintainer and at
>> >> least vaguely planned to work on, or converted to a discussion, or
>> >> fixed or marked as "good first issue" for anyone to pick up.
>> >>
>> >> J.
>> >>
>> >>
>> >>
>> >> On Sun, Feb 12, 2023 at 8:37 PM Elad Kalif <elad...@apache.org> wrote:
>> >> >
>> >> > I'm not sure if the scenario you are worried about can happen?
>> >> >
>> >> > In item (4) I suggested adding a needs-triage label to make sure that 
>> >> > any issue we get will have at least 1 committer/PMC/Triage member eyes 
>> >> > on it.
>> >> > In this step the issue can be accepted, and then this label is replaced 
>> >> > by (reported_version), or rejected and closed/converted to a discussion.
>> >> > If a year passes and no one does anything with the issue, the automation 
>> >> > will simply ask the user to let us know if the issue is still happening 
>> >> > on a newer Airflow version.
>> >> > The issue may have already been solved and we just didn't notice. 
>> >> > If the user doesn't comment within a defined time frame, then it will 
>> >> > close the issue (if someone in the future says we got it wrong we can 
>> >> > always reopen).
>> >> > This is basically what happens today, just as a manual process.
>> >> >
>> >> > - What happens if the user replies that it's reproducible?
>> >> > We will replace the previous reported_version with a new one (for 
>> >> > example: reported_version:2.0 -> reported_version:2.5); this will bump 
>> >> > the issue to the latest bug lists.
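
The relabeling step described above can be sketched as a small pure function over a list of label names. The `reported_version:` prefix follows the example in the message; the function name and the idea of operating on a plain list are assumptions for illustration only.

```python
def bump_reported_version(labels, new_version, prefix="reported_version:"):
    """Return a new label list with any old reported_version label replaced."""
    # Drop every existing reported_version:* label, keep everything else in order.
    kept = [name for name in labels if not name.startswith(prefix)]
    # Append the label for the version the user re-confirmed the bug on.
    return kept + [f"{prefix}{new_version}"]


if __name__ == "__main__":
    print(bump_reported_version(["kind:bug", "reported_version:2.0"], "2.5"))
```

Returning a new list rather than mutating the input keeps the sketch easy to test; a real bot would then send the resulting labels back to the issue tracker.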
>> >> >
>> >> >
>> >> > On Sun, Feb 12, 2023 at 5:48 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> >> >>
>> >> >> TL;DR; I think we should first solve the issue of improving our
>> >> >> "responsiveness" as committers. I believe once we solve it, the
>> >> >> stale bot closing issues will be a useful and non-offensive tool.
>> >> >>
>> >> >> (Sorry for a loong email,  but I have been thinking a lot about it and
>> >> >> I had many observations that came from responding and triaging our
>> >> >> issues and PRs and discussions - this took likely 30%/40% of my time
>> >> >> over the last few months).
>> >> >>
>> >> >> Yes. I also have some serious doubts about closing issues "blindly" by
>> >> >> a single criterion - time of inactivity alone. I think this is just wrong
>> >> >> and we should not do it.
>> >> >>
>> >> >> I agree with Ash that it is really infuriating when I open an
>> >> >> issue, 3 months pass and it is closed due to
>> >> >> inactivity. This is simply offensive - no matter if it's a bug or
>> >> >> issue or PR. But I think the problem is that we have to deal
>> >> >> better with the issues as maintainers, not that the stale bot is a
>> >> >> problem.
>> >> >>
>> >> >> But this is only one side of the story - if an issue is first marked
>> >> >> stale and then closed while it is "pending response" from the user -
>> >> >> I have absolutely no problem with that. If the user who opened it is
>> >> >> asked for extra information and has not found the time to provide it -
>> >> >> there is absolutely no reason it should take up the mental space of the
>> >> >> maintainers and we should close it automatically. We can always
>> >> >> re-open it if the user comes back with more information.
>> >> >>
>> >> >>
>> >> >> But coming back to closing issues without a reaction from anyone. If
>> >> >> this kind of closing happens, which we should be ashamed of, it means
>> >> >> something else. It means that we as maintainers have done a bad job
>> >> >> in triaging this issue. This is really an indication that no-one -
>> >> >> neither regular contributors (which happens) nor maintainers (who
>> >> >> should look at it if no contributor does) - found the time to read,
>> >> >> analyse and respond to an issue. No matter what response it will be.
>> >> >> ANY response from a maintainer (won't fix, asking for more information,
>> >> >> asking others in the community to provide more evidence if
>> >> >> the issue is impossible to diagnose, converting to a discussion) is
>> >> >> better than silence. Way, way better.
>> >> >>
>> >> >> From my point of view, I think the real problem we have is that we
>> >> >> often have issues open for weeks or months without ANY interaction -
>> >> >> or there is no interaction after the user provided some kind of
>> >> >> response, additional information etc. Every now and then I do a
>> >> >> "streak" where I try to provide A response to EVERY issue and PR
>> >> >> opened and not responded to for the last few weeks. And there are a
>> >> >> number of issues or PRs that have gone even 3-4 weeks without any
>> >> >> answer.
>> >> >>
>> >> >> And I am as guilty as everyone else here, but I have a feeling that
>> >> >> we collectively, as maintainers, should spend quite a good chunk of our
>> >> >> time triaging and responding to issues in due time. I think if we end up
>> >> >> with a situation where a user raises an issue or PR or provides
>> >> >> feedback/a new iteration etc. and there is absolutely no response for
>> >> >> more than a week - this is an indication we have a huge problem. And
>> >> >> the worst cases are those where someone "requests changes", those
>> >> >> changes are applied and the user pings the reviewer and there are
>> >> >> weeks of no response (even to multiple pings). Those happen rarely in
>> >> >> our project and I think they are even a bit disrespectful to the users
>> >> >> who had "done their part".
>> >> >>
>> >> >> And I believe (I have no stats, just gut feeling) that we have that to
>> >> >> some extent - for features, bugs, PRs, discussions.
>> >> >>
>> >> >> If this happens a lot, then I think this is equally offensive (or
>> >> >> even more so) as closing stale issues. I think out of many stats,
>> >> >> "average response time" to an issue is absolutely the most important to
>> >> >> see how good the community is at handling issues and PRs. This should
>> >> >> apply both to new issues and PRs, and also to issues that have been
>> >> >> opened and not responded to after the user provided a response back.
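
One way the "average response time" stat mentioned above could be computed, as a rough sketch. It assumes each issue is a dictionary with a `created_at` timestamp and a list of comments, and that each comment carries an `author_association` value in the style of the GitHub REST API (`MEMBER`, `OWNER`, `COLLABORATOR`, ...). Treating exactly those three roles as "maintainer", and the data shape itself, are assumptions.

```python
from datetime import datetime, timezone

# Assumption: these associations are treated as "maintainer" responses.
MAINTAINER_ROLES = {"MEMBER", "OWNER", "COLLABORATOR"}


def first_response_hours(issue):
    """Hours from issue creation to the first maintainer comment, or None."""
    for comment in sorted(issue["comments"], key=lambda c: c["created_at"]):
        if comment["author_association"] in MAINTAINER_ROLES:
            delta = comment["created_at"] - issue["created_at"]
            return delta.total_seconds() / 3600
    return None  # no maintainer has responded yet


def average_response_hours(issues):
    """Mean first-response time over issues that did get a maintainer reply."""
    hours = [h for h in map(first_response_hours, issues) if h is not None]
    return sum(hours) / len(hours) if hours else None
```

Issues that never received a maintainer reply are excluded from the mean here, so a real report would likely show their count separately; otherwise silence would make the average look better than it is.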
>> >> >>
>> >> >> Now - many of those are not "intentional" and there is absolutely no
>> >> >> "bad will" - it is mostly because we do not realise that we have a problem.
>> >> >>
>> >> >> We are all humans and have our daily issues and jobs and a lot of what
>> >> >> we do for our issues is done in our free time. But maybe we can
>> >> >> automate and improve that part - which in turn will make our stale bot
>> >> >> far "nicer" as it will only have to deal with the case where the
>> >> >> "user" has not provided necessary input and the maintainers looked and
>> >> >> responded to it.
>> >> >>
>> >> >> I do not have a very concrete proposal, but some vague ideas how this
>> >> >> could be improved:
>> >> >>
>> >> >> * Maybe we should start with building some simple stats and seeing our
>> >> >> responsiveness and find out if we really have a problem there. I am
>> >> >> sure there must be some tools for that and we might write ours if
>> >> >> needed - I remember we discussed similar issues in the past
>> >> >>
>> >> >> * Then maybe we can figure out a way to share the burden of reviews
>> >> >> between more committers somehow. For example, identify issues and PRs
>> >> >> that have not been responded to or followed up for some time and find
>> >> >> some way to incentivise and involve committers to provide feedback on
>> >> >> those
>> >> >>
>> >> >> * The stats could help us to understand if we are falling behind and
>> >> >> maybe we could have some weekly summary of stats that would help us
>> >> >> with understanding if we should do something and step up our efforts in
>> >> >> triaging
>> >> >>
>> >> >> I think - if we do that then the only thing that the stale bot will be
>> >> >> doing is closing issues and PRs which have not received input from
>> >> >> the user. Which is perfectly fine IMHO.
>> >> >>
>> >> >> On Sun, Feb 12, 2023 at 10:56 AM Ash Berlin-Taylor <a...@apache.org> 
>> >> >> wrote:
>> >> >> >
>> >> >> > Got it, yes that makes sense to me!
>> >> >> >
>> >> >> > On 12 February 2023 09:36:44 GMT, Elad Kalif <elad...@apache.org> 
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Thanks for the comments. My replies are in blue for all points 
>> >> >> >> raised.
>> >> >> >>
>> >> >> >> > We currently have more than 700 issues and many of them have had 
>> >> >> >> > no activity for a year. What will we do with those issues?
>> >> >> >>
>> >> >> >> Half of the open issues are feature requests and thus will not be 
>> >> >> >> impacted. The thing I'm trying to resolve here is to know if the 
>> >> >> >> old issue is still reproducible on main/latest version. If so, the 
>> >> >> >> issue will be tagged appropriately and will be kept open. If the 
>> >> >> >> author does not respond, we can assume the issue is no longer 
>> >> >> >> relevant and close it.
>> >> >> >>
>> >> >> >> > Why close only stale issues and not stale PRs?
>> >> >> >>
>> >> >> >> We already have that. The stale bot works for PRs (excluding ones 
>> >> >> >> with the pinned label).
>> >> >> >>
>> >> >> >> > There is nothing I find more infuriating and demoralising when 
>> >> >> >> > dealing with an open source project (and big ones like Kubernetes 
>> >> >> >> > are the worst offenders at this) where I find a bug or feature 
>> >> >> >> > request is closed simply due to lack of traction.
>> >> >> >>
>> >> >> >> I understand and share your concerns. First, this suggestion is 
>> >> >> >> just about bugs not about features. The automation calls for action 
>> >> >> >> from the author to recheck the issue.
>> >> >> >> This is something I'm doing today manually by going issue by issue 
>> >> >> >> and commenting the exact same thing: "Does this issue happen in the 
>> >> >> >> latest Airflow version?" The auto-close part is something that 
>> >> >> >> happens today when we add the pending-response label. My goal is to 
>> >> >> >> make sure that the list of open bugs we have is relevant. I'm not 
>> >> >> >> against larger intervals should we decide on them. To clarify, I'm 
>> >> >> >> not suggesting closing bug reports because of lack of traction; I'm 
>> >> >> >> suggesting closing reports that are not on recent versions of 
>> >> >> >> Airflow. In practice I don't see people trying to reproduce bugs 
>> >> >> >> reported on 2.0 in latest main - this simply doesn't happen - so by 
>> >> >> >> having this process we are asking the author to recheck their report. 
>> >> >> >> If the issue is still reproducible, then by letting us know that and 
>> >> >> >> by having the proper labels it might get more traction.
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sun, Feb 12, 2023 at 10:15 AM Ash Berlin-Taylor 
>> >> >> >> <a...@apache.org> wrote:
>> >> >> >>>
>> >> >> >>> I feel very strongly against automated closing of _issues_.
>> >> >> >>>
>> >> >> >>> There is nothing I find more infuriating and demoralising when 
>> >> >> >>> dealing with an open source project (and big ones like Kubernetes 
>> >> >> >>> are the worst offenders at this) where I find a bug or feature 
>> >> >> >>> request is closed simply due to lack of traction.
>> >> >> >>>
>> >> >> >>> I might be okay with a very long time (such as stale after 1 year 
>> >> >> >>> and close another year after that.)
>> >> >> >>>
>> >> >> >>> Ash
>> >> >> >>>
>> >> >> >>> On 12 February 2023 02:00:00 GMT, Pankaj Singh 
>> >> >> >>> <ags.pankaj1...@gmail.com> wrote:
>> >> >> >>>>
>> >> >> >>>> Hi Elad,
>> >> >> >>>>
>> >> >> >>>> Thanks for bringing up this topic.
>> >> >> >>>>
>> >> >> >>>> I also feel we should have some automation to close stale 
>> >> >> >>>> issues.
>> >> >> >>>>
>> >> >> >>>> A few questions I have:
>> >> >> >>>> - We currently have more than 700 issues and many of them have 
>> >> >> >>>> had no activity for a year. What will we do with those issues?
>> >> >> >>>> - Why close only stale issues and not stale PRs?
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> On Sun, Feb 12, 2023 at 1:23 AM Elad Kalif <elad...@apache.org> 
>> >> >> >>>> wrote:
>> >> >> >>>>>
>> >> >> >>>>> Hi everyone,
>> >> >> >>>>>
>> >> >> >>>>> It's been a while since we talked about the issue triage 
>> >> >> >>>>> process. Currently our process involves a lot of manual work of 
>> >> >> >>>>> pinging issue authors and I'm looking to automate some of it.
>> >> >> >>>>>
>> >> >> >>>>> Here are my suggestions:
>> >> >> >>>>>
>> >> >> >>>>> 1. Add a new bot automation to detect core bug issues (kind:bug, 
>> >> >> >>>>> area:code) that are over 1 year old without any activity. The 
>> >> >> >>>>> bot will add a comment asking the user to check the issue 
>> >> >> >>>>> against the latest Airflow version and assign a 
>> >> >> >>>>> "pending-response" label. If the user does not respond, the issue 
>> >> >> >>>>> will be marked stale and will be closed by our current stale bot 
>> >> >> >>>>> automation. I suggest 1 year here because in 1 year we usually 
>> >> >> >>>>> have 3 feature releases + many bug fix releases which contain a 
>> >> >> >>>>> lot of fixes. We don't normally go back to check bugs on older 
>> >> >> >>>>> versions unless they are reported as reproducible on the latest 
>> >> >> >>>>> version. There can be 2 outcomes of this:
>> >> >> >>>>>
>> >> >> >>>>> - The author will comment and say it is reproducible. In that 
>> >> >> >>>>> case we will assign the updated affected_version label and the 
>> >> >> >>>>> issue will be bumped up.
>> >> >> >>>>> - The author will not comment. In that case we can assume the 
>> >> >> >>>>> problem is fixed/not relevant and the issue will be closed.
>> >> >> >>>>>
>> >> >> >>>>> 2. Similar to (1) for providers with labels (kind:bug, 
>> >> >> >>>>> area:provider) and with a shortened time period of 6 months as 
>> >> >> >>>>> providers release frequently.
>> >> >> >>>>>
>> >> >> >>>>> 3. Similar to (1) for airflow-client-python and 
>> >> >> >>>>> airflow-client-go with no labels and a period of 6 months as well.
>> >> >> >>>>>
>> >> >> >>>>> 4. On another front, we sometimes miss the triage of new issues. 
>> >> >> >>>>> My suggestion is that any new issue opened will automatically 
>> >> >> >>>>> have a needs-triage label (this is a practice several other 
>> >> >> >>>>> projects use). That way we can easily filter the list of issues 
>> >> >> >>>>> that need a first review. When triaging the issue we will remove 
>> >> >> >>>>> the label and assign proper ones (good first issue, area, kind, 
>> >> >> >>>>> etc.)
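
The selection rule in suggestions (1) and (2) above can be sketched as a predicate over an issue's labels and last-activity time. This is only an illustration: the dictionary keys, the function name and the approximation of "6 months" as 182 days are assumptions, and the label names are taken as written in the proposal.

```python
from datetime import datetime, timedelta, timezone

# (required labels, inactivity limit); "6 months" is approximated as 182 days.
RULES = [
    (frozenset({"kind:bug", "area:code"}), timedelta(days=365)),
    (frozenset({"kind:bug", "area:provider"}), timedelta(days=182)),
]


def needs_recheck(issue, now, rules=RULES):
    """True if the bot should ask the author to re-test on the latest version."""
    labels = set(issue["labels"])
    # Already waiting on the author: the existing stale bot takes it from here.
    if "pending-response" in labels:
        return False
    idle = now - issue["updated_at"]
    # An issue qualifies if it carries all labels of a rule and exceeds its limit.
    return any(required <= labels and idle > limit for required, limit in rules)
```

A bot built on this would comment on each matching issue and add the "pending-response" label, after which the current stale-bot flow applies unchanged.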
>> >> >> >>>>>
>> >> >> >>>>> What do others think?
>> >> >> >>>>>
>> >> >> >>>>> Elad
>> >> >> >>>>>
>> >> >> >>>>>
