Re: [Discussion] Set further policies for triaging issues

Elad Kalif Wed, 15 Feb 2023 05:57:03 -0800

Cool.

So it seems like we have an agreement on the concept.
Let's review the details and specific concerns in the PR:
https://github.com/apache/airflow/pull/29554



On Mon, Feb 13, 2023 at 12:27 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Fine for me to start this way :)
>
> On Mon, Feb 13, 2023 at 10:56 AM Elad Kalif <elad...@gmail.com> wrote:
> >
> > 1) The committer/PMC/Triage member will remove the needs-triage label.
> This is not really an additional step.
> > We are already relabeling when we triage an issue. The removal of the
> label doesn't have to happen on the first touchdown.
> > Sometimes the triager doesn't have the full knowledge and needs to tag
> another member of the community or ask follow-up questions.
> > From my perspective, triage is done once the issue is understood,
> reproducible and just waiting for someone to pick it up (usually this is
> also the stage where we add the good first issue / area labels).
> >
> > 2) Procedures take time to be fully adopted. From past experience
> eventually everyone is aligned with new policies.
> > Even if we get it wrong in specific places it's very easy to correct it.
> A dashboard could be really nice.
> >
> > 3) There is not much we can do. The next step after triage is to open
> PR. This depends on someone who will pick up the issue.
> > We can measure time since creation/last action but also break it down by
> reported version.
> >
> >
> > On Mon, Feb 13, 2023 at 11:38 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >> Yes. I agree it is a good first step. Let's just not stop at that.
> >> Once we have it, I think starting to measure "responsiveness" is
> >> crucial.
> >>
> >> Also - even if it is the first step, it has to be well defined.
> >> Adding such labels should be accompanied by some way of explaining
> >> and educating those who would use it how to deal with it. Because
> >> setting it is one thing, and what matters is what happens next. A few
> >> questions:
> >>
> >> 1) Who should remove it, and when? I believe it adds extra
> >> responsibility for those who look at the issue and respond to the user:
> >> to remove it when the issue has been "triaged" already. Is that the
> >> idea - to always do it as an extra manual step when we respond to such
> >> an issue (sounds like a small but regular extra burden)? Maybe the bot
> >> could automatically remove the label when a maintainer has responded to
> >> the issue? We could do this later, but I am curious what you think
> >> there.
> >>
> >> 2) Do you think we should have some "dashboard" showing the issues to
> >> be triaged? Or do you think just the "needs-triage" label will be enough
> >> and every one of the maintainers will know and should simply look at
> >> those issues that "need triage"?
> >>
> >> 3) What should we do about issues that have been triaged and the user
> >> "responded" (and no-one follows up - this happens)? Are we going
> >> to track them too or is it something to tackle next?
> >>
> >> J.
> >>
> >>
> >>
> >> On Mon, Feb 13, 2023 at 10:02 AM Elad Kalif <elad...@apache.org> wrote:
> >> >
> >> > > Setting the label does not mean that someone will have eyes on it.
> >> >
> >> > True, but that is just about creating a work queue so that when someone
> does spend time on triage the issues can be found easily.
> >> > This will also address your other points of needing data. By having
> the label we can measure several metrics regarding time waiting for triage
> (a script that checks open issues with the label (daily?) and possibly pushes
> a notification to the issue-triage channel in Slack or some other channel?)
> >> >
> >> > There are many further improvements we can make. For example, set up
> the https://github.com/google/triage-party tool.
> >> >
> >> > On Sun, Feb 12, 2023 at 10:29 PM Jarek Potiuk <ja...@potiuk.com>
> wrote:
> >> >>
> >> >> Maybe I am not getting the full extent of the proposal and maybe I
> >> >> "hijacked" it a bit, but my comment was really related to (4) - new
> >> >> issues. My mistake. Let me comment on those.
> >> >>
> >> >> 1) 2) 3)  -> This is good as a cleanup. We can do this as a manually
> >> >> run bulk process to add such comments now for all such old issues and
> >> >> run them periodically (every few months) if needed. That will be
> >> >> simpler. I am fine with that. We can even initially run it manually
> >> >> and if we find it useful we can turn it into a bot. But I think we
> >> >> should not have to do it again and this is what I mostly commented
> on.
> >> >>
> >> >> > In item (4) I suggested adding a needs-triage label to make sure
> that any issue we get will have at least 1 committer/PMC/Triage member eyes
> on it.
> >> >>
> >> >> Setting the label does not mean that someone will have eyes on it.
> >> >> It's a good starting point though. I think measuring and
> incentivising
> >> >> responsiveness for new issues is key. And if we make sure that we
> >> >> respond to issues in a timely manner, the current stale-bot is
> enough.
> >> >> It will be closing only issues/PRs which are "pending response". All
> >> >> the other issues will be either acknowledged by a maintainer and at
> >> >> least vaguely planned to work on, or converted to a discussion, or
> >> >> fixed or marked as "good first issue" for anyone to pick up.
> >> >>
> >> >> J.
> >> >>
> >> >>
> >> >>
> >> >> On Sun, Feb 12, 2023 at 8:37 PM Elad Kalif <elad...@apache.org>
> wrote:
> >> >> >
> >> >> > I'm not sure if the scenario you are worried about can happen?
> >> >> >
> >> >> > In item (4) I suggested adding a needs-triage label to make sure
> that any issue we get will have at least 1 committer/PMC/Triage member eyes
> on it.
> >> >> > In this step the issue can be accepted and then this label is
> replaced by (reported_version) or rejected and be closed/converted to
> discussion.
> >> >> > If a year passed and no one did anything with the issue the
> automation will simply ask the user to let us know if the issue is still
> happening on a newer Airflow version.
> >> >> > The issue may have already been solved and we just didn't notice.
> Assuming the user won't comment in a defined time frame then it will close
> the issue (if someone in the future will say we did wrong we can always
> reopen)
> >> >> > This is basically what happens today, just by a manual process.
> >> >> >
> >> >> > - What happens if the user replies that it's reproducible?
> >> >> > We will replace the previous reported_version with a new one (for
> example: reported_version:2.0 -> reported_version:2.5) this will bump the
> issue to the latest bug lists.
> >> >> >
> >> >> >
> >> >> > On Sun, Feb 12, 2023 at 5:48 PM Jarek Potiuk <ja...@potiuk.com>
> wrote:
> >> >> >>
> >> >> >> TL;DR; I think we should first solve the issue of improving our
> >> >> >> "responsiveness" as committers. I believe once we solve it,
> the
> >> >> >> stale bot closing issues will be a useful and non-offensive tool.
> >> >> >>
> >> >> >> (Sorry for a loong email,  but I have been thinking a lot about
> it and
> >> >> >> I had many observations that came from responding and triaging our
> >> >> >> issues and PRs and discussions - this took likely 30%/40% of my
> time
> >> >> >> over the last few months).
> >> >> >>
> >> >> >> Yes. I also have some serious doubts about closing issues
> "blindly" by
> >> >> >> one criterion only - time of inactivity. I think this is just
> wrong
> >> >> >> and we should not do it.
> >> >> >>
> >> >> >> I agree with Ash that it is really infuriating when I open an
> >> >> >> issue, then 3 months pass and it gets closed due to
> >> >> >> inactivity. This is simply offensive - no matter if it's a bug or
> >> >> >> issue or PR. But I think the problem here is that we have to deal
> >> >> >> better with the issues as maintainers, not that the stale bot is a
> >> >> >> problem.
> >> >> >>
> >> >> >> But this is only one side of the story - if an issue is first
> >> >> >> marked stale and then closed while it is "pending response" from
> >> >> >> the user - I have absolutely no problem with that. If the user who
> >> >> >> opened it is asked for extra information and has not found the time
> >> >> >> to provide it - there is absolutely no reason it should take the
> >> >> >> mental space of the maintainers and we should close it
> >> >> >> automatically. We can always re-open it if the user comes back with
> >> >> >> more information.
> >> >> >>
> >> >> >>
> >> >> >> But coming back to closing issues without reaction from anyone. If
> >> >> >> this kind of closing happens, which we should be ashamed of, that
> >> >> >> means something else. That means that we as maintainers have done a
> >> >> >> bad job in triaging this issue. This is really an indication that
> >> >> >> no-one - neither regular contributors (which happens) nor
> >> >> >> maintainers (who should look at it if no contributor does) found
> >> >> >> the time to read, analyse and respond to an issue. No matter what
> >> >> >> response it will be. ANY response from a maintainer (won't fix,
> >> >> >> asking for more information, asking others in the community to
> >> >> >> provide more evidence if the issue is impossible to diagnose,
> >> >> >> converting to a discussion) is better than silence. Way, way better.
> >> >> >>
> >> >> >> From my point of view, I think the real problem we have is that we
> >> >> >> often have issues open for weeks or months without ANY
> >> >> >> interaction - or there is no interaction after the user provided
> >> >> >> some kind of response, additional information etc. Every now and
> >> >> >> then I do a "streak" where I try to provide A response to EVERY
> >> >> >> issue and PR opened and not responded to for the last few weeks.
> >> >> >> And there are a number of those for issues or PRs that are even
> >> >> >> 3-4 weeks without any answer.
> >> >> >>
> >> >> >> And I am as guilty as everyone else here, but I have a feeling
> >> >> >> that we collectively as maintainers should spend quite a good chunk
> >> >> >> of our time triaging and responding to issues in due time. I think
> >> >> >> if we end up with a situation where a user raises an issue or PR or
> >> >> >> provides feedback/a new iteration etc. and there is absolutely no
> >> >> >> response for more than a week - this is an indication we have a
> >> >> >> huge problem. And the worst types of those are where someone
> >> >> >> "requests changes", those changes are applied and the user pings
> >> >> >> the reviewer and there are weeks of no response (even to multiple
> >> >> >> pings). Those happen rarely in our project, and I think they are
> >> >> >> even a bit disrespectful to the users who had "done their part".
> >> >> >>
> >> >> >> And I believe (I have no stats, just gut feeling) that we have
> that to
> >> >> >> some extent - for features, bugs, PRs, discussions.
> >> >> >>
> >> >> >> If this happens a lot, then I think this is equally offensive (or
> >> >> >> even more so) as closing stale issues. I think out of many stats,
> >> >> >> "average response time" to an issue is absolutely the most important
> >> >> >> to see how good the community is in handling issues and PRs. This
> >> >> >> should apply both to new issues and PRs, and to issues that have
> >> >> >> been opened and not responded to after the user provided a response
> >> >> >> back.
> >> >> >>
> >> >> >> Now - many of those are not "intentional" and there is absolutely
> >> >> >> no "bad will" - it is mostly because we do not realise that we
> >> >> >> have a problem.
> >> >> >>
> >> >> >> We are all humans and have our daily issues and jobs and a lot of
> what
> >> >> >> we do for our issues is done in our free time. But maybe we can
> >> >> >> automate and improve that part - which in turn will make our
> stale bot
> >> >> >> far "nicer" as it will only have to deal with the case where the
> >> >> >> "user" has not provided necessary input and the maintainers
> looked and
> >> >> >> responded to it.
> >> >> >>
> >> >> >> I do not have a very concrete proposal, but some vague ideas how
> this
> >> >> >> could be improved:
> >> >> >>
> >> >> >> * Maybe we should start with building some simple stats and
> seeing our
> >> >> >> responsiveness and find out if we really have a problem there. I
> am
> >> >> >> sure there must be some tools for that and we might write ours if
> >> >> >> needed - I remember we discussed similar issues in the past
> >> >> >>
> >> >> >> * Then maybe we can figure out a way to share the burden of
> reviews
> >> >> >> between more committers somehow. For example identify issues and
> PRs
> >> >> >> that have not been responded to or followed up for some time and make
> >> >> >> some way to incentivise and involve committers to provide
> feedback to
> >> >> >> those
> >> >> >>
> >> >> >> * The stats could help us to understand if we are falling behind
> and
> >> >> >> maybe we could have some weekly summary of stats that would help
> us
> >> >> >> with understanding if we should do something and step up our
> efforts in
> >> >> >> triaging
> >> >> >>
> >> >> >> I think - if we do that then the only thing that Stale bot will be
> >> >> >> doing is closing issues and PRs which have not received input
> from
> >> >> >> the user. Which is perfectly fine IMHO.
> >> >> >>
> >> >> >> On Sun, Feb 12, 2023 at 10:56 AM Ash Berlin-Taylor <
> a...@apache.org> wrote:
> >> >> >> >
> >> >> >> > Got it, yes that makes sense to me!
> >> >> >> >
> >> >> >> > On 12 February 2023 09:36:44 GMT, Elad Kalif <
> elad...@apache.org> wrote:
> >> >> >> >>
> >> >> >> >> Thanks for the comments my replies are in blue for all points
> raised.
> >> >> >> >>
> >> >> >> >> > We have currently more than 700 issues and many of them have
> had no activity for a year. What will we do with those issues?
> >> >> >> >>
> >> >> >> >> Half of the open issues are feature requests and thus will not be
> impacted. The thing I'm trying to resolve here is to know if the old issue
> is still reproducible on main/latest version. If so, the issue will be
> tagged appropriately and will be kept open. If the author does not respond,
> we can assume the issue is no longer relevant and close it.
> >> >> >> >>
> >> >> >> >> > Why close only stale issues and not stale PRs?
> >> >> >> >>
> >> >> >> >> We already have that. Stale bot works for PR (excluding ones
> with pinned label)
> >> >> >> >>
> >> >> >> >> > There is nothing I find more infuriating and demoralising
> when dealing with an open source project (and big ones like Kubernetes are
> the worst offenders at this) where I find a bug or feature request is
> closed simply due to lack of traction.
> >> >> >> >>
> >> >> >> >> I understand and share your concerns. First, this suggestion
> is just about bugs not about features. The automation calls for action from
> the author to recheck the issue.
> >> >> >> >> This is something I'm doing today manually by going issue by
> issue and commenting the exact same thing: "Does this issue happen in the
> latest Airflow version?" The auto close part is something that happens today
> when we add the pending-response label. My goal is to make sure that the list
> of open bugs we have is relevant. I'm not against larger intervals should we
> decide on them. To clarify, I'm not suggesting closing bug reports because
> of lack of traction; I'm suggesting closing reports that are not on recent
> versions of Airflow. In practice I don't see people trying to reproduce
> bugs reported on 2.0 in latest main - this simply doesn't happen, so by
> having this process we are asking the author to recheck their report. If the
> issue is still reproducible, then by letting us know that and by having the
> proper labels it might get more traction.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Sun, Feb 12, 2023 at 10:15 AM Ash Berlin-Taylor <
> a...@apache.org> wrote:
> >> >> >> >>>
> >> >> >> >>> I feel very strongly against automated closing of _issues_.
> >> >> >> >>>
> >> >> >> >>> There is nothing I find more infuriating and demoralising
> when dealing with an open source project (and big ones like Kubernetes are
> the worst offenders at this) where I find a bug or feature request is
> closed simply due to lack of traction.
> >> >> >> >>>
> >> >> >> >>> I might be okay with a very long time (such as stale after 1
> year and close another year after that.)
> >> >> >> >>>
> >> >> >> >>> Ash
> >> >> >> >>>
> >> >> >> >>> On 12 February 2023 02:00:00 GMT, Pankaj Singh <
> ags.pankaj1...@gmail.com> wrote:
> >> >> >> >>>>
> >> >> >> >>>> Hi Elad,
> >> >> >> >>>>
> >> >> >> >>>> Thanks for bringing up this topic.
> >> >> >> >>>>
> >> >> >> >>>> I also feel we should have some automation to close
> stale issues.
> >> >> >> >>>>
> >> >> >> >>>> A few questions I have:
> >> >> >> >>>> - We have currently more than 700 issues and many of them
> have had no activity for a year. What will we do with those issues?
> >> >> >> >>>> - Why close only stale issues and not stale PRs?
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> On Sun, Feb 12, 2023 at 1:23 AM Elad Kalif <
> elad...@apache.org> wrote:
> >> >> >> >>>>>
> >> >> >> >>>>> Hi everyone,
> >> >> >> >>>>>
> >> >> >> >>>>> It's been a while since we talked about the issue triage
> process. Currently our process involves a lot of manual work of pinging
> issue authors and I'm looking to automate some of it.
> >> >> >> >>>>>
> >> >> >> >>>>> Here are my suggestions:
> >> >> >> >>>>>
> >> >> >> >>>>> 1. Add a new bot automation to detect core bug issues
> (kind:bug, area:code) that are over 1 year old without any activity. The
> bot will add a comment asking the user to check the issue against the
> latest Airflow version and assign a "pending-response" label. If the user
> does not respond, the issue will be marked stale and will be closed by our
> current stale bot automation. I suggest 1 year here because in 1 year we
> usually have 3 feature releases + many bugfix releases which contain a lot
> of fixes. We don't normally go back to check bugs on older versions unless
> they are reported as reproducible on the latest version. There can be 2
> outcomes of this:
> >> >> >> >>>>>
> >> >> >> >>>>> - The author will comment and say it is reproducible. In that
> case we will assign the updated affected_version label and the issue will
> be bumped up.
> >> >> >> >>>>> - The author will not comment. In that case we can assume the
> problem is fixed/not relevant and the issue will be closed.
> >> >> >> >>>>>
> >> >> >> >>>>> 2. similar to (1) for providers with labels (kind:bug,
> area:provider) and with a shortened time period of 6 months as providers
> release frequently.
> >> >> >> >>>>>
> >> >> >> >>>>> 3. similar to (1) for airflow-client-python and
> airflow-client-go with no labels and period of 6 months as well.
> >> >> >> >>>>>
> >> >> >> >>>>> 4. On another front, we sometimes miss the triage of new
> issues. My suggestion is that any new issue opened will automatically have
> a needs-triage label (this is a practice several other projects use). That
> way we can easily filter the list of issues that need first review. When
> triaging the issue we will remove the label and assign proper ones (good
> first issue, area, kind, etc.).
> >> >> >> >>>>>
> >> >> >> >>>>> What do others think?
> >> >> >> >>>>>
> >> >> >> >>>>> Elad
> >> >> >> >>>>>
> >> >> >> >>>>>
>
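
The selection rule in suggestions 1-3 of the original message could be sketched
roughly as follows. This is only an illustration under stated assumptions: the
Issue class, the function names, the repository identifiers and the 182-day
approximation of "6 months" are hypothetical, and the label names are copied as
written in the thread. It only decides whether the bot would post the recheck
comment; it does not talk to GitHub.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Optional

# Inactivity windows from the proposal: 1 year for core bugs, 6 months for
# provider bugs and for the client repositories.
CORE_BUG_WINDOW = timedelta(days=365)
SHORT_WINDOW = timedelta(days=182)  # assumption: "6 months" rounded to 182 days

# Hypothetical repository identifiers for the two client projects.
CLIENT_REPOS = frozenset({"airflow-client-python", "airflow-client-go"})
PENDING_LABEL = "pending-response"


@dataclass(frozen=True)
class Issue:
    """Hypothetical, minimal view of an issue; not a GitHub API object."""
    repo: str
    labels: FrozenSet[str]
    last_activity: datetime


def inactivity_window(issue: Issue) -> Optional[timedelta]:
    """Return the window that applies to the issue, or None if it is exempt."""
    if issue.repo in CLIENT_REPOS:
        return SHORT_WINDOW  # suggestion 3: no labels required
    if "kind:bug" not in issue.labels:
        return None  # feature requests are not affected
    if "area:provider" in issue.labels:
        return SHORT_WINDOW  # suggestion 2
    if "area:code" in issue.labels:  # label name as written in the thread
        return CORE_BUG_WINDOW  # suggestion 1
    return None


def should_ask_for_recheck(issue: Issue, now: datetime) -> bool:
    """True if the bot would comment and add the pending-response label."""
    if PENDING_LABEL in issue.labels:
        return False  # already asked; the existing stale bot takes over
    window = inactivity_window(issue)
    return window is not None and now - issue.last_activity > window
```

For example, a kind:bug/area:code issue last touched in December 2021 would be
selected when checked in February 2023, while a feature request of any age
would be left alone.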
