Re: Visibility of disabled tests

Geoffrey Brown Sat, 11 Jan 2020 07:38:52 -0800

Thanks Johann. I agree it is important that we try to fix tests that have
been disabled. I think the sheriffs usually needinfo the triage owner
before/when disabling a test; I'm disappointed to hear that isn't happening
consistently.


However, I'd prefer not to change the review process for the disabling
patch. Currently sheriffs normally request review from
#intermittent-reviewers and that has been working well:
 - we strive for very low latency so that frequently failing tests can be
addressed right away
 - we watch for common errors in test manifests
 - we can help ensure consistency in the test disabling procedure.

Keep in mind that sheriffs also needinfo (typically the triage owner) when
a test is identified as "needswork", failing frequently but not yet at the
disabling threshold. Often those needinfo requests go unanswered or fail to
resolve the issue (no shaming here: we are all busy and have priorities). I
think that requesting review from test author/triage owner/component peer
risks adding another 2 day delay to the overall process -- more time where
those tests are failing.

Instead of changing the reviewers, how about:
 - we remind the sheriffs to needinfo
 - #intermittent-reviewers check that needinfo is in place when reviewing
disabling patches.

It might be helpful if we explicitly consider some special cases. If the
sheriffs have needinfo'd for "needswork" and that needinfo has been
cleared, do we  want to set needinfo again when disabling? Always? If the
triage owner has a huge needinfo queue, still needinfo? ...


Regarding regression finding, as I understand it, sheriffs currently look
for regression ranges for bugs where:
 - the test is failing frequently: since these are easier to verify
pass/fail on any push
 - the test was running reliably in the near past.
In my experience, a comment on the bug requesting a regression range can be
effective. I don't know if the sheriffs have much time for additional
regression searches.

 - Geoff

On Tue, Jan 7, 2020 at 6:29 AM Johann Hofmann <jhofm...@mozilla.com> wrote:

> Hi folks,
>
> in the past I and other triage owners have experienced some frequently
> failing tests being disabled without a clear notice to the triage owner,
> component owner or test author. I've seen this specific pattern a few times:
>
> - An intermittent test starts failing very frequently very suddenly.
> - The Stockwell team reacts quickly (which is good) and disables the test,
> getting review from another sheriff or member of their team.
> - No analysis is done on the possible cause or regressing bug
> - The intermittent bug is left open without needinfo to anyone who could
> fix the test (some even with a P5 priority).
>
> This is problematic, since a) we're losing test coverage that way and b)
> these tests might be failing frequently because there's actually something
> wrong with the feature, not just a test issue.
>
> In most cases these get discovered sooner or later so I don't want to make
> this issue bigger than it is, but it's still suboptimal for some of us. It
> seems like we could easily remedy this by introducing a policy like:
>
> *For disabling tests, review from the test author, triage owner or a
> component peer is required. If they do not respond within 2? business days
> or if the frequency is higher than x, the test may be disabled without
> their consent, but the triage owner *must* be needinfo'd on such a bug in
> this case.*
>
> It would also be extremely helpful if Sheriffs could post a possible
> regression range for the frequent intermittent when disabling, where
> possible (because I assume that's also the best time to do a regression
> range).
>
> Any thoughts?
>
> Cheers.
>
> Johann
>
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

Re: Visibility of disabled tests

Reply via email to