Hi folks,

In the past, other triage owners and I have seen frequently failing
tests disabled without clear notice to the triage owner, component
owner, or test author. I've seen this specific pattern a few times:

- An intermittent test suddenly starts failing very frequently.
- The Stockwell team reacts quickly (which is good) and disables the test,
getting review from another sheriff or member of their team.
- No analysis is done on the possible cause or regressing bug.
- The intermittent bug is left open without a needinfo for anyone who
could fix the test (some even with a P5 priority).

This is problematic, since a) we're losing test coverage that way and b)
these tests might be failing frequently because there's actually something
wrong with the feature, not just a test issue.

In most cases these get discovered sooner or later so I don't want to make
this issue bigger than it is, but it's still suboptimal for some of us. It
seems like we could easily remedy this by introducing a policy like:

*For disabling tests, review from the test author, triage owner or a
component peer is required. If they do not respond within 2? business days
or if the frequency is higher than x, the test may be disabled without
their consent, but the triage owner *must* be needinfo'd on such a bug in
this case.*

It would also be extremely helpful if sheriffs could post a possible
regression range for the frequent intermittent when disabling the test,
where possible (because I assume that's also the best time to determine
a regression range).

Any thoughts?

Cheers.

Johann
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
