On 2014-04-04, 3:12 PM, L. David Baron wrote:
On Friday 2014-04-04 11:58 -0700, jmaher wrote:
As the sheriffs know, it is frustrating to deal with hundreds of tests that
fail intermittently on a daily basis.
When a single test case is identified as leaking or failing at least 10% of
the time, it is time to escalate.
Escalation path:
1) Ensure we have a bug on file that includes the test author, reviewer, module
owner, and any other interested parties, plus links to logs, etc.
2) Set needinfo? and expect a response within 2 business days; this
expectation should be made clear in a comment.
3) If we don't get a response, set needinfo? on the module owner,
with the expectation of a response within 2 days and of getting someone to
take action.
4) If we go another 2 days with no response from the module owner, we will
disable the test.
Are you talking about newly-added tests, or tests that have been
passing for a long time and recently started failing?
In the latter case, the burden should fall on the regressing patch,
and the regressing patch should get backed out instead of disabling
the test.
We have no good way of identifying the regressing patch in many cases.
Ideally we will work with the test author to either fix the test or disable
it, depending on available time and the difficulty of the fix.
This is intended to respect the time of the original test authors by not
throwing emergencies in their lap, while still striking a balance that keeps
the trees manageable.
If this plan is applied to existing tests, then it will lead to
style system mochitests being turned off due to other regressions
because I'm the person who wrote them and the module owner, and I
don't always have time to deal with regressions in other parts of
code (e.g., the JS engine) leading to these tests failing
intermittently.
If that happens, we won't have the test coverage we need to add new
CSS properties or values.
More generally, it places a much heavier burden on contributors who
have been part of the project longer, who are also likely to be
overburdened in other ways (e.g., reviews). That's why the burden
needs to be placed on the regressing change rather than the original
author of the test.
Two exceptions:
1) If a test is failing at least 50% of the time, we will file a bug and
disable the test first.
These 10% and 50% numbers don't feel right to me; I think the
thresholds should probably be substantially lower. But it's easier, at
least for me, to think about these numbers in failures/day.
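For what it's worth, the two framings are only a multiplication apart; a quick sketch of the conversion (the runs-per-day figure here is purely illustrative, not a real tbpl statistic):

```python
# Hypothetical figure for illustration only: assume the affected test job
# runs once per push and the tree sees ~300 pushes/day.
RUNS_PER_DAY = 300

def failures_per_day(failure_rate, runs_per_day=RUNS_PER_DAY):
    """Convert an intermittent failure rate (a fraction) into the
    expected number of failures per day at a given run volume."""
    return failure_rate * runs_per_day

# At 300 runs/day, a 10% intermittent becomes ~30 oranges/day,
# while even a 1% intermittent is still ~3 oranges/day.
print(failures_per_day(0.10))  # 30.0
print(failures_per_day(0.01))  # 3.0
```

The point being that at high run volumes, a percentage threshold well below 10% can still translate into a sheriff-visible number of oranges every day.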
2) When we are bringing a new platform online (Android 2.3, b2g, etc.), many
tests will need to be disabled prior to getting the tests running on tbpl.
That's reasonable as long as work is done to try to get the tests
enabled (at a minimum, actually enabling all the tests that are
passing reliably, rather than stopping after enabling the passing
tests in only some directories).
One thing I want to note with my "random guy who has been fixing tests
all around the code base for a few years" hat on: I think we
collectively have been terrible at acting on intermittent failures. I
see this over and over again where someone is needinfo?ed or assigned to
an orange bug and they *never* comment or interact with the bug in any
way. That is perhaps a cultural problem which we need to address, and
I'm sure in a lot of cases it is because said person is busy with more
important things, but it has one important implication: in a lot of
cases, the sheriffs basically have no way of knowing whether anyone
is ever going to work on the test failure in question. I really, really
hate having to disable tests, but I'm entirely convinced that if we did
not do that very aggressively, the current tree would basically be
perma-orange. And that makes it impossible for anyone to check in anything.
So please try to keep that in mind when thinking about this proposal.
Cheers,
Ehsan
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform