Re: Policy for disabling tests which run on TBPL

Aryeh Gregor Tue, 08 Apr 2014 05:16:23 -0700

On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari <ehsan.akhg...@gmail.com> wrote:
> What you're saying above is true *if* someone investigates the intermittent
> test failure and determines that the bug is not important.  But in my
> experience, that's not what happens at all.  I think many people treat
> intermittent test failures as a category of unimportant problems, and
> therefore some bugs are never investigated.  The fact of the matter is that
> most of these bugs are bugs in our tests, which of course will not impact
> our users directly, but I have occasionally come across bugs in our
> code which are exposed as intermittent failures.  The real issue is that the
> work of identifying the root of the problem is often the
> majority of work needed to fix the intermittent test failure, so unless
> someone is willing to investigate the bug we cannot say whether or not it
> impacts our users.


The same is true for many bugs.  The reported symptom might indicate a
much more extensive underlying problem.  The fact is, though,
thoroughly investigating every bug would take a ton of resources, and
is almost certainly not the best use of our manpower.  There are many
bugs that are *known* to affect many users that don't get fixed in a
timely fashion.  Bugs that will probably never affect a single user,
and which are likely to be a pain to track down (because
they're intermittent), should be prioritized relatively low.

> The thing that really makes me care about these intermittent failures a lot
> is that ultimately they force us to choose between disabling a whole bunch
> of tests and being unable to manage our tree.  As more and more tests get
> disabled, we lose more and more test coverage, and that can have a much more
> severe impact on the health of our products than every individual
> intermittent test failure.

I think you hit the nail on the head, but I think there's a third
solution: automatically ignore known intermittent failures, in as
fine-grained a way as possible.  This means the test is still almost
as useful -- I think the vast majority of our tests will fail
consistently if the thing they're testing breaks, not fail
intermittently.  But it doesn't get in the way of managing the tree.
Yes, it reduces some tests' value slightly relative to fixing them,
but it's not a good use of our resources to try tracking down most
intermittent failures.  The status quo reduces those tests' value just
as much as automatic ignoring (because people will star known failure
patterns consistently), but imposes a large manual labor cost.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform