Aryeh Gregor writes:

> On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari <ehsan.akhg...@gmail.com> wrote:
>> What you're saying above is true *if* someone investigates the
>> intermittent test failure and determines that the bug is not
>> important.  But in my experience, that's not what happens at
>> all.  I think many people treat intermittent test failures as a
>> category of unimportant problems, and therefore some bugs are
>> never investigated.  The fact of the matter is that most of
>> these bugs are bugs in our tests, which of course will not
>> impact our users directly, but I have occasionally come across
>> bugs in our code which are exposed as intermittent
>> failures.  The real issue is that the work of identifying
>> the root of the problem is often the majority of the work
>> needed to fix the intermittent test failure, so unless someone
>> is willing to investigate the bug we cannot say whether or not
>> it impacts our users.
>
> The same is true for many bugs.  The reported symptom might
> indicate a much more extensive underlying problem.  The fact is,
> though, thoroughly investigating every bug would take a ton of
> resources, and is almost certainly not the best use of our
> manpower.  There are many bugs that are *known* to affect many
> users that don't get fixed in a timely fashion.  Things that
> probably won't affect a single user ever at all, and which are
> likely to be a pain to track down (because they're
> intermittent), should be prioritized relatively low.

New intermittent failures are different from many user-reported
bugs because they are known to be regressions and there is some
indication of the regression window.

Regressions should be high priority.  People are getting by
without many new features, but they have come to depend on
existing features, so regressions break real sites and cause
confusion for many people.

The time to address regressions is ASAP, so that responsibility
can be handed over to the person who caused the regression.
Waiting too long means that backing out the offending change is
itself likely to cause another regression.

I wonder whether the real problem here is that we have too many
bad tests that report false negatives, and these bad tests are
reducing the value of our test suite in general.  Tests also need
to be well documented so that people can understand what a
negative report really means.  This is probably what leads to the
assumption that disabling a test is the solution to a new
failure.

Getting bugs on file and seen by the right people is an important
part of dealing with this.  The tricky part is working out how to
prioritize and cope with these bugs.