Aryeh Gregor writes:
> On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari <ehsan.akhg...@gmail.com> wrote:
>> What you're saying above is true *if* someone investigates the
>> intermittent test failure and determines that the bug is not
>> important. But in my experience, that's not what happens at
>> all. I think many people treat intermittent test failures as a
>> category of unimportant problems, and therefore some bugs are
>> never investigated. The fact of the matter is that most of
>> these bugs are bugs in our tests, which of course will not
>> impact our users directly, but I have occasionally come across
>> bugs in our code which are exposed as intermittent
>> failures. The real issue is that identifying the root of the
>> problem is often the majority of the work needed to fix the
>> intermittent test failure, so unless someone is willing to
>> investigate the bug we cannot say whether or not it impacts
>> our users.
>
> The same is true for many bugs. The reported symptom might
> indicate a much more extensive underlying problem. The fact is,
> though, thoroughly investigating every bug would take a ton of
> resources, and is almost certainly not the best use of our
> manpower. There are many bugs that are *known* to affect many
> users that don't get fixed in a timely fashion. Things that
> probably won't affect a single user ever, and which are
> likely to be a pain to track down (because they're
> intermittent), should be prioritized relatively low.
New intermittent failures are different from many user-reported bugs because they are known to be regressions and there is some indication of the regression window. Regressions should be high priority. People can get by without new features, but they have come to depend on existing features, so regressions break real sites and cause confusion for many people. Regressions should be addressed as soon as possible, so that responsibility can be handed back to the person who caused the regression. Waiting too long means that backing out the cause of the regression is likely to cause another regression.

I wonder whether the real problem here is that we have too many bad tests that report false negatives, and these bad tests are reducing the value of our test suite in general. Tests also need to be well documented so that people can understand what a negative report really means (a small sketch of what I mean follows at the end of this message). This is probably what leads to the assumption that disabling a test is the solution to a new failure.

Getting bugs on file and seen by the right people is an important part of dealing with this. The tricky part is working out how to prioritize and cope with these bugs.
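To make the documentation point concrete, here is a minimal, purely hypothetical sketch (Python's unittest, with an invented LRUCache standing in for the code under test; nothing here is taken from our actual suites). The docstring and the assertion message spell out what a failure means, so whoever hits an intermittent report can judge whether the product or the test is likely at fault without re-deriving the test's intent.

from collections import OrderedDict
import unittest


class LRUCache:
    """Toy stand-in for the code under test (invented for this sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def get(self, key):
        self._data.move_to_end(key)
        return self._data[key]

    def __contains__(self, key):
        return key in self._data


class CacheEvictionTest(unittest.TestCase):
    """Eviction must drop the least-recently-used entry.

    A failure here points at the LRU ordering in put()/get(), not at
    timing or the test environment, so it should not be written off as
    an unimportant intermittent.
    """

    def test_lru_entry_evicted_first(self):
        cache = LRUCache(capacity=2)
        cache.put("a", 1)
        cache.put("b", 2)
        cache.get("a")     # "a" becomes most recently used
        cache.put("c", 3)  # over capacity: "b" should be evicted
        self.assertNotIn(
            "b", cache,
            "least-recently-used entry was not evicted; the LRU "
            "ordering in put()/get() is likely broken")
        self.assertIn("a", cache)


if __name__ == "__main__":
    unittest.main()

The specifics don't matter; the point is that the failure message carries enough intent that a negative report can be triaged without archaeology.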