On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari <ehsan.akhg...@gmail.com> wrote:
> What you're saying above is true *if* someone investigates the intermittent
> test failure and determines that the bug is not important.  But in my
> experience, that's not what happens at all.  I think many people treat
> intermittent test failures as a category of unimportant problems, and
> therefore some bugs are never investigated.  The fact of the matter is that
> most of these bugs are bugs in our tests, which of course will not impact
> our users directly, but I have occasionally come across bugs in our
> code which are exposed as intermittent failures.  The real issue is that the
> work of identifying where the root of the problem is oftentimes is the
> majority of work needed to fix the intermittent test failure, so unless
> someone is willing to investigate the bug we cannot say whether or not it
> impacts our users.

The same is true for many bugs.  The reported symptom might indicate a
much more extensive underlying problem.  The fact is, though,
thoroughly investigating every bug would take a ton of resources, and
is almost certainly not the best use of our manpower.  There are many
bugs that are *known* to affect many users that don't get fixed in a
timely fashion.  Things that probably won't affect a single user ever
at all, and which are likely to be a pain to track down (because
they're intermittent), should be prioritized relatively low.

> The thing that really makes me care about these intermittent failures a lot
> is that ultimately they make us have to trade either disabling a whole bunch
> of tests with being unable to manage our tree.  As more and more tests get
> disabled, we lose more and more test coverage, and that can have a much more
> severe impact on the health of our products than every individual
> intermittent test failure.

I think you hit the nail on the head, but I think there's a third
solution: automatically ignore known intermittent failures, in as
fine-grained a way as possible.  This means the test is still almost
as useful -- I think the vast majority of our tests will fail
consistently if the thing they're testing breaks, not fail
intermittently.  But it doesn't get in the way of managing the tree.
Yes, it reduces some tests' value slightly relative to fixing them,
but it's not a good use of our resources to try tracking down most
intermittent failures.  The status quo reduces those tests' value just
as much as automatic ignoring (because people will star known failure
patterns consistently), but imposes a large manual labor cost.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
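
P.S. To make "automatically ignore known intermittent failures, in as
fine-grained a way as possible" a bit more concrete, here is a rough
sketch of what I have in mind.  Everything in it is an assumption for
illustration only: the KnownIntermittent record, the classify() and
run_is_green() helpers, and the idea of keying on an exact test name
plus a regex over the failure message are all made up here, not a
description of any existing harness or tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownIntermittent:
    """Hypothetical record of one known intermittent failure pattern."""
    test: str     # exact test name the pattern applies to
    pattern: str  # regex searched for in the failure message
    bug: str      # label of the tracking bug (illustrative only)


def classify(failures, known):
    """Split (test, message) pairs into (ignored, unexpected) lists.

    A failure is ignored only when both the test name and the message
    match a known entry, so any *other* failure in the same test is
    still reported.
    """
    ignored, unexpected = [], []
    for test, message in failures:
        is_known = any(
            k.test == test and re.search(k.pattern, message)
            for k in known
        )
        (ignored if is_known else unexpected).append((test, message))
    return ignored, unexpected


def run_is_green(failures, known):
    """A run counts as green when no unexpected failures remain."""
    _, unexpected = classify(failures, known)
    return not unexpected
```

The point of matching on the message as well as the test name is the
"fine-grained" part: a test with a known intermittent timeout still
turns the tree red if it starts failing in some new way.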