stockwell (intermittent failures) policy change - recommend disabling tests when 150 failures are seen in 21 days

2018-01-09 Thread jmaher
Happy new year from the stockwell team! We have been busy triaging bugs and fine-tuning the test-verify job for all platforms and test suites. If you want to read about what we plan to do in the near future, you can follow along in this tracking bug: https://bugzilla.mozilla.org/sh
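As a rough illustration of the threshold named in the subject line (150 failures seen in 21 days), the check could be sketched as follows. The function name, the input format (a list of failure dates), and the exact window boundaries are assumptions made for illustration; none of them come from the stockwell tooling.

```python
from datetime import date, timedelta

# Hypothetical helper for illustration only; not part of any Mozilla tool.
def recommend_disabling(failure_dates, today, threshold=150, window_days=21):
    """Return True if at least `threshold` failures fall within the
    last `window_days` days, counting `today` itself."""
    window_start = today - timedelta(days=window_days)
    # Keep only failures inside the sliding window (window_start, today].
    recent = [d for d in failure_dates if window_start < d <= today]
    return len(recent) >= threshold
```

The sketch only produces a recommendation; whether a test is actually disabled remains a human decision.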

Re: Policy for disabling tests which run on TBPL

2014-05-10 Thread Alex Burr
Just ran across this thread. I'm not quite sure it's what you're thinking of, but this may be of interest: https://github.com/Ealdwulf/bbchop It's a tool for bisection of intermittent bugs, based on Bayesian search theory. That is, it is supposed to find the intermittent bug, as opposed to fin

Re: Policy for disabling tests which run on TBPL

2014-04-18 Thread jmaher
I have made small edits thanks to Kyle and Karl; the official policy is posted on the Sheriffing wiki page: https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/l

Re: Policy for disabling tests which run on TBPL

2014-04-15 Thread Karl Tomlinson
Thank you for putting this together. It is important. jmaher writes: > This policy will define an escalation path for when a single test case is > identified to be leaking or failing and is causing enough disruption on the > trees. > Exceptions: > 1) If this test has landed (or been modified) i

Re: Policy for disabling tests which run on TBPL

2014-04-15 Thread jmaher
On Tuesday, April 15, 2014 9:42:25 AM UTC-4, Kyle Huey wrote: > On Tue, Apr 15, 2014 at 6:21 AM, jmaher wrote: > > > This policy will define an escalation path for when a single test case is > > identified to be leaking or failing and is causing enough disruption on the > > trees. Disruption is

Re: Policy for disabling tests which run on TBPL

2014-04-15 Thread Kyle Huey
On Tue, Apr 15, 2014 at 6:21 AM, jmaher wrote: > This policy will define an escalation path for when a single test case is > identified to be leaking or failing and is causing enough disruption on the > trees. Disruption is defined as: > 1) Test case is on the list of top 20 intermittent failure

Re: Policy for disabling tests which run on TBPL

2014-04-15 Thread jmaher
I want to express my thanks to everyone who contributed to this thread. We have a lot of passionate and smart people who care about this topic- thanks again for weighing in so far. Below is a slightly updated policy from the original, and following that is an attempt to summarize the thread an

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Ehsan Akhgari
On 2014-04-09, 6:46 PM, Chris Peterson wrote: On 4/9/14, 11:48 AM, Gregory Szorc wrote: I feel a lot of people just shrug shoulders and allow the test to be disabled (I'm guilty of it as much as anyone). From my perspective, it's difficult to convince the powers that be that fixing intermittent fa

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Chris Peterson
On 4/9/14, 11:48 AM, Gregory Szorc wrote: I feel a lot of people just shrug shoulders and allow the test to be disabled (I'm guilty of it as much as anyone). From my perspective, it's difficult to convince the powers that be that fixing intermittent failures (that have been successfully swept under

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Gregory Szorc
On 4/9/14, 2:07 PM, Karl Tomlinson wrote: Gregory Szorc writes: 2) Run marked intermittent tests multiple times. If it works all 25 times, fail the test run for inconsistent metadata. We need to consider intermittently failing tests as failed, and we need to only test things that always pass.

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Karl Tomlinson
Gregory Szorc writes: > 2) Run marked intermittent tests multiple times. If it works all > 25 times, fail the test run for inconsistent metadata. We need to consider intermittently failing tests as failed, and we need to only test things that always pass. We can't rely on statistics to tell us a
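The quoted proposal (run a test marked intermittent multiple times and treat 25 clean passes as inconsistent metadata) could be sketched roughly as below. This is only an illustration of the quoted idea, not Karl's position and not an existing harness feature. The `run_test` callable and both return labels are hypothetical, and the snippet does not say what should happen when some runs fail, so that branch is an assumption.

```python
# Hypothetical sketch; `run_test` is any zero-argument callable
# that returns True when the test passes.
def check_intermittent_annotation(run_test, runs=25):
    """Run a test annotated as intermittent `runs` times and report
    whether the annotation still looks justified."""
    failures = sum(1 for _ in range(runs) if not run_test())
    if failures == 0:
        # Every run passed: the 'intermittent' annotation looks stale.
        return "inconsistent-metadata"
    # Assumed outcome label; the quoted text does not cover this case.
    return "confirmed-intermittent"
```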

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Gregory Szorc
s well. Absolutely not! I am very disappointed with the current dynamic between sheriffs and module owners and test authors because disabling tests is leading to worse test coverage and opening ourselves up to all kinds of risks. I'd like to think test authors and module owners should ha

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread L. David Baron
On Wednesday 2014-04-09 11:00 -0700, Gregory Szorc wrote: > The simple solution is to have a separate in-tree manifest > annotation for intermittents. Put another way, we can describe > exactly why we are not running a test. This is kinda/sorta the realm > of bug 922581. > > The harder solution is

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Gregory Szorc
nge Factor better. To address David Baron's concern about silently passing intermittently failing tests, yes, silently passing is wrong. But I would argue it is a lesser evil than disabling tests outright. I think we can all agree that the current approach of disabling failing tests (the equiva

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Ehsan Akhgari
On 2014-04-08, 6:10 PM, Karl Tomlinson wrote: I wonder whether the real problem here is that we have too many bad tests that report false negatives, and these bad tests are reducing the value of our testsuite in general. Tests also need to be well documented so that people can understand what a

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Karl Tomlinson
Aryeh Gregor writes: > On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote: >> What you're saying above is true *if* someone investigates the >> intermittent test failure and determines that the bug is not >> important. But in my experience, that's not what happens at >> all. I think many peopl

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Ehsan Akhgari
On 2014-04-08, 3:15 PM, Chris Peterson wrote: On 4/8/14, 11:41 AM, Gavin Sharp wrote: Separately from all of that, we could definitely invest in better tools for dealing with intermittent failures in general. Anecdotally, I know chromium has some nice ways of dealing with them, for example. But

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Chris Peterson
On 4/8/14, 11:41 AM, Gavin Sharp wrote: Separately from all of that, we could definitely invest in better tools for dealing with intermittent failures in general. Anecdotally, I know chromium has some nice ways of dealing with them, for example. But I see that as a separate discussion not really rel

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread L. David Baron
On Tuesday 2014-04-08 11:41 -0700, Gavin Sharp wrote: > I see only two real goals for the proposed policy: > - ensure that module owners/peers have the opportunity to object to > any "disable test" decisions before they take effect > - set an expectation that intermittent orange failures are dealt

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Gavin Sharp
t coverage when they decide to unilaterally disable the relevant tests. Sheriffs should not be disabling tests unilaterally; developers should not be ignoring sheriff requests to investigate failures. The policy is not intended to suggest that any particular outcome (i.e. test disabling) is required.

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread L. David Baron
On Tuesday 2014-04-08 14:51 +0100, James Graham wrote: > So, what's the minimum level of infrastructure that you think would > be needed to go ahead with this plan? To me it seems like the > current system already isn't working very well, so the bar for > moving forward with a plan that would incre

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread James Graham
On 08/04/14 15:06, Ehsan Akhgari wrote: On 2014-04-08, 9:51 AM, James Graham wrote: On 08/04/14 14:43, Andrew Halberstadt wrote: On 07/04/14 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote: If a bug is causing a test to fail intermittently, then that test l

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Ehsan Akhgari
On 2014-04-08, 8:15 AM, Aryeh Gregor wrote: On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote: What you're saying above is true *if* someone investigates the intermittent test failure and determines that the bug is not important. But in my experience, that's not what happens at all. I think

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Ehsan Akhgari
On 2014-04-08, 9:51 AM, James Graham wrote: On 08/04/14 14:43, Andrew Halberstadt wrote: On 07/04/14 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still has some value in th

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread James Graham
On 08/04/14 14:43, Andrew Halberstadt wrote: On 07/04/14 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still has some value in that it can catch regressions that cause it to

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Andrew Halberstadt
On 07/04/14 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still has some value in that it can catch regressions that cause it to fail permanently, but we would not be able to

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Aryeh Gregor
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote: > What you're saying above is true *if* someone investigates the intermittent > test failure and determines that the bug is not important. But in my > experience, that's not what happens at all. I think many people treat > intermittent test fa

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Ehsan Akhgari
On 2014-04-07, 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still has some value in that it can catch regressions that cause it to fail permanently, but we would not be able

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Mike Hoye
On 2014-04-07, 11:12 AM, Ted Mielczarek wrote: It's difficult to say whether bugs we find via tests are more or less important than bugs we find via users. It's entirely possible that lots of the bugs that cause intermittent test failures cause intermittent weird behavior for our users, we simp

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Aryeh Gregor
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote: > If a bug is causing a test to fail intermittently, then that test loses > value. It still has some value in that it can catch regressions that > cause it to fail permanently, but we would not be able to catch a > regression that causes it to

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Ted Mielczarek
On 4/7/2014 9:02 AM, Aryeh Gregor wrote: > On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt > wrote: >> I would guess the former is true in most cases. But at least there we have a >> *chance* at tracking down and fixing the failure, even if it takes a while >> before it becomes annoying enough t

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Aryeh Gregor
On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt wrote: > I would guess the former is true in most cases. But at least there we have a > *chance* at tracking down and fixing the failure, even if it takes a while > before it becomes annoying enough to prioritize. If we made it so > intermittents n

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Andrew Halberstadt
On 07/04/14 05:10 AM, James Graham wrote: On 07/04/14 04:33, Andrew Halberstadt wrote: On 06/04/14 08:59 AM, Aryeh Gregor wrote: Is there any reason in principle that we couldn't have the test runner automatically rerun tests with known intermittent failures a few times, and let the test pass i

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Aryeh Gregor
On Mon, Apr 7, 2014 at 6:33 AM, Andrew Halberstadt wrote: > Many of our test runners have that ability. But doing this implies that > intermittents are always the fault of the test. We'd be missing whole > classes of regressions (notably race conditions). We already are, because we already will s

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread James Graham
On 07/04/14 04:33, Andrew Halberstadt wrote: On 06/04/14 08:59 AM, Aryeh Gregor wrote: On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari wrote: Note that is only accurate to a certain point. There are other things which we can do to guesswork our way out of the situation for Autoland, but of cou

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Andrew Halberstadt
On 04/04/14 03:44 PM, Ehsan Akhgari wrote: On 2014-04-04, 3:12 PM, L. David Baron wrote: Are you talking about newly-added tests, or tests that have been passing for a long time and recently started failing? In the latter case, the burden should fall on the regressing patch, and the regressing

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Andrew Halberstadt
On 06/04/14 08:59 AM, Aryeh Gregor wrote: On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari wrote: Note that is only accurate to a certain point. There are other things which we can do to guesswork our way out of the situation for Autoland, but of course they're resource/time intensive (basically

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Karl Tomlinson
On Fri, 4 Apr 2014 11:58:28 -0700 (PDT), jmaher wrote: > Two exceptions: > 2) When we are bringing a new platform online (Android 2.3, b2g, etc.) many > tests will need to be disabled prior to getting the tests on tbpl. It makes sense to disable some tests so that others can run. I assume bugs

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Karl Tomlinson
On Fri, 4 Apr 2014 12:49:45 -0700 (PDT), jmaher wrote: >> overburdened in other ways (e.g., reviews). the burden >> needs to be placed on the regressing change rather than the original >> author of the test. > > I am open to ideas to help figure out the offending changes. My > understanding is m

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Ed Morley
On 06 April 2014 14:58:24, Ehsan Akhgari wrote: On 2014-04-06, 8:59 AM, Aryeh Gregor wrote: Is there any reason in principle that we couldn't have the test runner automatically rerun tests with known intermittent failures a few times, and let the test pass if it passes a few times in a row after
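The rerun idea quoted above could be sketched roughly as follows. The quoted sentence is cut off, so this assumes that reruns are triggered by an initial failure and that "a few times in a row" means a fixed number of consecutive passes. The function name, `run_test`, and both parameters are hypothetical.

```python
# Hypothetical sketch; `run_test` is any zero-argument callable
# that returns True when the test passes.
def run_known_intermittent(run_test, required_passes=3, max_reruns=10):
    """Pass immediately on a clean first run; after a failure, pass only
    if the test then passes `required_passes` times in a row."""
    if run_test():
        return True
    streak = 0
    for _ in range(max_reruns):
        # A failure during the reruns resets the consecutive-pass count.
        streak = streak + 1 if run_test() else 0
        if streak >= required_passes:
            return True
    return False
```

As other replies in the thread point out, a scheme like this can hide real regressions such as race conditions, so the sketch shows the mechanism only and does not endorse it.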

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Ehsan Akhgari
On 2014-04-06, 8:59 AM, Aryeh Gregor wrote: On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari wrote: Note that is only accurate to a certain point. There are other things which we can do to guesswork our way out of the situation for Autoland, but of course they're resource/time intensive (basical

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Aryeh Gregor
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari wrote: > Note that is only accurate to a certain point. There are other things which > we can do to guesswork our way out of the situation for Autoland, but of > course they're resource/time intensive (basically running orange tests over > and over a

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread L. David Baron
On Friday 2014-04-04 12:49 -0700, jmaher wrote: > > If this plan is applied to existing tests, then it will lead to > > style system mochitests being turned off due to other regressions > > because I'm the person who wrote them and the module owner, and I > > don't always have time to deal with reg

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Chris Peterson
On 4/4/14, 2:21 PM, Martin Thomson wrote: On 2014-04-04, at 14:02, Ehsan Akhgari wrote: That's not true, we were in that state once, before I stopped working on this issue. We can get there again if we wanted to. It's just a lot of hard work which won't scale if we only have one person doi

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Martin Thomson
On 2014-04-04, at 14:02, Ehsan Akhgari wrote: > That's not true, we were in that state once, before I stopped working on this > issue. We can get there again if we wanted to. It's just a lot of hard work > which won't scale if we only have one person doing it. It’s self-correcting too. Turn

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Ehsan Akhgari
On 2014-04-04, 4:58 PM, Jonathan Griffin wrote: With respect to Autoland, I think we'll need to figure out how to make it take intermittents into account. I don't think we'll ever be in a state with 0 intermittents. That's not true, we were in that state once, before I stopped working on this is

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Ehsan Akhgari
On 2014-04-04, 4:30 PM, Chris Peterson wrote: On 4/4/14, 1:19 PM, Gavin Sharp wrote: The majority of the time identifying the regressing patch is difficult Identifying the regressing patch is only difficult because we have so many intermittently failing tests. Intermittent oranges are one of

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Jonathan Griffin
With respect to Autoland, I think we'll need to figure out how to make it take intermittents into account. I don't think we'll ever be in a state with 0 intermittents. Jonathan On 4/4/2014 1:30 PM, Chris Peterson wrote: On 4/4/14, 1:19 PM, Gavin Sharp wrote: The majority of the time identifyin

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Chris Peterson
On 4/4/14, 1:19 PM, Gavin Sharp wrote: The majority of the time identifying the regressing patch is difficult Identifying the regressing patch is only difficult because we have so many intermittently failing tests. Intermittent oranges are one of the major blockers for Autoland. If TBPL nev

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Gavin Sharp
On Fri, Apr 4, 2014 at 12:12 PM, L. David Baron wrote: >> Escalation path: >> 1) Ensure we have a bug on file, with the test author, reviewer, module >> owner, and any other interested parties, links to logs, etc. >> 2) We need to needinfo? and expect a response within 2 business days, this >> s

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread jmaher
> > > 4) In the case we go another 2 days with no response from a module owner, > > we will disable the test. > > > > Are you talking about newly-added tests, or tests that have been > > passing for a long time and recently started failing? > > > > In the latter case, the burden should fal

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Ehsan Akhgari
On 2014-04-04, 3:12 PM, L. David Baron wrote: On Friday 2014-04-04 11:58 -0700, jmaher wrote: As the sheriffs know, it is frustrating to deal with hundreds of tests that fail on a daily basis, but are intermittent. When a single test case is identified to be leaking or failing at least 10% of

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread L. David Baron
On Friday 2014-04-04 11:58 -0700, jmaher wrote: > As the sheriffs know, it is frustrating to deal with hundreds of tests that > fail on a daily basis, but are intermittent. > > When a single test case is identified to be leaking or failing at least 10% > of the time, it is time to escalate. > >

Policy for disabling tests which run on TBPL

2014-04-04 Thread jmaher
As the sheriffs know, it is frustrating to deal with hundreds of tests that fail on a daily basis, but are intermittent. When a single test case is identified to be leaking or failing at least 10% of the time, it is time to escalate. Escalation path: 1) Ensure we have a bug on file, with the te

Re: Disabling tests

2012-08-02 Thread Ed Morley
On Friday, 3 August 2012 02:30:03 UTC+1, Philip Chee wrote: > If it's random, how do you know if you've actually fixed it without > having to waste your time watching the tree for a week? http://brasstacks.mozilla.com/orangefactor/

Re: Disabling tests

2012-08-02 Thread Philip Chee
On Thu, 02 Aug 2012 10:22:43 -0500, Scott Johnson wrote: > Maybe we should take a different approach to the problem... > > Have an all-developer "fix orange tests" day where developers work ONLY > on fixing random oranges (maybe once you've successfully fixed 1 or 2 > random oranges, you can go

Re: Disabling tests

2012-08-02 Thread Scott Johnson
Maybe we should take a different approach to the problem... Have an all-developer "fix orange tests" day where developers work ONLY on fixing random oranges (maybe once you've successfully fixed 1 or 2 random oranges, you can go back to your other work?) Not meant as a criticism, but rather jus

Re: Disabling tests

2012-07-25 Thread Dao
On 25.07.2012 02:05, ben turner wrote: Disabling a test without a peer's input and then leaving open an unassigned bug to re-enable it is a pretty good way to leave the test disabled forever. Seems like somebody should be watching the component and take care of the bug. If this doesn't happen,

Re: Disabling tests

2012-07-25 Thread simetrical
n of course module peers can re-enable it -- after fixing the random orange. But no, it doesn't seem to me that peers should be able to decide that their tests are important enough that everyone else has to live with random orange because of them. I think we'd benefit from better way

Re: Disabling tests

2012-07-25 Thread Ms2ger
On 07/25/2012 02:05 AM, ben turner wrote: Peers can make the decision if it's worth disabling a test, pull someone off of other tasks to fix it, or whatever. Or in this case you claim that wasting sheriffs' time is much less important than your vacation / other tasks, as you did in the bug.

Re: Disabling tests

2012-07-24 Thread Ted Mielczarek
On Tue, Jul 24, 2012 at 8:05 PM, ben turner wrote: > Any thoughts or objections? This sums up my feelings on the matter pretty well. If we disable tests we lose test coverage. Sometimes flaky tests just mean we have flaky tests, but sometimes it means we're shipping flaky features, which is the

Disabling tests

2012-07-24 Thread ben turner
Hi folks, Recently we've started writing some platform tests that exercise content processes for B2G, and as you can probably imagine those tests are somewhat fragile. I think it's safe to say that no one likes randomly failing tests. Our extremely overworked and underpaid sheriffs and volunteers