Happy new year from the Stockwell team! We have been busy triaging bugs and
fine-tuning the test-verify job for all platforms and test suites.
If you want to read about what we plan to do in the near future, you can follow
along in this tracking bug:
https://bugzilla.mozilla.org/sh
Just ran across this thread. I'm not quite sure it's what you're thinking of,
but this may be of interest:
https://github.com/Ealdwulf/bbchop
It's a tool for bisection of intermittent bugs, based on Bayesian search
theory. That is, it is supposed to find the intermittent bug, as opposed to
fin
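For flavor, here is a minimal sketch of the Bayesian-search idea as I
understand it. This is not bbchop's actual code; run_test, the known
fail_rate, and the single-culprit/no-other-flakiness assumptions are all
mine.

    def bayes_bisect(commits, run_test, fail_rate, confidence=0.95):
        # Posterior probability that commit b is the first bad one,
        # starting from a uniform prior over the range (the bug is
        # assumed to lie somewhere inside `commits`).
        n = len(commits)
        post = [1.0 / n] * n
        while max(post) < confidence:
            # Probe near the posterior median so each run splits the
            # remaining probability mass roughly in half.
            cum = 0.0
            for i, pb in enumerate(post):
                cum += pb
                if cum >= 0.5:
                    break
            failed = run_test(commits[i])
            for b in range(n):
                if b <= i:    # bug already present at commit i
                    like = fail_rate if failed else 1.0 - fail_rate
                else:         # commit i predates the bug: cannot fail
                    like = 0.0 if failed else 1.0
                post[b] *= like
            total = sum(post)
            post = [pb / total for pb in post]
        return commits[max(range(n), key=post.__getitem__)]

A failure is decisive evidence (the bug must already be present), while a
pass only nudges the odds; that asymmetry is exactly what defeats a naive
git bisect on an intermittent.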
I have made small edits thanks to Kyle and Karl; the official policy is posted
on the Sheriffing wiki page:
https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy
Thank you for putting this together. It is important.
jmaher writes:
> This policy will define an escalation path for when a single test case is
> identified to be leaking or failing and is causing enough disruption on the
> trees.
> Exceptions:
> 1) If this test has landed (or been modified) i
On Tuesday, April 15, 2014 9:42:25 AM UTC-4, Kyle Huey wrote:
> On Tue, Apr 15, 2014 at 6:21 AM, jmaher wrote:
>
> > This policy will define an escalation path for when a single test case is
> > identified to be leaking or failing and is causing enough disruption on the
> > trees. Disruption is
On Tue, Apr 15, 2014 at 6:21 AM, jmaher wrote:
> This policy will define an escalation path for when a single test case is
> identified to be leaking or failing and is causing enough disruption on the
> trees. Disruption is defined as:
> 1) Test case is on the list of top 20 intermittent failure
I want to express my thanks to everyone who contributed to this thread. We
have a lot of passionate and smart people who care about this topic- thanks
again for weighing in so far.
Below is a slightly updated policy from the original, and following that is an
attempt to summarize the thread an
On 2014-04-09, 6:46 PM, Chris Peterson wrote:
On 4/9/14, 11:48 AM, Gregory Szorc wrote:
I feel a lot of people just shrug shoulders and allow the test to be
disabled (I'm guilty of it as much as anyone). From my perspective, it's
difficult to convince the powers that be that fixing intermittent fa
On 4/9/14, 11:48 AM, Gregory Szorc wrote:
I feel a lot of people just shrug shoulders and allow the test to be
disabled (I'm guilty of it as much as anyone). From my perspective, it's
difficult to convince the powers that be that fixing intermittent failures
(that have been successfully swept under
On 4/9/14, 2:07 PM, Karl Tomlinson wrote:
Gregory Szorc writes:
2) Run marked intermittent tests multiple times. If it works all
25 times, fail the test run for inconsistent metadata.
We need to consider intermittently failing tests as failed, and we
need to only test things that always pass.
Gregory Szorc writes:
> 2) Run marked intermittent tests multiple times. If it works all
> 25 times, fail the test run for inconsistent metadata.
We need to consider intermittently failing tests as failed, and we
need to only test things that always pass.
We can't rely on statistics to tell us a
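As a rough sketch of what Gregory is proposing (hypothetical names
throughout; no real harness exposes exactly this API, and 25 is just the
count from his message):

    def verify_intermittent_annotations(tests, run_test, reruns=25):
        # Re-run every test annotated as intermittent; if one passes on
        # all attempts, its annotation looks stale, so fail the run to
        # force the metadata to be updated.
        stale = []
        for test in tests:
            if not test.get("intermittent"):
                continue
            if all(run_test(test["name"]) for _ in range(reruns)):
                stale.append(test["name"])
        if stale:
            raise AssertionError(
                "inconsistent metadata (passed all %d runs): %s"
                % (reruns, ", ".join(stale)))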
...as well.
Absolutely not! I am very disappointed with the current dynamic between
sheriffs and module owners and test authors because disabling tests is
leading to worse test coverage and opening ourselves up to all kinds of
risks. I'd like to think test authors and module owners should ha
On Wednesday 2014-04-09 11:00 -0700, Gregory Szorc wrote:
> The simple solution is to have a separate in-tree manifest
> annotation for intermittents. Put another way, we can describe
> exactly why we are not running a test. This is kinda/sorta the realm
> of bug 922581.
>
> The harder solution is
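A toy version of the "describe exactly why we are not running a test"
idea. The keys below are illustrative, not real manifestparser
annotations; bug 922581 is the actual bug referenced above.

    # Instead of one opaque "disabled" flag, record *why* a test is not
    # run normally, so tooling can treat the cases differently.
    TESTS = [
        {"name": "test_foo.html", "disabled": None,
         "intermittent": "bug 123456"},
        {"name": "test_bar.html", "disabled": "bug 234567, crashes",
         "intermittent": None},
        {"name": "test_baz.html", "disabled": None, "intermittent": None},
    ]

    def schedule(test):
        if test["disabled"]:
            return "skip"   # genuinely broken; do not run
        if test["intermittent"]:
            return "retry"  # run with retries, track the failure rate
        return "run"        # normal single run

    for t in TESTS:
        print(t["name"], "->", schedule(t))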
...Orange Factor better.
To address David Baron's concern about silently passing intermittently
failing tests: yes, silently passing is wrong. But I would argue it is
the lesser evil compared to disabling tests outright.
I think we can all agree that the current approach of disabling failing
tests (the equiva
On 2014-04-08, 6:10 PM, Karl Tomlinson wrote:
I wonder whether the real problem here is that we have too many
bad tests that report false negatives, and these bad tests are
reducing the value of our testsuite in general. Tests also need
to be well documented so that people can understand what a
Aryeh Gregor writes:
> On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
>> What you're saying above is true *if* someone investigates the
>> intermittent test failure and determines that the bug is not
>> important. But in my experience, that's not what happens at
>> all. I think many peopl
On 2014-04-08, 3:15 PM, Chris Peterson wrote:
On 4/8/14, 11:41 AM, Gavin Sharp wrote:
Separately from all of that, we could definitely invest in better
tools for dealing with intermittent failures in general. Anecdotally,
I know chromium has some nice ways of dealing with them, for example.
But
On 4/8/14, 11:41 AM, Gavin Sharp wrote:
Separately from all of that, we could definitely invest in better
tools for dealing with intermittent failures in general. Anecdotally,
I know chromium has some nice ways of dealing with them, for example.
But I see that a separate discussion not really rel
On Tuesday 2014-04-08 11:41 -0700, Gavin Sharp wrote:
> I see only two real goals for the proposed policy:
> - ensure that module owners/peers have the opportunity to object to
> any "disable test" decisions before they take effect
> - set an expectation that intermittent orange failures are dealt
...test coverage when they decide to
unilaterally disable the relevant tests. Sheriffs should not be
disabling tests unilaterally; developers should not be ignoring
sheriff requests to investigate failures.
The policy is not intended to suggest that any particular outcome
(i.e. test disabling) is required.
On Tuesday 2014-04-08 14:51 +0100, James Graham wrote:
> So, what's the minimum level of infrastructure that you think would
> be needed to go ahead with this plan? To me it seems like the
> current system already isn't working very well, so the bar for
> moving forward with a plan that would incre
On 08/04/14 15:06, Ehsan Akhgari wrote:
On 2014-04-08, 9:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek
wrote:
If a bug is causing a test to fail intermittently, then that test
l
On 2014-04-08, 8:15 AM, Aryeh Gregor wrote:
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
What you're saying above is true *if* someone investigates the intermittent
test failure and determines that the bug is not important. But in my
experience, that's not what happens at all. I think
On 2014-04-08, 9:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek
wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in th
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek
wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
> What you're saying above is true *if* someone investigates the intermittent
> test failure and determines that the bug is not important. But in my
> experience, that's not what happens at all. I think many people treat
> intermittent test fa
On 2014-04-07, 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able
On 2014-04-07, 11:12 AM, Ted Mielczarek wrote:
It's difficult to say whether bugs we find via tests are more or less
important than bugs we find via users. It's entirely possible that
lots of the bugs that cause intermittent test failures cause
intermittent weird behavior for our users, we simp
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote:
> If a bug is causing a test to fail intermittently, then that test loses
> value. It still has some value in that it can catch regressions that
> cause it to fail permanently, but we would not be able to catch a
> regression that causes it to
On 4/7/2014 9:02 AM, Aryeh Gregor wrote:
> On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt
> wrote:
>> I would guess the former is true in most cases. But at least there we have a
>> *chance* at tracking down and fixing the failure, even if it takes a while
>> before it becomes annoying enough t
On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt
wrote:
> I would guess the former is true in most cases. But at least there we have a
> *chance* at tracking down and fixing the failure, even if it takes a while
> before it becomes annoying enough to prioritize. If we made it so
> intermittents n
On 07/04/14 05:10 AM, James Graham wrote:
On 07/04/14 04:33, Andrew Halberstadt wrote:
On 06/04/14 08:59 AM, Aryeh Gregor wrote:
Is there any reason in principle that we couldn't have the test runner
automatically rerun tests with known intermittent failures a few
times, and let the test pass i
On Mon, Apr 7, 2014 at 6:33 AM, Andrew Halberstadt
wrote:
> Many of our test runners have that ability. But doing this implies that
> intermittents are always the fault of the test. We'd be missing whole
> classes of regressions (notably race conditions).
We already are, because we already will s
On 07/04/14 04:33, Andrew Halberstadt wrote:
On 06/04/14 08:59 AM, Aryeh Gregor wrote:
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari
wrote:
Note that this is only accurate to a certain point. There are other
things which
we can do to guesswork our way out of the situation for Autoland, but of
cou
On 04/04/14 03:44 PM, Ehsan Akhgari wrote:
On 2014-04-04, 3:12 PM, L. David Baron wrote:
Are you talking about newly-added tests, or tests that have been
passing for a long time and recently started failing?
In the latter case, the burden should fall on the regressing patch,
and the regressing
On 06/04/14 08:59 AM, Aryeh Gregor wrote:
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari wrote:
Note that this is only accurate to a certain point. There are other things which
we can do to guesswork our way out of the situation for Autoland, but of
course they're resource/time intensive (basically
On Fri, 4 Apr 2014 11:58:28 -0700 (PDT), jmaher wrote:
> Two exceptions:
> 2) When we are bringing a new platform online (Android 2.3, b2g, etc.) many
> tests will need to be disabled prior to getting the tests on tbpl.
It makes sense to disable some tests so that others can run.
I assume bugs
On Fri, 4 Apr 2014 12:49:45 -0700 (PDT), jmaher wrote:
>> overburdened in other ways (e.g., reviews). the burden
>> needs to be placed on the regressing change rather than the original
>> author of the test.
>
> I am open to ideas to help figure out the offending changes. My
> understanding is m
On 06 April 2014 14:58:24, Ehsan Akhgari wrote:
On 2014-04-06, 8:59 AM, Aryeh Gregor wrote:
Is there any reason in principle that we couldn't have the test runner
automatically rerun tests with known intermittent failures a few
times, and let the test pass if it passes a few times in a row after
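The runner change Aryeh is describing is small; here is a minimal sketch,
assuming a hypothetical run_once hook that returns True on a pass:

    def run_with_retries(test, run_once, max_retries=3):
        # Rerun a known-intermittent test; report a pass if any attempt
        # succeeds, and record how many retries it took so the failure
        # rate stays visible rather than being silently swallowed.
        for attempt in range(1 + max_retries):
            if run_once(test):
                return True, attempt
        return False, max_retries  # failed every attempt: real failure

The objection raised elsewhere in the thread is that this assumes the
flakiness is the test's fault; a genuine race condition in the product
would be retried into invisibility too.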
On 2014-04-06, 8:59 AM, Aryeh Gregor wrote:
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari wrote:
Note that this is only accurate to a certain point. There are other things which
we can do to guesswork our way out of the situation for Autoland, but of
course they're resource/time intensive (basical
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari wrote:
> Note that this is only accurate to a certain point. There are other things which
> we can do to guesswork our way out of the situation for Autoland, but of
> course they're resource/time intensive (basically running orange tests over
> and over a
On Friday 2014-04-04 12:49 -0700, jmaher wrote:
> > If this plan is applied to existing tests, then it will lead to
> > style system mochitests being turned off due to other regressions
> > because I'm the person who wrote them and the module owner, and I
> > don't always have time to deal with reg
On 4/4/14, 2:21 PM, Martin Thomson wrote:
On 2014-04-04, at 14:02, Ehsan Akhgari wrote:
That's not true, we were in that state once, before I stopped working on this
issue. We can get there again if we wanted to. It's just a lot of hard work
which won't scale if we only have one person doi
On 2014-04-04, at 14:02, Ehsan Akhgari wrote:
> That's not true, we were in that state once, before I stopped working on this
> issue. We can get there again if we wanted to. It's just a lot of hard work
> which won't scale if we only have one person doing it.
It’s self-correcting too. Turn
On 2014-04-04, 4:58 PM, Jonathan Griffin wrote:
With respect to Autoland, I think we'll need to figure out how to make
it take intermittents into account. I don't think we'll ever be in a state
with 0 intermittents.
That's not true, we were in that state once, before I stopped working on
this is
On 2014-04-04, 4:30 PM, Chris Peterson wrote:
On 4/4/14, 1:19 PM, Gavin Sharp wrote:
The majority of the time identifying the regressing patch is
difficult
Identifying the regressing patch is only difficult because we have so
many intermittently failing tests.
Intermittent oranges are one of
With respect to Autoland, I think we'll need to figure out how to make
it take intermittents into account. I don't think we'll ever be in a state
with 0 intermittents.
Jonathan
On 4/4/2014 1:30 PM, Chris Peterson wrote:
On 4/4/14, 1:19 PM, Gavin Sharp wrote:
The majority of the time identifyin
On 4/4/14, 1:19 PM, Gavin Sharp wrote:
The majority of the time identifying the regressing patch is
difficult
Identifying the regressing patch is only difficult because we have so
many intermittently failing tests.
Intermittent oranges are one of the major blockers for Autoland. If TBPL
nev
On Fri, Apr 4, 2014 at 12:12 PM, L. David Baron wrote:
>> Escalation path:
>> 1) Ensure we have a bug on file, with the test author, reviewer, module
>> owner, and any other interested parties, links to logs, etc.
>> 2) We need to needinfo? and expect a response within 2 business days, this
>> s
>
>> 4) In the case we go another 2 days with no response from a module owner,
>> we will disable the test.
>
> Are you talking about newly-added tests, or tests that have been
> passing for a long time and recently started failing?
>
> In the latter case, the burden should fal
On 2014-04-04, 3:12 PM, L. David Baron wrote:
On Friday 2014-04-04 11:58 -0700, jmaher wrote:
As the sheriffs know, it is frustrating to deal with hundreds of tests that
fail on a daily basis, but are intermittent.
When a single test case is identified to be leaking or failing at least 10% of
On Friday 2014-04-04 11:58 -0700, jmaher wrote:
> As the sheriffs know, it is frustrating to deal with hundreds of tests that
> fail on a daily basis, but are intermittent.
>
> When a single test case is identified to be leaking or failing at least 10%
> of the time, it is time to escalate.
As the sheriffs know, it is frustrating to deal with hundreds of tests that
fail on a daily basis, but are intermittent.
When a single test case is identified to be leaking or failing at least 10% of
the time, it is time to escalate.
Escalation path:
1) Ensure we have a bug on file, with the te
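The 10% trigger above is straightforward to compute from recent results.
A sketch (the minimum-sample guard is my own addition, not part of the
policy):

    def needs_escalation(results, threshold=0.10, min_runs=50):
        # results: booleans for recent runs of one test, True == passed.
        if len(results) < min_runs:
            return False  # too few runs to trust the measured rate
        fail_rate = results.count(False) / float(len(results))
        return fail_rate >= threshold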
On Friday, 3 August 2012 02:30:03 UTC+1, Philip Chee wrote:
> If it's random, how do you know if you've actually fixed it without
> having to waste your time watching the tree for a week?
http://brasstacks.mozilla.com/orangefactor/
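OrangeFactor shows the failure history; the arithmetic behind "is it
actually fixed?" is simple. If the bug were still present and failed with
probability p per run, n consecutive green runs would occur by luck with
probability (1-p)^n, so a sketch of the required retrigger count:

    import math

    def clean_runs_needed(fail_rate, alpha=0.05):
        # Consecutive green retriggers needed before we are (1 - alpha)
        # confident an intermittent is fixed: solve (1 - p)**n <= alpha.
        return math.ceil(math.log(alpha) / math.log(1.0 - fail_rate))

    # A 10% intermittent needs ~29 clean retriggers for 95% confidence.
    print(clean_runs_needed(0.10))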
On Thu, 02 Aug 2012 10:22:43 -0500, Scott Johnson wrote:
> Maybe we should take a different approach to the problem...
>
> Have an all-developer "fix orange tests" day where developers work ONLY
> on fixing random oranges (maybe once you've successfully fixed 1 or 2
> random oranges, you can go
Maybe we should take a different approach to the problem...
Have an all-developer "fix orange tests" day where developers work ONLY
on fixing random oranges (maybe once you've successfully fixed 1 or 2
random oranges, you can go back to your other work?)
Not meant as a criticism, but rather jus
On 25.07.2012 02:05, ben turner wrote:
Disabling a test without a peer's input and then leaving open an
unassigned bug to re-enable it is a pretty good way to leave the test
disabled forever.
Seems like somebody should be watching the component and taking care of
the bug. If this doesn't happen,
...then of course
module peers can re-enable it -- after fixing the random orange. But no, it
doesn't seem to me that peers should be able to decide that their tests are
important enough that everyone else has to live with random orange because of
them.
I think we'd benefit from better way
On 07/25/2012 02:05 AM, ben turner wrote:
Peers can
make the decision if it's worth disabling a test, pull someone off of
other tasks to fix it, or whatever.
Or in this case you claim that wasting sheriffs' time is much less
important than your vacation / other tasks, as you did in the bug.
On Tue, Jul 24, 2012 at 8:05 PM, ben turner wrote:
> Any thoughts or objections?
This sums up my feelings on the matter pretty well. If we disable
tests we lose test coverage. Sometimes flaky tests just means we have
flaky tests, but sometimes it means we're shipping flaky features,
which is the
Hi folks,
Recently we've started writing some platform tests that exercise
content processes for B2G, and as you can probably imagine those tests
are somewhat fragile.
I think it's safe to say that no one likes randomly failing tests. Our
extremely overworked and underpaid sheriffs and volunteers