Re: New policy: 48-hour backouts for major Talos regressions

Chris Pearce Tue, 18 Aug 2015 15:43:49 -0700

We recently had a false positive Talos regression on our team, which turned out to be caused by a change to the test machine coinciding with our push. This took up a bunch of energy and time away from our team, which we really can't afford.

So to mitigate that I propose that *before* the backout happens, someone on the regression-detection team does an `hg up` and a Try push of the backout and a Try push without the backout to ensure that backing out actually helps.

Retriggers on the original push in our case didn't help, so I think a completely clean push is necessary.
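
To make the comparison concrete, here is a rough sketch of how the results of the two Try pushes could be compared. It is not how Talos or the perf sheriffs' tools actually do it; the function name, the thresholds, and the sample numbers are all made up for illustration, and it assumes a lower-is-better metric (e.g. a time in ms) with several replicates per push.

```python
from statistics import mean, stdev

def backout_helps(with_backout, without_backout,
                  min_rel_change=0.02, noise_factor=2.0):
    """Hypothetical check: does the backout push score meaningfully better?

    Both arguments are lists of replicate results for the same test,
    one list per Try push. Assumes lower numbers are better.
    The thresholds are illustrative defaults, not project policy.
    """
    base = mean(without_backout)    # push that still contains the patch
    fixed = mean(with_backout)      # push with the patch backed out
    improvement = base - fixed      # positive if the backout is faster

    # Use the noisier of the two pushes as the noise estimate.
    noise = max(stdev(with_backout), stdev(without_backout))

    big_enough = improvement / base >= min_rel_change
    above_noise = improvement >= noise_factor * noise
    return big_enough and above_noise

# Made-up replicate values, in ms.
print(backout_helps([800, 805, 798, 802], [850, 848, 853, 851]))
print(backout_helps([850, 852, 847, 851], [851, 849, 852, 850]))
```

If the check comes back negative on two clean pushes, that is a hint that something other than the patch (a machine change, say) moved the numbers.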

This should also assist the regression-detection team in convincing patch authors that their patch is at fault.

The Try-backout should happen before the need-info to the patch author happens. If the backout has non-trivial merge conflicts, then the first action of the patch author should be to perform this step instead of the regression-detection team member.


cpearce.


On 8/15/2015 1:02 PM, Vladan Djeric wrote:

There are known issues with the test infrastructure (e.g. differences in
weekend vs weekday results) and those known issues are currently being
masked with human judgement.
A-Team has investigated these issues, and fixed some of them, but fixing
the rest will take a non-trivial amount of effort as I understand it.
When there's enough time to fix all the sources of noise in the
infrastructure, human judgement will no longer be required.

As an aside, I'm answering the questions for this 48-hour backout
announcement, but it's really Joel Maher + William Lachance + Vaibhav
Agarwal doing all the heavy lifting related to regression handling. They're
working on the regression-detection and regression-investigation tools, and
they're the ones acting as perf sheriffs.

Avi from my team is helping test the tools, and I just participate in
policy discussions and act as an (unintentional) spokesperson :)

On Fri, Aug 14, 2015 at 8:49 PM, Martin Thomson <m...@mozilla.com> wrote:

On Fri, Aug 14, 2015 at 3:44 PM, Vladan Djeric <vdje...@mozilla.com>
wrote:

Is this the ts_paint regression you're referring to?

https://groups.google.com/forum/#!searchin/mozilla.dev.tree-alerts/ts_paint/mozilla.dev.tree-alerts/FArVsa8guXg/FfY91JK7AAAJ

Yeah.  I only ask because exercising judgment suppresses
information about the stability of the tests, so that all we have is
anecdotal evidence.  That's probably OK here.  The process you
describe sounds pretty robust against false positives.


_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
