This looks good overall. Two questions though:
On 2014-12-18 6:47 AM, jmaher wrote:
Mozilla - 2015 Talos performance regression policy
Over the last year and a half the Talos tests have been rewritten to be more
useful and meaningful. This means we need to take them seriously and cannot
just ignore real issues when we don't have time. This does not mean we need to
fix or back out every changeset that caused a regression.
Starting in 2015, when a regression is identified to be related to a specific
changeset, the patch author will be asked for information via the needinfo flag.
We expect a response and reasonable dialog within 72 hours (3 business days) of
requesting information. If no response is given, we will back out the patch(es)
in question; the patch author can investigate when they have time and reland.
Some requirements before requesting needinfo:
* On integration branches (higher volume), a Talos sheriff will have verified
the root cause within 1 week of the patch landing.
* A patch or set of patches from a bug must be identified as the root cause.
This can take place through retriggers on the tree or, when many patches
land at once, through a push to try that backs out the suspected patch(es).
* Links in the bug document the regression (and any related
regressions/improvements).
* If we are confident this is the root cause and it meets the 3% regression
threshold, the needinfo request will mention that this policy will be
enforced.
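The threshold and timing requirements above can be sketched as a small check.
This is only an illustration, not how Talos or the sheriffs actually compute
alerts: the function names are hypothetical, and it assumes a "lower is better"
metric (for example, a load time in milliseconds) and that the 1-week window is
measured from the landing time to the time the root cause was verified.

```python
from datetime import datetime, timedelta

REGRESSION_THRESHOLD_PCT = 3.0       # "3% regression threshold" from the policy
VERIFY_WINDOW = timedelta(weeks=1)   # root cause verified within 1 week of landing


def regression_pct(before: float, after: float) -> float:
    """Percent change relative to the baseline (assumes lower is better)."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before * 100.0


def policy_enforced(before: float, after: float,
                    landed: datetime, verified: datetime,
                    root_cause_confirmed: bool) -> bool:
    """Hypothetical helper: True if the needinfo request would cite the policy."""
    within_window = timedelta(0) <= (verified - landed) <= VERIFY_WINDOW
    meets_threshold = regression_pct(before, after) >= REGRESSION_THRESHOLD_PCT
    return root_cause_confirmed and within_window and meets_threshold
```

Under these assumptions, a confirmed change from 100 ms to 105 ms verified three
days after landing would qualify, while a change to 102 ms would not.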
Acceptable outcomes:
* A commitment to attempt a fix is agreed upon; the bug is assigned to
someone and put in a queue.
How do we ensure that the follow-up bug actually gets fixed and that the
fix resolves the regression completely?
* The bug contains enough details and evidence to support accepting this
regression; we will mark it as wontfix.
* It is agreed that this should be backed out
Do we plan to have a different approach towards more severe regressions?
For example, if a patch regresses startup time by 50%, would we still
accept evidence to support that the regression should be accepted, or
would we tolerate it in the tree for a few weeks before it gets fixed?
Other scenarios:
* A bug related to the alert is not filed within 1 week of the patch landing.
This removes the urgency and required action.
* We only caught a regression at uplift time. The root cause may not be
easy to determine at that point; this will be documented, and the identified
patch authors will use their judgement to fix the bug.
* The regression is unrelated to code (say, a PGO issue). This should be
documented in the bug, and the bug closed as wontfix.
* When we uplift to Aurora or Beta, all regressions filed before the uplift
that show up on the upstream branch will have a needinfo flag set and require
action to be taken.
Please take a moment to look over this and outline any concerns you might have.
Thanks,
Joel
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform