On 15:10, Tue, 05 Nov, James Graham wrote:
On 05/11/13 14:57, Kyle Huey wrote:
On Tue, Nov 5, 2013 at 10:44 PM, David Burns <dbu...@mozilla.com> wrote:

We appear to be doing roughly 1 backout for every 15 pushes on average[4]. I am sure you can all agree that this number is far too high, especially given the figures John O'Duinn suggests[5] for the cost of running and testing each push. Between the offending patch and its backout we use 508 computing hours while making essentially no change to the tree, and then another 254 computing hours for the fixed reland. Note that the 508 hours doesn't include retriggers done by the sheriffs to check whether a failure is intermittent.

This is a lot of wasted effort when we should be striving to get patches
to stick the first time. Let's see if we can bring this figure down to 1
backout in every 30 pushes.
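To make the figures above concrete, here is a back-of-envelope check. The only input is the ~254 compute hours per full push quoted above; everything else follows by arithmetic:

```python
# Rough back-of-envelope arithmetic for the figures quoted above.
# Assumption: ~254 compute hours per full push (build + test).
HOURS_PER_PUSH = 254

# A backed-out change consumes three pushes' worth of compute:
# the offending push, the backout, and the fixed reland.
backout_cycle_hours = 3 * HOURS_PER_PUSH

# Of those, the offending push + backout buy no change to the tree.
wasted_hours = 2 * HOURS_PER_PUSH

# At 1 backout per 15 pushes, the wasted fraction of total compute
# (15 ordinary pushes plus the backout and reland on top):
overhead_fraction = wasted_hours / (15 * HOURS_PER_PUSH)

print(backout_cycle_hours, wasted_hours, round(overhead_fraction, 3))
```

So a single backout cycle costs 762 compute hours, of which 508 are pure waste, and at the current rate that waste is roughly 13% of the compute spent on the baseline 15 pushes.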


What is your proposal for doing that?  What are the costs involved?  It
isn't very useful to say X is bad, let's not do X, without looking at what
it costs to not do X.

To give one hypothetical example, if it requires just two additional full
try pushes to avoid one backout, we haven't actually saved any computing
time.
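The hypothetical tradeoff above can be stated as a break-even condition. Assuming (as an approximation, not a measured figure) that a full try push costs about the same as a regular push:

```python
# Break-even check for the tradeoff described above.
# Assumption: a full try push costs roughly the same as a regular
# push (~254 compute hours, per the figure quoted earlier).
HOURS_PER_PUSH = 254

# One avoided backout saves two pushes: the offending push + backout.
saved_per_avoided_backout = 2 * HOURS_PER_PUSH

# Number of extra full try pushes at which the compute saving is zero;
# beyond this, avoiding the backout costs more than it saves.
break_even_try_pushes = saved_per_avoided_backout / HOURS_PER_PUSH

print(break_even_try_pushes)
```

In other words, in pure compute terms the policy only pays off if a backout can be prevented with fewer than two additional full try pushes, which is exactly the point being made.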

So, as far as I can tell, the heart of the problem is that the end-to-end time for the build+test infrastructure is unworkably slow. I understand that waiting half a dozen hours (a significant fraction of a work day) for a try run is considered normal. This has a huge knock-on effect: it forces people to context switch away from a problem while they wait, and back into it once they have the results. Presumably it also encourages landing changes without proper testing, which increases the backout rate. This costs a great deal not just in compute hours (which are easy to measure) but also in developer productivity (which is harder to measure, but could be even more significant).

What data do we currently have about why the wait time is so long? If this data doesn't exist, can we start to collect it? Are there easy wins to be had, or do we need to restructure the way we do builds and/or testing to achieve greater throughput?

We're publishing data in several places about total run time for jobs.

For overall build metrics, you can try http://brasstacks.mozilla.com/gofaster/

For specific revisions you can query self-serve, e.g.
https://secure.pub.build.mozilla.org/buildapi/self-serve/try/rev/5ff9d60c6803, or in json
https://secure.pub.build.mozilla.org/buildapi/self-serve/try/rev/5ff9d60c6803?format=json
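For anyone wanting to turn that JSON into numbers, a minimal sketch of summarizing compute time per revision follows. Note the field names here ("builds", "starttime", "endtime", Unix seconds) are assumptions about the self-serve payload for illustration, not a documented schema; check the actual response before relying on them:

```python
import json

# Hypothetical sketch: total compute time for one revision from a
# self-serve JSON payload. Field names are assumed, not documented.
def total_compute_hours(payload: str) -> float:
    data = json.loads(payload)
    seconds = sum(b["endtime"] - b["starttime"]
                  for b in data.get("builds", []))
    return seconds / 3600.0

# Tiny fabricated payload, for illustration only.
sample = json.dumps({
    "builds": [
        {"starttime": 0, "endtime": 7200},  # a 2h build job
        {"starttime": 0, "endtime": 5400},  # a 1.5h test job
    ]
})
print(total_compute_hours(sample))
```

Summing this across a push, its backout, and the reland would let you verify the compute-hour figures quoted earlier in the thread against real data.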

For historical data, you can look at all our archived build data here: http://builddata.pub.build.mozilla.org/buildjson/

Average times for builds/tests on m-c are published here:
https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/branch_times/output.txt

"end-to-end" times for try are here:
https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/end2end_try/end2end.html

I hope this helps!

Cheers,
Chris


_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
