Re: Pushes to Backouts on Mozilla Inbound

Steve Fink Tue, 05 Nov 2013 10:11:51 -0800

These stats are *awesome*! I've been wanting them for a long time, but
never got around to generating them myself. Can we track these on an
ongoing basis?

On 11/05/2013 07:09 AM, Ed Morley wrote:
> On 05 November 2013 14:44:27, David Burns wrote:
>> We appear to be doing 1 backout for every 15 pushes on a rough
>> average[4].
>
> I've been thinking about this some more - and I believe the ratio is
> probably actually even worse than the numbers suggest, since:

Yeah, 1 backout for every 15 pushes sounds quite a bit better than I'd
expect.

> * Depending on how the backouts are performed, the backout of several
> changesets/bugs are sometimes folded into one commit.

Can this be factored into the stats? As in, parse the backout commit
messages, gather the bug numbers (or infer them from the changeset if
not given), then map them back to the pushes for that bug? It still
won't be 100% right, but it'll be closer.
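
A minimal sketch of the idea, assuming backout messages start with
something like "Backed out" or "Backout" and name bugs as "bug NNN".
The pushes_by_bug mapping is hypothetical; it would have to be built
from the pushlog, and none of these names come from an existing tool:

```python
import re

# Assumed message shape, e.g.
# "Backed out changeset abc123 (bug 111) and def456 (bug 222) for bustage".
# A plural form like "bugs 111 and 222" would only yield the first number.
BUG_RE = re.compile(r"\bbug\s+(\d+)", re.IGNORECASE)
BACKOUT_RE = re.compile(r"^\s*(back(ed|ing)?\s*out|revert)", re.IGNORECASE)


def is_backout(message):
    """Guess whether a commit message describes a backout."""
    return bool(BACKOUT_RE.match(message))


def backed_out_bugs(message):
    """Return the set of bug numbers named in a backout commit message."""
    if not is_backout(message):
        return set()
    return {int(n) for n in BUG_RE.findall(message)}


def count_backed_out_pushes(backout_messages, pushes_by_bug):
    """Count distinct pushes hit by the given backouts.

    pushes_by_bug is a hypothetical dict mapping a bug number to the
    set of push ids that landed changesets for that bug.
    """
    hit = set()
    for msg in backout_messages:
        for bug in backed_out_bugs(msg):
            hit |= pushes_by_bug.get(bug, set())
    return len(hit)
```

Counting pushes rather than backout commits means a single folded
backout that covers two bugs is counted once per affected push.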

qbackout does a little bit of this when it tries to find the right
commit message to reuse when you run with --apply. But it doesn't have
access to (nor need) the pushlog, which would be required for this.

> * The 'total commits' figure includes merges & other automated/non-dev
> commits.

Can this be fixed?

> * Sometimes breakage is fixed in-place with a commit message such as
> "Bug 123456 - Followup...", which was still for a landing that broke
> the tree, but wouldn't be counted.

Could this be inferred from the starring comments? I guess it looks like
the stats dburns posted don't involve the starring comments (yet?). I
guess the rule would be that a changeset is a bustage fix if it or the
tip changeset in its push appears in a star comment.
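
That rule could be sketched roughly as follows. It assumes star
comments mention changesets by hex id (12 to 40 characters), which may
not hold for free-form comments; the function names are made up for
illustration:

```python
import re

# Assumption: star comments refer to changesets by 12-40 char hex id.
CSET_RE = re.compile(r"\b[0-9a-f]{12,40}\b")


def starred_changesets(star_comments):
    """Collect the short (12 char) ids of changesets named in comments."""
    found = set()
    for comment in star_comments:
        found.update(m[:12] for m in CSET_RE.findall(comment.lower()))
    return found


def is_bustage_fix(changeset, push, starred):
    """Apply the rule: the changeset or its push's tip is starred.

    push is a non-empty list of changeset ids in push order, tip last.
    """
    tip = push[-1]
    return changeset[:12] in starred or tip[:12] in starred
```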

Can we please structure the starring comments yet? I started a little
bit down this path a while back (providing buttons on tbpl for common
starring reasons), but then stalled it off to wait for tbpl2. It would
be really really good if the sheriffs could feed in metadata saying
whether something is a backout or intermittent or whatever. I feel like
we have knowledge in the heads of the human parts of our system that's
getting dropped on the floor. We could be making much better use of it,
since they're already figuring these things out anyway. It's not just
for computing goodness metrics, either; it could make it much easier to
implement autolanding (aka landing queues), probabilistic coalescing
(what useless jobs can you skip to make way for important ones), and
other goodies.

>
>
> On 05 November 2013 14:57:17, Kyle Huey wrote:
>> What is your proposal for doing that?  What are the costs involved?
>
> For one: devs building/testing locally before pushing. Many cases of
> failures would have been caught by just a simple single-platform
> build+run of a single directory's worth of tests.

If you know the right directory, sure. Though even then, local tests can
be very disruptive to run.

>
> The benefits of this approach are:
> * Available local compute time scales linearly with the number of devs
> hired, unlike our Tryserver automation.

That doesn't seem like a fundamental property to me. At least
theoretically, much of the tryserver automation scales with the Amazon
cloud (aka it scales with the load on some corporate credit card that
I'm glad I don't have to see the statements for).

Again theoretically, we could be buying a local build/test box for every
dev hire & active volunteer, and setting up automation that bridges the
gap between a dev's main box and the try server. (More on this below.)

> * Local dep builds are much quicker than Try clobber builds.

Let's split that up into builds vs tests.

For the stuff I work on, building is normally not a problem. But it can
be during heavy times, because doing builds means losing push races.
With wide-ranging stuff (where the probability of failures due to
rebases is high), this means you either have to push without a final
build or get repeatedly bumped to a later day. This should get better
with the current build system improvements, so perhaps this isn't much
of a problem anymore, but I'm running into it a fair amount right now.

For tests, it depends on the test suite. But many of them just really
suck to run locally. mach magic to identify a minimal subset of tests to
run would help a lot with this, but that's going to be a substantial
amount of work. For the most part, I think the try server is the way to
go for tests. As for resource usage, my personal opinion is that if you
restrict the tests to a single platform (a "T push", which you can
generate by selecting something under "Restrict tests to platform(s)" on
http://trychooser.pub.build.mozilla.org/ ), then you're fine. I'd rather
people run tests on one try platform than whittle down the specific
tests to be run. (Well, for the first push. If you're working through a
particular issue on try, it makes sense to just test that one test suite.)

In short: use the try server. Build on everything. Test on one platform.
Run all the tests. If any fail, iterate on just the failed test suites
(unless you think your changes may break others).

I don't have the data to prove it, but my guess is that this would
result in the lowest overall load. (Backouts are expensive! Especially
in hard-to-measure people time.)

>
> I'm hopeful that with the build peers' ongoing overhaul of our build
> system, dep build times for an average patch are going to be short
> enough that there really is no excuse not to build locally. Add to
> that ongoing work on improving mach commands to ease running just a
> subset of the tests (for bonus points making use of the applied MQs to
> guess which ones), and it really shouldn't be too onerous of a request.

Other ideas:

Would it be possible to restrict the statistics to only the active times
of day? It sucks when the tree is closed on a weekend or in the middle
of my night, but it's way way less of a problem when only a few devs are
impacted. The problem I see is tree closures when lots of people need to
land. Tree closures at other times are a different problem, and can be
addressed separately if needed. (You could even say "backouts don't
matter if there's no queue in front of any test machines", which isn't
true when you consider human cost, but it's a better approximation than
weighting a 3am Sunday PST backout the same as a middle-of-the-workday one.)
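
As a rough illustration, the filter could look something like this.
The Monday-to-Friday, 9:00-18:00 window and the fixed PST offset are
arbitrary assumptions chosen for the sketch, not measured peak hours,
and the (timestamp, is_backout) event shape is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset; deliberately ignores daylight saving time.
PST = timezone(timedelta(hours=-8))


def in_active_window(ts, start_hour=9, end_hour=18):
    """True if an aware datetime falls on a weekday within the window."""
    local = ts.astimezone(PST)
    return local.weekday() < 5 and start_hour <= local.hour < end_hour


def active_backout_ratio(events):
    """Fraction of active-hours pushes that were backouts.

    events is an iterable of (aware_datetime, is_backout) pairs.
    Returns None when no push falls inside the active window.
    """
    active = [is_backout for ts, is_backout in events if in_active_window(ts)]
    if not active:
        return None
    return sum(active) / len(active)
```

A softer variant would weight each backout by how many pushes were
queued at the time instead of dropping off-hours events entirely.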

I'd also like devs to have easier access to a set of buildbot-like test
slaves. Debugging via try sucks. And the overhead in requesting access
to a slave and then figuring out how to use it is too high, so people
just don't. (This is being worked on, btw.) These boxes could double as
distributed compute servers, though that might require colocating them
with devs. Perhaps we should look into rack-mounted devs, so we could
put them directly in the data center near the build/test boxes.

Wait, ignore that last. It was a joke.

It would be even better if these test boxes could make use of my local
builds. So either copy the build over to the test box and run the tests
there, or have mach secretly synchronize the source code whenever you do
a build and do a shadow build on the test box at the same time.

Does the orange factor DB have enough granularity to identify which
tests failed for a given push? Could it, without burdening sheriffs too
much? It'd be great to have per-test statistics on the number of
failures that a test caught, so we could compare the cost of running a
test (mostly in time) to the benefit it provides, and reshuffle test
suites to run the low-cost high-reward ones all the time, and the
high-cost low-reward ones only occasionally. (You might need to tweak
this metric to reduce the estimated reward for intermittents.)
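
One possible shape for that metric, as a sketch only: failures caught
per minute of runtime, with intermittent failures discounted. The
input format and the 0.1 discount are assumptions picked for
illustration, not anything the orange factor DB is known to provide:

```python
def rank_suites(stats, intermittent_weight=0.1):
    """Rank test suites by estimated reward per minute of runtime.

    stats maps a suite name to a tuple of
    (runtime_minutes, real_failures_caught, intermittent_failures).
    Intermittent failures count for only intermittent_weight each.
    Returns (name, score) pairs, best value first.
    """
    scored = []
    for name, (minutes, real, intermittent) in stats.items():
        if minutes <= 0:
            continue  # skip suites without a usable runtime
        reward = real + intermittent_weight * intermittent
        scored.append((name, reward / minutes))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Suites near the top would be candidates to run on every push; those
near the bottom could run only occasionally.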

Anyway, I'll shut up now. I always have way more ideas than time to
implement them or ability to market them. There are a lot of things we
could do to improve our current setup.

_______________________________________________
dev-platform mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-platform
