On 05/11/2013 18:11, Steve Fink wrote:
These stats are *awesome*! I've been wanting them for a long time, but
never got around to generating them myself. Can we track these on an
ongoing basis?
Sure! Since we need to be working on engineering productivity as a
whole, I think this could be a good metric for seeing whether other
efforts are paying off.
On 11/05/2013 07:09 AM, Ed Morley wrote:
On 05 November 2013 14:44:27, David Burns wrote:
We appear to be doing 1 backout for every 15 pushes on a rough
average[4].
I've been thinking about this some more - and I believe the ratio is
probably actually even worse than the numbers suggest, since:
Yeah, 1 backout for every 15 pushes sounds quite a bit better than I'd
expect.
* Depending on how the backouts are performed, the backouts of several
changesets/bugs are sometimes folded into one commit.
Can this be factored into the stats? As in, parse the backout commit
messages, gather the bug numbers (or infer them from the changeset if
not given), then map them back to the pushes for that bug? It still
won't be 100% right, but it'll be closer.
qbackout does a little bit of this when it tries to find the right
commit message to reuse when you run with --apply. But it doesn't have
access to (nor need) the pushlog, which would be required for this.
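
A minimal sketch of the parsing idea above, in Python. This is not how
qbackout works; the regular expressions, the function names, and the
pushes_by_bug mapping (bug number to push ids, which would have to be
built from the pushlog) are all assumptions made for illustration. Real
commit messages vary a lot, so expect misses.

```python
import re

# Hypothetical patterns: matches "Backed out", "Backing out", "Back out", "Backout".
BACKOUT_RE = re.compile(r"\bback(?:ed|ing)?\s*out\b", re.IGNORECASE)
# Only catches the literal form "bug NNN"; "bugs 1 and 2" would miss the 2.
BUG_RE = re.compile(r"\bbug\s+(\d+)", re.IGNORECASE)


def backed_out_bugs(message):
    """Return the sorted unique bug numbers named in a backout message."""
    if not BACKOUT_RE.search(message):
        return []
    return sorted({int(n) for n in BUG_RE.findall(message)})


def count_backed_out_pushes(backout_messages, pushes_by_bug):
    """Count distinct pushes whose bugs appear in any backout message.

    pushes_by_bug maps a bug number to a set of push ids (assumed input).
    """
    pushes = set()
    for message in backout_messages:
        for bug in backed_out_bugs(message):
            pushes |= pushes_by_bug.get(bug, set())
    return len(pushes)
```

A folded backout that names two bugs then counts once per affected
push instead of once per backout commit.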
I am happy to make tweaks. The data I get is quite raw, so I am happy
to dig deeper to get better data.
* The 'total commits' figure includes merges & other automated/non-dev
commits.
Can this be fixed?
Sure, this should be trivial to fix.
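
One way the filtering could look, as a rough Python sketch. The commit
record layout (a dict with "parents" and "author" keys), the
automated_authors set, and the is_backout callback are hypothetical
and not taken from the actual stats script; the only fact relied on is
that a merge changeset has two parents.

```python
def is_dev_commit(commit, automated_authors=frozenset()):
    """True for non-merge commits not made by a known automation account."""
    if len(commit["parents"]) > 1:  # a merge changeset has two parents
        return False
    return commit["author"] not in automated_authors


def backouts_per_landing(commits, is_backout, automated_authors=frozenset()):
    """Ratio of backout commits to ordinary developer landings."""
    dev = [c for c in commits if is_dev_commit(c, automated_authors)]
    backouts = sum(1 for c in dev if is_backout(c))
    landings = len(dev) - backouts
    return backouts / landings if landings else 0.0
```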
The benefits of this approach are:
* Available local compute time scales linearly with the number of devs
hired, unlike our Tryserver automation.
That doesn't seem like a fundamental property to me. At least
theoretically, much of the tryserver automation scales with the Amazon
cloud (aka it scales with the load on some corporate credit card that
I'm glad I don't have to see the statements for).
Again theoretically, we could be buying a local build/test box for every
dev hire & active volunteer, and setting up automation that bridges the
gap between a dev's main box and the try server. (More on this below.)
There is an efficiency that we are missing here, but that is a
different discussion for when there is more data.
* Local dep builds are much quicker than Try clobber builds.
Let's split that up into builds vs tests.
For the stuff I work on, building is normally not a problem. But it can
be during heavy times, because doing builds means losing push races.
With wide-ranging stuff (where the probability of failures due to
rebases is high), this means you either have to push without a final
build or get repeatedly bumped to a later day. This should get better
with the current build system improvements, so perhaps this isn't much
of a problem anymore, but I'm running into it a fair amount right now.
For tests, it depends on the test suite. But many of them just really
suck to run locally. mach magic to identify a minimal subset of tests to
run would help a lot with this, but that's going to be a substantial
amount of work. For the most part, I think the try server is the way to
go for tests. As for resource usage, my personal opinion is that if you
restrict the tests to a single platform (a "T push", which you can
generate by selecting something under "Restrict tests to platform(s)" on
http://trychooser.pub.build.mozilla.org/ ), then you're fine. I'd rather
people run tests on one try platform than whittle down the specific
tests to be run. (Well, for the first push. If you're working through a
particular issue on try, it makes sense to just test that one test suite.)
In short: use the try server. Build on everything. Test on one platform.
Run all the tests. If any fail, iterate on just the failed test suites
(unless you think your changes may break others.)
I don't have the data to prove it, but my guess is that this would
result in the lowest overall load. (Backouts are expensive! Especially
in hard-to-measure people time.)
I'm hopeful that with the build peers' ongoing overhaul of our build
system, dep build times for an average patch are going to be short
enough that there really is no excuse not to build locally. Add to
that ongoing work on improving mach commands to ease running just a
subset of the tests (for bonus points making use of the applied MQs to
guess which ones), and it really shouldn't be too onerous of a request.
Other ideas:
Would it be possible to restrict the statistics to only the active times
of day? It sucks when the tree is closed on a weekend or in the middle
of my night, but it's way way less of a problem when only a few devs are
impacted. The problem I see is tree closures when lots of people need to
land. Tree closures at other times are a different problem, and can be
addressed separately if needed. (You could even say "backouts don't
matter if there's no queue in front of any test machines", which isn't
true when you consider human cost, but it's a better approximation than
weighting a 3am Sunday PST backout the same as a middle-of-the-workday one.)
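
A sketch of that weighting in Python, under stated assumptions: the
"active" window is taken to be weekdays 09:00 to 18:00 at a fixed UTC-8
offset (so daylight saving is ignored), and the function names and
defaults are made up for illustration. Timestamps are assumed to be
timezone-aware. With off_peak_weight=0.0 this restricts the stats to
active hours; a small positive value down-weights off-peak backouts
instead of dropping them.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8))  # fixed offset; ignores daylight saving


def backout_weight(when, busy_start=9, busy_end=18, off_peak_weight=0.0):
    """Weight a backout by whether it happened during assumed busy hours."""
    local = when.astimezone(PST)
    is_weekday = local.weekday() < 5  # Monday=0 ... Friday=4
    in_hours = busy_start <= local.hour < busy_end
    return 1.0 if (is_weekday and in_hours) else off_peak_weight


def weighted_backouts(timestamps, off_peak_weight=0.0):
    """Sum of per-backout weights for a list of aware datetimes."""
    return sum(backout_weight(t, off_peak_weight=off_peak_weight)
               for t in timestamps)
```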
I'd also like devs to have easier access to a set of buildbot-like test
slaves. Debugging via try sucks. And the overhead in requesting access
to a slave and then figuring out how to use it is too high, so people
just don't. (This is being worked on, btw.) These boxes could double as
distributed compute servers, though that might require colocating them
with devs. Perhaps we should look into rack-mounted devs, so we could
put them directly in the data center near the build/test boxes.
Wait, ignore that last. It was a joke.
It would be even better if these test boxes could make use of my local
builds. So either copy the build over to the test box and run the tests
there, or have mach secretly synchronize the source code whenever you do
a build and do a shadow build on the test box at the same time.
Does the orange factor DB have enough granularity to identify which
tests failed for a given push? Could it, without burdening sheriffs too
much? It'd be great to have per-test statistics on the number of
failures that a test caught, so we could compare the cost of running a
test (mostly in time) to the benefit it provides, and reshuffle test
suites to run the low-cost high-reward ones all the time, and the
high-cost low-reward ones only occasionally. (You might need to tweak
this metric to reduce the estimated reward for intermittents.)
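
To make the cost/benefit comparison concrete, here is a small Python
sketch. It assumes per-suite numbers are already available (whether the
orange factor DB can supply them is exactly the open question above);
the SuiteStats fields, the intermittent_discount value, and the
"failures caught per minute" metric are illustrative choices, not an
existing tool.

```python
from dataclasses import dataclass


@dataclass
class SuiteStats:
    name: str
    minutes_per_run: float       # cost: must be greater than zero
    real_failures: int           # failures that caught an actual regression
    intermittent_failures: int   # failures later starred as intermittent


def value_per_minute(stats, intermittent_discount=0.1):
    """Estimated reward per minute, counting intermittents at a discount."""
    reward = (stats.real_failures
              + intermittent_discount * stats.intermittent_failures)
    return reward / stats.minutes_per_run


def rank_suites(all_stats, intermittent_discount=0.1):
    """Order suites from best to worst reward per minute of run time."""
    return sorted(all_stats,
                  key=lambda s: value_per_minute(s, intermittent_discount),
                  reverse=True)
```

Suites near the top of the ranking are candidates to run on every
push; those near the bottom could run only occasionally.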
Anyway, I'll shut up now. I always have way more ideas than time to
implement them or ability to market them. There are a lot of things we
could do to improve our current setup.
_______________________________________________
dev-platform mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-platform