Thanks for the many comments! I replied to a bunch of stuff below.

On 13-04-03 18:34 , Justin Lebar wrote:
> This is a really interesting idea.
>
> git is smart enough to let you pull only the csets that end up getting
> merged into master.  I don't know how multiple heads works with hg,
> but I wonder if we could make hg smart enough to do the same thing.

I'm not sure if this is what you're asking, but you can pull a specific head and its ancestor changesets using "hg pull -r <rev>".
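For example (the hash here is purely illustrative):

    $ hg pull -r abc123de http://hg.mozilla.org/try/

This pulls the changeset abc123de and all of its ancestors, but leaves the other heads in the remote repository alone.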

On 13-04-03 18:36 , L. David Baron wrote:
> This seems like it would lead to a substantial increase in
> build/test load -- one that I suspect we don't currently have the
> hardware to support. This is because it would require a full
> build/test run for every push, which we avoid today because many
> builds and tests get merged on inbound when things are behind.  (I
> also don't feel like we need a full build/test run for every push,
> so it feels like unnecessary use of resources to me.)

This is a valid point. I'll note here that a lot of people find the test/build coalescing on inbound problematic because it makes finding regressions much harder, and in general I would like to kill such coalescing. That being said, if you believe that we don't need a full test run for every push, then that's extremely simple to accommodate in my proposal as well: when the patches are pushed to try, the developer can choose to run specific sets of tests (which they should already be doing using trychooser) rather than all of them. The rest of my proposal remains unchanged.

(Note also that I did exactly this when I pushed 6348af4fe6aa to try earlier today - I knew it would only touch Android, so I only ran the Android tests. And once it came out green, I merged it to m-c.)
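For concreteness, a try push scoped like that might use trychooser syntax along these lines (the exact platform and suite names here are illustrative, not a prescription):

    try: -b o -p android -u mochitests,reftest,crashtest -t none

That is: opt builds only, the Android platform only, a subset of the unit test suites, and no Talos runs.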

Using this modification to the original proposal, developers have more control over which tests are run on their changes. I think this is better than having some hard-coded set of rules in moz.build files or the tbpl database, which are (a) very difficult to get right and (b) can easily get out of date.

If the sheriffs are not satisfied with the test coverage that the developer chose when pushing to try, they can choose not to merge the patch. If the developer omitted some tests that then turned out to break on the merge to m-c, the situation is no worse than it is now with coalesced tests - that is, some amount of effort will be needed to figure out what actually broke the tests.

On 13-04-03 18:43 , Gary Kwong wrote:
> autoBisect relies heavily on `hg bisect` and has been working well with
> JSBugMon especially in Spidermonkey for the better part of ~ 4 years now.
>

How does it deal with bisecting through merges? Do you think it would be affected by my proposal? Justin's comments in this thread about how bisecting might actually be improved also make sense to me.

On 13-04-03 19:28 , Joshua Cranmer 🐧 wrote:
> I think the relative merits of this approach is easily decidable by
> looking at two numbers:
> 1. How many of our patch landings would have non-trivial merge conflicts?
> 2. How much patch bustage is caused by interaction failures instead of
> patch author/reviewer negligence?
>
> Note that your proposal makes both of these cases strictly worse: in the
> first case, you force the tree sheriffs (instead of developers) to spend
> more time resolving conflicts; in the second, sheriffs have to spend
> more time finding the cause of the bustage. Keep in mind that we have
> about three sheriffs and about a hundred developers--so it is better to
> optimize for sheriffs' time instead of developers' time (at the very
> least, developers represent better parallelization opportunities).

You have good points, but you're assuming that the sheriffs have to deal with these problems. I would be perfectly fine with saying "in case of non-trivial merge conflicts, the sheriffs should leave the patch on try and ask the developer to rebase against a newer m-c". That is, throw merge conflicts back to the developer to resolve. This is effectively what happens now anyway because developers who lose push races have to manually resolve their merge conflicts.
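Today's push-race dance looks roughly like this (assuming the rebase extension is enabled; the abort output is paraphrased):

    $ hg push ssh://hg.mozilla.org/integration/mozilla-inbound/
    abort: push creates new remote head!
    $ hg pull --rebase ssh://hg.mozilla.org/integration/mozilla-inbound/
    (resolve any conflicts, re-test)
    $ hg push ssh://hg.mozilla.org/integration/mozilla-inbound/

Under my proposal the same conflict-resolution work simply happens against a newer m-c on try, at the sheriff's request, rather than as a side effect of losing a push race.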

In the second case (patch bustage due to interaction), again I think this could be thrown back to the developer, as I stated in my original proposal (see comments under Scenario D).

I would also be interested in getting concrete numbers for these two things, though.

On 13-04-03 19:49 , Jesse Ruderman wrote:
> But can we do this with rebased changesets instead of "trivial" merge
> changesets?  While the core of hg can handle merges, pretty much none of
> the tools we rely on for understanding history (hg {log, grep, diff,
> bisect}) handle them well.

Do you have specific examples of how hg log, grep, and diff break down with merges? I've had issues with bisect, but AFAIK the other commands should be able to deal with merges well enough.
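For reference, stock hg is not helpless here - these are all standard commands and revsets:

    $ hg log -r "merge()"     # list merge changesets
    $ hg diff -c REV          # changes REV made relative to its first parent
    $ hg log -f some/file     # follow a file's history across merges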

On 13-04-03 19:59 , Jeff Hammel wrote:
> On 04/03/2013 04:44 PM, Joshua Cranmer 🐧 wrote:
>> Instead of running {mochitest-*,reftest,crashtest,xpcshell,marionette}
>> on every single desktop platform on every inbound checkin, run them
>> round-robin. A given push would trigger mochitest on Linux, reftest on
>> mac, and then the next test would run reftest on Linux and mochitest
>> on Mac. Each push would still end up running the full testsuite
>> (modulo some tests that are only run on some platforms) on some
>> platform, and we would be using fewer test resources to achieve that
>> coverage. If most actual test bustage is not platform-specific, this
>> would give us generally sufficiently good coverage for most checkins.
>> I am not qualified to say, however, if this is the case.
>>
> If there was a way/guidelines to do this in some sensible way, or even a
> cultural norm like "reviewer comments on what tests to run", I'd give a
> +0.5 here fwiw

Just wanted to point out that the modified proposal I suggested earlier in this email allows this to happen, without requiring any changes whatsoever to our existing infrastructure. Just use regular trychooser syntax to pick your tests when pushing to try.

On 13-04-03 20:39 , Gregory Szorc wrote:
> I pulled the raw builder logs from
> https://secure.pub.build.mozilla.org/builddata/buildjson/ and assembled
> a tab-separated file of all the builds for 2013-03-17 through 2013-03-23:

Awesome! I will try to pull out some useful metrics from this. For example, how much machine time was spent on backout pushes? This is something that should be almost completely eliminated in my proposal, so I'm curious to know as a ballpark figure how much saving we would get from this alone.
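The kind of query I have in mind is something like the sketch below. To be clear, the column names ("duration" and "comments") are guesses at the file's layout, and the substring match for backouts is crude; this is just to illustrate the shape of the analysis:

    import csv

    backout = total = 0.0
    with open("builds-2013-03-17.tsv") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            hours = float(row["duration"]) / 3600.0  # assumed: duration in seconds
            total += hours
            comment = row["comments"].lower()        # assumed: push comment column
            if "backout" in comment or "backed out" in comment:
                backout += hours
    print("%.0f of %.0f compute hours (%.1f%%) were on backout pushes"
          % (backout, total, 100 * backout / total))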

On 13-04-03 20:57 , Ehsan Akhgari wrote:
> Yes.  In addition to the cost of try pushes, this also incurs the cost
> of building and testing those merge commits which would not have existed
> otherwise.  In a world where 1 out of 2-3 pushes is bustage this might
> be a good idea, but that's not currently the case.
>

Disagree. Consider:

----------
Scenario E - the one with no bugs
----------
In this scenario, all programmers are perfect and write perfect patches that never have bugs. They also know they are perfect so they always push directly to inbound and never to try.

Current process:

Patch A lands on inbound, tree is green.
Patch B lands on inbound, tree is green.
Patch C lands on inbound, tree is green.
Patch D lands on inbound, tree is green.
Patch E lands on inbound, tree is green.
Inbound is merged to m-c, tree is green.

TBPL time used: 6 units.

My suggested process:

Patch A lands on try, tree is green.
Patch B lands on try, tree is green.
Patch C lands on try, tree is green.
Patch D lands on try, tree is green.
Patch E lands on try, tree is green.
Patches A, B, C, D, and E are merged to m-c, tree is green.

TBPL time used: 6 units.

The number of merge commits in both cases depends on how frequently the sheriffs choose to merge to m-c. I see no reason to increase or decrease the frequency just because of the change in process. Of course, if the overall machine resource usage goes down then we may be able to increase the rate at which we land patches. This might increase the frequency with which merges are done, but the system will reach an equilibrium point at a higher commit frequency rather than diverging.

On 13-04-03 23:01 , Clint Talbert wrote:
> Joel and I did some calculations:
> * 200 pushes/day[1]
> * 325 test jobs/push
> * 25 builds/push
> * .41 hours/test (on average, from above numbers)
> * 1.1 hours/build (on average, based on try values from above)
[snip]
> So we need 32150 compute hours per day to keep up.
> If you see above our totals for the week of data that gps provided us
> with you can see that we are currently running at: 135004 hours/week /
> 7 days = 19286 compute hours/day
>

This is interesting, and I will need to spend some more time looking at the data before I can respond to this properly. However, I'm not entirely sure where you got your 325 test jobs/push and 25 builds/push numbers from. If we are doing 200 pushes/day, that's 1400 pushes/week. gps' data file has 268632 test jobs and 26077 build jobs, which works out to 191.88 test jobs/push and 18.6 builds/push.
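Spelling out the arithmetic (nothing here beyond the figures already quoted):

    200 * (325 * 0.41 + 25 * 1.1)   # = 32150 compute hours/day (Clint's estimate)
    268632 / (200 * 7)              # = 191.88 test jobs/push (from gps' data)
    26077 / (200 * 7)               # = ~18.63 build jobs/push (from gps' data)

Interestingly, plugging the recomputed per-push rates into the same formula gives 200 * (191.88 * 0.41 + 18.63 * 1.1) = roughly 19833 compute hours/day, which is much closer to the observed 19286.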

Cheers,
kats