Thanks for the many comments! I replied to a bunch of stuff below.

On 13-04-03 18:34 , Justin Lebar wrote:
> This is a really interesting idea.
>
> git is smart enough to let you pull only the csets that end up getting
> merged into master.  I don't know how multiple heads works with hg,
> but I wonder if we could make hg smart enough to do the same thing.

I'm not sure if this is what you're asking, but you can pull a specific head and its ancestor changesets using "hg pull -r <rev>".
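For example (the hash here is purely illustrative):

    $ hg pull -r abc123de http://hg.mozilla.org/try/

This pulls the changeset abc123de and all of its ancestors, but leaves the other heads in the remote repository alone.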

On 13-04-03 18:36 , L. David Baron wrote:
> This seems like it would lead to a substantial increase in
> build/test load -- one that I suspect we don't currently have the
> hardware to support. This is because it would require a full
> build/test run for every push, which we avoid today because many
> builds and tests get merged on inbound when things are behind.  (I
> also don't feel like we need a full build/test run for every push,
> so it feels like unnecessary use of resources to me.)

This is a valid point. I'll note here that a lot of people find the test/build coalescing on inbound problematic because it makes finding regressions much harder, and in general I would like to kill such coalescing. That being said, if you believe that we don't need a full test run for every push, then that's extremely simple to accommodate in my proposal as well: when the patches are pushed to try, the developer can choose to run specific sets of tests (which they should already be doing using trychooser) rather than all of them. The rest of my proposal remains unchanged.

(Note also that I did exactly this when I pushed 6348af4fe6aa to try earlier today - I knew it would only touch Android, so I only ran the Android tests. And once it came out green, I merged it to m-c.)
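For concreteness, a try push scoped like that might use trychooser syntax along these lines (the exact platform and suite names here are illustrative, not a prescription):

    try: -b o -p android -u mochitests,reftest,crashtest -t none

That is: opt builds only, the Android platform only, a subset of the unit test suites, and no Talos runs.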

Using this modification to the original proposal, developers have more control over which tests are run on their changes. I think this is better than having some hard-coded set of rules in moz.build files or the tbpl database, which are (a) very difficult to get right and (b) can easily get out of date.

If the sheriffs are not satisfied with the test coverage that the developer chose when pushing to try, they can choose not to merge the patch. If the developer omitted some tests that then turned out to break on the merge to m-c, the situation is no worse than it is now with coalesced tests - that is, some amount of effort will be needed to figure out what actually broke the tests.

On 13-04-03 18:43 , Gary Kwong wrote:
> autoBisect relies heavily on `hg bisect` and has been working well with
> JSBugMon especially in Spidermonkey for the better part of ~ 4 years now.
>

How does it deal with bisecting through merges? Do you think it would be affected by my proposal? Justin's comments in this thread about how bisecting might actually be improved also make sense to me.

On 13-04-03 19:28 , Joshua Cranmer 🐧 wrote:
> I think the relative merits of this approach is easily decidable by
> looking at two numbers:
> 1. How many of our patch landings would have non-trivial merge conflicts?
> 2. How much patch bustage is caused by interaction failures instead of
> patch author/reviewer negligence?
>
> Note that your proposal makes both of these cases strictly worse: in the
> first case, you force the tree sheriffs (instead of developers) to spend
> more time resolving conflicts; in the second, sheriffs have to spend
> more time finding the cause of the bustage. Keep in mind that we have
> about three sheriffs and about a hundred developers--so it is better to
> optimize for sheriffs' time instead of developers' time (at the very
> least, developers represent better parallelization opportunities).

You have good points, but you're assuming that the sheriffs have to deal with these problems. I would be perfectly fine with saying "in case of non-trivial merge conflicts, the sheriffs should leave the patch on try and ask the developer to rebase against a newer m-c". That is, throw merge conflicts back to the developer to resolve. This is effectively what happens now anyway because developers who lose push races have to manually resolve their merge conflicts.
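Today's push-race dance looks roughly like this (assuming the rebase extension is enabled; the abort output is paraphrased):

    $ hg push ssh://hg.mozilla.org/integration/mozilla-inbound/
    abort: push creates new remote head!
    $ hg pull --rebase ssh://hg.mozilla.org/integration/mozilla-inbound/
    (resolve any conflicts, re-test)
    $ hg push ssh://hg.mozilla.org/integration/mozilla-inbound/

Under my proposal the same conflict-resolution work simply happens against a newer m-c on try, at the sheriff's request, rather than as a side effect of losing a push race.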

In the second case (patch bustage due to interaction), again I think this could be thrown back to the developer, as I stated in my original proposal (see comments under Scenario D).

I would also be interested in getting concrete numbers for these two things, though.

On 13-04-03 19:49 , Jesse Ruderman wrote:
> But can we do this with rebased changesets instead of "trivial" merge
> changesets?  While the core of hg can handle merges, pretty much none of
> the tools we rely on for understanding history (hg {log, grep, diff,
> bisect}) handle them well.

Do you have specific examples of how hg log, grep, and diff break down with merges? I've had issues with bisect, but AFAIK the other commands should be able to deal with merges well enough.
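For reference, stock hg is not helpless here - these are all standard commands and revsets:

    $ hg log -r "merge()"     # list merge changesets
    $ hg diff -c REV          # changes REV made relative to its first parent
    $ hg log -f some/file     # follow a file's history across merges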

On 13-04-03 19:59 , Jeff Hammel wrote:
> On 04/03/2013 04:44 PM, Joshua Cranmer 🐧 wrote:
>> Instead of running {mochitest-*,reftest,crashtest,xpcshell,marionette}
>> on every single desktop platform on every inbound checkin, run them
>> round-robin. A given push would trigger mochitest on Linux, reftest on
>> mac, and then the next test would run reftest on Linux and mochitest
>> on Mac. Each push would still end up running the full testsuite
>> (modulo some tests that are only run on some platforms) on some
>> platform, and we would be using fewer test resources to achieve that
>> coverage. If most actual test bustage is not platform-specific, this
>> would give us generally sufficiently good coverage for most checkins.
>> I am not qualified to say, however, if this is the case.
>>
> If there was a way/guidelines to do this in some sensible way, or even a
> cultural norm like "reviewer comments on what tests to run", I'd give a
> +0.5 here fwiw

Just wanted to point out that the modified proposal I suggested earlier in this email allows this to happen, without requiring any changes whatsoever to our existing infrastructure. Just use regular trychooser syntax to pick your tests when pushing to try.

On 13-04-03 20:39 , Gregory Szorc wrote:
> I pulled the raw builder logs from
> https://secure.pub.build.mozilla.org/builddata/buildjson/ and assembled
> a tab-separated file of all the builds for 2013-03-17 through 2013-03-23:

Awesome! I will try to pull out some useful metrics from this. For example, how much machine time was spent on backout pushes? This is something that should be almost completely eliminated in my proposal, so I'm curious to know as a ballpark figure how much saving we would get from this alone.
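The kind of query I have in mind is something like the sketch below. To be clear, the column names ("duration" and "comments") are guesses at the file's layout, and the substring match for backouts is crude; this is just to illustrate the shape of the analysis:

    import csv

    backout = total = 0.0
    with open("builds-2013-03-17.tsv") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            hours = float(row["duration"]) / 3600.0  # assumed: duration in seconds
            total += hours
            comment = row["comments"].lower()        # assumed: push comment column
            if "backout" in comment or "backed out" in comment:
                backout += hours
    print("%.0f of %.0f compute hours (%.1f%%) were on backout pushes"
          % (backout, total, 100 * backout / total))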

On 13-04-03 20:57 , Ehsan Akhgari wrote:
> Yes.  In addition to the cost of try pushes, this also incurs the cost
> of building and testing those merge commits which would not have existed
> otherwise.  In a world where 1 out of 2-3 pushes is bustage this might
> be a good idea, but that's not currently the case.
>

Disagree. Consider:

----------
Scenario E - the one with no bugs
----------
In this scenario, all programmers are perfect and write perfect patches that never have bugs. They also know they are perfect so they always push directly to inbound and never to try.

Current process:

Patch A lands on inbound, tree is green.
Patch B lands on inbound, tree is green.
Patch C lands on inbound, tree is green.
Patch D lands on inbound, tree is green.
Patch E lands on inbound, tree is green.
Inbound is merged to m-c, tree is green.

TBPL time used: 6 units.

My suggested process:

Patch A lands on try, tree is green.
Patch B lands on try, tree is green.
Patch C lands on try, tree is green.
Patch D lands on try, tree is green.
Patch E lands on try, tree is green.
Patches A, B, C, D, and E are merged to m-c, tree is green.

TBPL time used: 6 units.

The number of merge commits in both cases depends on how frequently the sheriffs choose to merge to m-c. I see no reason to increase or decrease the frequency just because of the change in process. Of course, if the overall machine resource usage goes down then we may be able to increase the rate at which we land patches. This might increase the frequency with which merges are done, but the system will reach an equilibrium point at a higher commit frequency rather than diverging.

On 13-04-03 23:01 , Clint Talbert wrote:
> Joel and I did some calculations:
> * 200 pushes/day[1]
> * 325 test jobs/push
> * 25 builds/push
> * .41 hours/test (on average, from above numbers)
> * 1.1 hours/build (on average, based on try values from above)
[snip]
> So we need 32150 compute hours per day to keep up.
> If you see above our totals for the week of data that gps provided us
> with you can see that we are currently running at: 135004 hours/week /
> 7 days = 19286 compute hours/day
>

This is interesting, and I will need to spend some more time looking at the data before I can respond to this properly. However, I'm not entirely sure where you got your 325 test jobs/push and 25 builds/push numbers from. If we are doing 200 pushes/day, that's 1400 pushes/week. gps' data file has 268632 test jobs and 26077 build jobs, which works out to 191.88 test jobs/push and 18.6 builds/push.
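Spelling out the arithmetic (nothing here beyond the figures already quoted):

    200 * (325 * 0.41 + 25 * 1.1)   # = 32150 compute hours/day (Clint's estimate)
    268632 / (200 * 7)              # = 191.88 test jobs/push (from gps' data)
    26077 / (200 * 7)               # = ~18.63 build jobs/push (from gps' data)

Interestingly, plugging the recomputed per-push rates into the same formula gives 200 * (191.88 * 0.41 + 18.63 * 1.1) = roughly 19833 compute hours/day, which is much closer to the observed 19286.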

Cheers,
kats