Thanks for the many comments! I replied to a bunch of stuff below.
On 13-04-03 18:34 , Justin Lebar wrote:
> This is a really interesting idea.
> git is smart enough to let you pull only the csets that end up getting
> merged into master. I don't know how multiple heads work with hg,
> but I wonder if we could make hg smart enough to do the same thing.
I'm not sure if this is what you're asking, but you can pull a specific
head and its ancestor changesets using "hg pull -r <rev>".
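For example, a minimal sketch (the repo URL and revision here are
placeholders):

  # Pull only the head at <rev> and its ancestors; the remote repo's
  # other heads are not fetched.
  hg pull -r <rev> https://hg.mozilla.org/try/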
On 13-04-03 18:36 , L. David Baron wrote:
> This seems like it would lead to a substantial increase in
> build/test load -- one that I suspect we don't currently have the
> hardware to support. This is because it would require a full
> build/test run for every push, which we avoid today because many
> builds and tests get merged on inbound when things are behind. (I
> also don't feel like we need a full build/test run for every push,
> so it feels like unnecessary use of resources to me.)
This is a valid point. I'll note here that a lot of people find the
test/build coalescing on inbound problematic because it makes finding
regressions much harder, and in general I would like to kill such
coalescing. That being said, if you believe we don't need a full test
run for every push, that's easy to accommodate in my proposal as well.
When the patches are pushed to try, the developer can choose to run
specific sets of tests (which they should already be doing using
trychooser) rather than all of the tests. The rest of my proposal
remains unchanged.
(Note also that I did exactly this when I pushed 6348af4fe6aa to try
earlier today - I knew it would only touch Android, so I only ran the
Android tests. And once it came out green, I merged it to m-c.)
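For example, a trychooser line like the following would request only
the Android builds and test suites (syntax from memory - double-check
the exact platform and suite names against trychooser):

  try: -b do -p android -u all -t none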
Using this modification to the original proposal, developers have more
control over which tests are run on their changes. I think this is
better than having some hard-coded set of rules in moz.build files or
the TBPL database, which (a) are very difficult to get right and (b)
easily get out of date.
If the sheriffs are not satisfied with the test coverage that the
developer chose when pushing to try, they can decline to merge the
patch. If the developer omitted some tests that did end up breaking on
the merge to m-c, then the situation is no worse than it is now with
coalesced tests - that is, some amount of effort will be needed to
figure out what actually broke the tests.
On 13-04-03 18:43 , Gary Kwong wrote:
> autoBisect relies heavily on `hg bisect` and has been working well with
> JSBugMon especially in Spidermonkey for the better part of ~ 4 years now.
>
How does it deal with bisecting through merges? Do you think it would be
affected by my proposal? Justin's comments in this thread about how
bisecting might actually be improved also make sense to me.
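For what it's worth, if the merge csets themselves are what trips up a
bisection, hg can be told to step around them (a sketch; the revision
argument is a placeholder):

  hg bisect --reset
  hg bisect --bad             # the current checkout is broken
  hg bisect --good <rev>      # last known good revision
  hg bisect --skip "merge()"  # never ask us to test a merge cset itself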
On 13-04-03 19:28 , Joshua Cranmer 🐧 wrote:
> I think the relative merits of this approach is easily decidable by
> looking at two numbers:
> 1. How many of our patch landings would have non-trivial merge conflicts?
> 2. How much patch bustage is caused by interaction failures instead of
> patch author/reviewer negligence?
>
> Note that your proposal makes both of these cases strictly worse: in the
> first case, you force the tree sheriffs (instead of developers) to spend
> more time resolving conflicts; in the second, sheriffs have to spend
> more time finding the cause of the bustage. Keep in mind that we have
> about three sheriffs and about a hundred developers--so it is better to
> optimize for sheriffs' time instead of developers' time (at the very
> least, developers represent better parallelization opportunities).
You have good points, but you're assuming that the sheriffs have to deal
with these problems. I would be perfectly fine with saying "in case of
non-trivial merge conflicts, the sheriffs should leave the patch on try
and ask the developer to rebase against a newer m-c". That is, throw
merge conflicts back to the developer to resolve. This is effectively
what happens now anyway because developers who lose push races have to
manually resolve their merge conflicts.
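Concretely, the developer's side of that would look something like this
(assuming the rebase extension is enabled; the revision is a
placeholder):

  # grab the latest mozilla-central changesets
  hg pull https://hg.mozilla.org/mozilla-central/
  # replay the patch(es) onto the new tip, resolving conflicts locally
  hg rebase -s <first-cset-of-the-patch> -d tip
  # then re-push to try for another round of tests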
In the second case (patch bustage due to interaction), again I think
this could be thrown back to the developer, as I stated in my original
proposal (see comments under Scenario D).
I would also be interested in getting concrete numbers for these two
things, though.
On 13-04-03 19:49 , Jesse Ruderman wrote:
> But can we do this with rebased changesets instead of "trivial" merge
> changesets? While the core of hg can handle merges, pretty much none of
> the tools we rely on for understanding history (hg {log, grep, diff,
> bisect}) handle them well.
Do you have specific examples of how hg log, grep, and diff break down
with merges? I've had issues with bisect, but AFAIK the other commands
should be able to deal with merges well enough.
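For instance, merge csets seem reasonably inspectable with the stock
commands (a sketch; <rev> is a placeholder for a merge cset):

  hg log -r "merge()"              # list all merge changesets
  hg log -G                        # draw the DAG so merges are visible
  hg diff -c <rev>                 # merge cset vs. its first parent
  hg diff -r "p2(<rev>)" -r <rev>  # ...vs. its second parent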
On 13-04-03 19:59 , Jeff Hammel wrote:
> On 04/03/2013 04:44 PM, Joshua Cranmer 🐧 wrote:
>> Instead of running {mochitest-*,reftest,crashtest,xpcshell,marionette}
>> on every single desktop platform on every inbound checkin, run them
>> round-robin. A given push would trigger mochitest on Linux, reftest on
>> mac, and then the next test would run reftest on Linux and mochitest
>> on Mac. Each push would still end up running the full testsuite
>> (modulo some tests that are only run on some platforms) on some
>> platform, and we would be using fewer test resources to achieve that
>> coverage. If most actual test bustage is not platform-specific, this
>> would give us generally sufficiently good coverage for most checkins.
>> I am not qualified to say, however, if this is the case.
>>
> If there was a way/guidelines to do this in some sensible way, or even a
> cultural norm like "reviewer comments on what tests to run", I'd give a
> +0.5 here fwiw
Just wanted to point out that the modified proposal I suggested earlier
in this email allows exactly this, without requiring any changes
whatsoever to our existing infrastructure. Just use regular trychooser
syntax to pick your tests when pushing to try.
On 13-04-03 20:39 , Gregory Szorc wrote:
> I pulled the raw builder logs from
> https://secure.pub.build.mozilla.org/builddata/buildjson/ and assembled
> a tab-separated file of all the builds for 2013-03-17 through 2013-03-23:
Awesome! I will try to pull out some useful metrics from this. For
example, how much machine time was spent on backout pushes? That is
something my proposal should almost completely eliminate, so I'm curious
to know, as a ballpark figure, how much we would save from this alone.
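As a first cut, something like this over the tab-separated file might
do (a rough sketch only - I'm guessing at the column layout, and the
real field positions would need to be checked against the file):

  # assumed: one job per row, with a tab-separated field giving the job
  # duration in seconds (field 9 here - pure guess). Crudely match any
  # row mentioning a backout and sum those durations.
  grep -i "back.*out" builds.tsv |
    awk -F'\t' '{s += $9} END {printf "%.1f compute hours\n", s/3600}'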
On 13-04-03 20:57 , Ehsan Akhgari wrote:
> Yes. In addition to the cost of try pushes, this also incurs the cost
> of building and testing those merge commits which would not have existed
> otherwise. In a world where 1 out of 2-3 pushes is bustage this might
> be a good idea, but that's not currently the case.
>
Disagree. Consider:
----------
Scenario E - the one with no bugs
----------
In this scenario, all programmers are perfect and write perfect patches
that never have bugs. They also know they are perfect, so they always
push directly to inbound and never to try.
Current process:
Patch A lands on inbound, tree is green.
Patch B lands on inbound, tree is green.
Patch C lands on inbound, tree is green.
Patch D lands on inbound, tree is green.
Patch E lands on inbound, tree is green.
Inbound is merged to m-c, tree is green.
TBPL time used: 6 units.
My suggested process:
Patch A lands on try, tree is green.
Patch B lands on try, tree is green.
Patch C lands on try, tree is green.
Patch D lands on try, tree is green.
Patch E lands on try, tree is green.
Patches A, B, C, D, and E are merged to m-c, tree is green.
TBPL time used: 6 units.
The number of merge commits in both cases depends on how frequently the
sheriffs choose to merge to m-c. I see no reason to increase or decrease
the frequency just because of the change in process. Of course, if the
overall machine resource usage goes down, then we may be able to increase
the rate at which we land patches. This might increase the frequency
with which merges are done, but the system will reach an equilibrium
point at a higher commit frequency rather than diverging.
On 13-04-03 23:01 , Clint Talbert wrote:
> Joel and I did some calculations:
> * 200 pushes/day[1]
> * 325 test jobs/push
> * 25 builds/push
> * .41 hours/test (on average, from above numbers)
> * 1.1 hours/build (on average, based on try values from above)
[snip]
> So we need 32150 compute hours per day to keep up.
> If you see above our totals for the week of data that gps provided us
> with, you can see that we are currently running at: 135004 hours/week /
> 7 days = 19286 compute hours/day
>
This is interesting, and I will need to spend some more time looking at
the data before I can respond to this properly. However, I'm not
entirely sure where you got your 325 test jobs/push and 25 builds/push
numbers from. If we are doing 200 pushes/day, that's 1400 pushes/week.
gps' data file has 268632 test jobs and 26077 build jobs, which works
out to 191.88 test jobs/push and 18.6 builds/push.
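Redoing the arithmetic with those per-push numbers (and the same
per-job averages quoted above):

  quoted estimate: 200 * (325 * 0.41 + 25 * 1.1) = 200 * 160.75
                   = 32150 compute hours/day
  from gps' data:  200 * (191.88 * 0.41 + 18.6 * 1.1) = 200 * 99.13
                   ~= 19826 compute hours/day

The latter is much closer to the ~19286 compute hours/day we are
actually running at.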
Cheers,
kats