On Wed, 2011-07-20 at 09:04 -0700, Linus Torvalds wrote:
> On Wed, Jul 20, 2011 at 7:58 AM, Peter Zijlstra wrote:
> >
> > Right, so we can either merge my scary patches now and have 3.0 boot on
> > 16+ node machines (and risk breaking something), or delay them until
> > 3.0.1 and have 16+ node machines suffer a little.
* Linus Torvalds wrote:
> On Wed, Jul 20, 2011 at 7:58 AM, Peter Zijlstra wrote:
> >
> > Right, so we can either merge my scary patches now and have 3.0
> > boot on 16+ node machines (and risk breaking something), or delay
> > them until 3.0.1 and have 16+ node machines suffer a little.
>
On Wed, Jul 20, 2011 at 7:58 AM, Peter Zijlstra wrote:
>
> Right, so we can either merge my scary patches now and have 3.0 boot on
> 16+ node machines (and risk breaking something), or delay them until
> 3.0.1 and have 16+ node machines suffer a little.
So how much impact does your scary patch ha
On Wed, 2011-07-20 at 07:40 -0700, Linus Torvalds wrote:
> On Wed, Jul 20, 2011 at 5:14 AM, Anton Blanchard wrote:
> >
> >> So with that fix the patch makes the machine happy again?
> >
> > Yes, the machine looks fine with the patches applied. Thanks!
>
> Ok, so what's the situation for 3.0 (I'm waiting for some RCU resolution now)?
On Wed, Jul 20, 2011 at 5:14 AM, Anton Blanchard wrote:
>
>> So with that fix the patch makes the machine happy again?
>
> Yes, the machine looks fine with the patches applied. Thanks!
Ok, so what's the situation for 3.0 (I'm waiting for some RCU
resolution now)? Anton's patch may be small, but t
Hi Peter,
> So with that fix the patch makes the machine happy again?
Yes, the machine looks fine with the patches applied. Thanks!
Anton
On Wed, 2011-07-20 at 20:14 +1000, Anton Blanchard wrote:
> > That looks very strange indeed.. up to node 23 there is the normal
> > symmetric matrix with all the trace elements on 10 (as we would expect
> > for local access), and some 4x4 sub-matrix stacked around the trace
> > with 20, suggesting a single hop distance, and the rest on 40 being
> > out-there
Hi Peter,
> That looks very strange indeed.. up to node 23 there is the normal
> symmetric matrix with all the trace elements on 10 (as we would expect
> for local access), and some 4x4 sub-matrix stacked around the trace
> with 20, suggesting a single hop distance, and the rest on 40 being
> out-there
Hi,
> That looks very strange indeed.. up to node 23 there is the normal
> symmetric matrix with all the trace elements on 10 (as we would expect
> for local access), and some 4x4 sub-matrix stacked around the trace
> with 20, suggesting a single hop distance, and the rest on 40 being
> out-there
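For anyone trying to picture the table being described, here is a minimal stand-alone sketch (plain userspace C, with a made-up node count and block size rather than the real machine's values) of a SLIT-style distance matrix with that shape: 10 on the diagonal for local access, 4x4 blocks of 20 around the diagonal for single-hop neighbours, and 40 everywhere else.

/*
 * Hypothetical illustration only: build a SLIT-style node distance
 * table with the shape described above -- 10 on the diagonal (local),
 * 20 inside each 4-node block (one hop), 40 everywhere else.  The node
 * count and block size are invented, not the real machine's values.
 */
#include <stdio.h>

#define NR_NODES   8   /* small excerpt; the machine had many more */
#define BLOCK_SIZE 4   /* nodes assumed to be one hop from each other */

static int node_dist[NR_NODES][NR_NODES];

int main(void)
{
        int i, j;

        for (i = 0; i < NR_NODES; i++)
                for (j = 0; j < NR_NODES; j++) {
                        if (i == j)
                                node_dist[i][j] = 10;   /* local access */
                        else if (i / BLOCK_SIZE == j / BLOCK_SIZE)
                                node_dist[i][j] = 20;   /* single hop */
                        else
                                node_dist[i][j] = 40;   /* everything else */
                }

        for (i = 0; i < NR_NODES; i++) {
                for (j = 0; j < NR_NODES; j++)
                        printf("%3d", node_dist[i][j]);
                printf("\n");
        }
        return 0;
}

Run as-is it prints an 8x8 excerpt with exactly the 10/20/40 pattern described above for the first 24 nodes.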
On Tue, 2011-07-19 at 14:44 +1000, Anton Blanchard wrote:
>
> Our node distances are a bit arbitrary (I make them up based on
> information given to us in the device tree). In terms of memory we have
> a maximum of three levels. To give some gross estimates, on chip memory
> might be 30GB/sec, on
On Mon, 18 Jul 2011 23:35:56 +0200
Peter Zijlstra wrote:
> Anton, could you test the below two patches on that machine?
>
> It should make things boot again, while I don't have a machine nearly
> big enough to trigger any of this, I tested the new code paths by
> setting FORCE_SD_OVERLAP in /debug/sched_features.
Anton, could you test the below two patches on that machine?
It should make things boot again, while I don't have a machine nearly
big enough to trigger any of this, I tested the new code paths by
setting FORCE_SD_OVERLAP in /debug/sched_features. Although any review
of the error paths would be much appreciated.
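For anyone else wanting to exercise the same code paths, a rough recipe (assuming CONFIG_SCHED_DEBUG, debugfs mounted at /sys/kernel/debug -- the /debug path above is just a different mount point -- and the usual sched_features write interface where a bare feature name enables it and a NO_ prefix disables it) would be something like:

/*
 * Minimal sketch: flip the FORCE_SD_OVERLAP scheduler feature from
 * userspace.  Path and semantics as assumed above; writing
 * "NO_FORCE_SD_OVERLAP" turns it back off.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
        const char *path = "/sys/kernel/debug/sched_features";
        const char *feat = "FORCE_SD_OVERLAP";
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return 1;
        }
        if (fwrite(feat, 1, strlen(feat), f) != strlen(feat))
                perror("write");
        fclose(f);
        return 0;
}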
On Fri, 2011-07-15 at 10:45 +1000, Anton Blanchard wrote:
> Hi,
>
> > Urgh.. so those spans are generated by sched_domain_node_span(), and
> > it looks like that simply picks the 15 nearest nodes to the one we've
> > got without consideration for overlap with previously generated spans.
>
> I do wonder if we need this extra level at all on ppc64.
Hi,
> Urgh.. so those spans are generated by sched_domain_node_span(), and
> it looks like that simply picks the 15 nearest nodes to the one we've
> got without consideration for overlap with previously generated spans.
I do wonder if we need this extra level at all on ppc64. From memory
SGI add
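To see why "the 15 nearest nodes" per node is a problem, here is a toy model (plain C, with an invented 1-D distance |i-j| standing in for the real node distances, not the kernel's code): each node's span is itself plus its 15 nearest neighbours, and nodes in different parts of the machine end up with spans that overlap without being identical, which is exactly the case the old group-building assumed away.

/*
 * Toy model: give each node a "span" made of its SPAN_SIZE nearest
 * nodes under a made-up 1-D distance |i-j|.  Spans built this way
 * overlap without being identical.
 */
#include <stdio.h>
#include <stdlib.h>

#define NR_NODES  32
#define SPAN_SIZE 16    /* the node itself plus its 15 nearest neighbours */

static int center;      /* node whose span we are building */

static int cmp_dist(const void *a, const void *b)
{
        int da = abs(*(const int *)a - center);
        int db = abs(*(const int *)b - center);

        return da != db ? da - db : *(const int *)a - *(const int *)b;
}

static void print_span(int node)
{
        int order[NR_NODES], i;

        for (i = 0; i < NR_NODES; i++)
                order[i] = i;
        center = node;
        qsort(order, NR_NODES, sizeof(int), cmp_dist);

        printf("node %2d span:", node);
        for (i = 0; i < SPAN_SIZE; i++)
                printf(" %d", order[i]);
        printf("\n");
}

int main(void)
{
        print_span(0);
        print_span(20);
        print_span(24);
        return 0;
}

With these made-up numbers node 0 gets nodes 0-15, node 20 gets 12-27 and node 24 gets 16-31: plenty of partial overlap, no two spans equal.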
On Thu, 2011-07-14 at 14:35 +1000, Anton Blanchard wrote:
> I also printed out the cpu spans as we walk through build_sched_groups:
> 0 32 64 96 128 160 192 224 256 288 320 352 384 416 448 480
> Duplicates start appearing in this span:
> 128 160 192 224 256 288 320 352 384 416 448 480 512 544 57
> I took a quick look and we are stuck in update_group_power:
>
> do {
>         power += group->cpu_power;
>         group = group->next;
> } while (group != child->groups);
>
> I looked at the linked list:
>
> child->groups = c07b2f74ff00
>
> and dumping g
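For what it's worth, that do/while only terminates if following ->next eventually leads back to child->groups, i.e. if the groups really form a closed ring; if the duplicated spans corrupt the ring so it never returns to the starting group, the sum just spins forever, which matches the hang. A stand-alone sketch of the two cases (simplified stand-in structures, plus an iteration cap so the broken case prints a message instead of actually hanging like the real loop):

/*
 * Sketch of why the quoted loop can hang: it only exits when the walk
 * comes back to the group it started from.
 */
#include <stdio.h>

struct group {
        unsigned long cpu_power;
        struct group *next;
};

/* Sum cpu_power around the ring, giving up after 'limit' hops. */
static int sum_ring(struct group *start, int limit, unsigned long *sum)
{
        struct group *g = start;

        *sum = 0;
        do {
                *sum += g->cpu_power;
                g = g->next;
        } while (g != start && --limit > 0);

        return g == start;      /* 1 if the ring closed, 0 if we gave up */
}

int main(void)
{
        struct group a = { 1024, NULL }, b = { 1024, NULL }, c = { 1024, NULL };
        unsigned long sum;

        /* Healthy case: a -> b -> c -> a closes back on the start. */
        a.next = &b; b.next = &c; c.next = &a;
        printf("closed ring: ok=%d sum=%lu\n", sum_ring(&a, 16, &sum), sum);

        /* Broken case: the walk never returns to 'a', roughly what the
         * duplicated groups seem to have produced on the big machine. */
        c.next = &b;
        printf("broken ring: ok=%d sum=%lu\n", sum_ring(&a, 16, &sum), sum);

        return 0;
}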
Hi Peter,
> Surely this isn't the first multi-node P7 to boot a kernel with this
> patch? If my git foo is any good it hit -next on 23rd of May.
>
> I guess I'm asking is, do smaller P7 machines boot? And if so, is
> there any difference except size?
>
> How many nodes does the thing have anyway?
On Thu, 2011-07-07 at 17:25 +0530, Mahesh J Salgaonkar wrote:
> > I guess I'm asking is, do smaller P7 machines boot? And if so, is there
> > any difference except size?
>
> Yes, the smaller P7 machine that I have with 20 CPUs and 2GB ram boots
> fine with 3.0.0-rc.
That sounds like a single node
On 2011-07-07 12:59:35 Thu, Peter Zijlstra wrote:
> On Thu, 2011-07-07 at 15:52 +0530, Mahesh J Salgaonkar wrote:
> >
> > 2.6.39 booted fine on the system and a git bisect shows commit cd4ea6ae -
> > "sched: Change NODE sched_domain group creation" as the cause.
>
> Weird, there's no locking anywhere around there.
On Thu, 2011-07-07 at 15:52 +0530, Mahesh J Salgaonkar wrote:
>
> 2.6.39 booted fine on the system and a git bisect shows commit cd4ea6ae -
> "sched: Change NODE sched_domain group creation" as the cause.
Weird, there's no locking anywhere around there. The typical problems
with this patch-set we
Hi,
linux-3.0-rc fails to boot on a power7 system with 1TB ram and 896 CPUs.
While the initial boot log shows a soft-lockup [1], the machine is hung after.
Dropping into xmon shows the cpus are all stuck at:
cpu 0xa: Vector: 100 (System Reset) at [c00fae51fae0]
pc: c0