On Wed, Jul 23, 2014 at 17:11:40 +0200,
Peter Zijlstra wrote:
OK, so that's become the below patch. I'll feed it to Ingo if that's OK
with hpa.
I tested this patch on 3 machines. It continued to fix the one that was
broken, and didn't seem to break anything on the two that weren't broken.
On 07/23/2014 08:11 AM, Peter Zijlstra wrote:
>
> OK, so that's become the below patch. I'll feed it to Ingo if that's OK
> with hpa.
>
I'll grab it directly, it is a bit quicker that way.
-hpa
OK, so that's become the below patch. I'll feed it to Ingo if that's OK
with hpa.
---
Subject: x86: Fix cache topology for early P4-SMT
From: Peter Zijlstra
Date: Tue, 22 Jul 2014 15:35:14 +0200
P4 systems with cpuid level < 4 can have SMT, but the cache topology
description available (cpuid2) does not include SMP information.
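The patch body is cut off in this archive. For context, the idea of the fix,
sketched from memory in v3.16-era terms (a reconstruction, not the verbatim
patch): when the cpuid4 cache enumeration never ran, fall back to the package
id for the last-level-cache id, which is always correct because SMT siblings
share every cache level.

    #ifdef CONFIG_X86_HT
        /*
         * cpuid_level < 4 means the cpuid4 cache enumeration never set
         * cpu_llc_id.  The only sharing possible then is SMT, and SMT
         * siblings share all cache levels, so the package id is a valid
         * LLC id (a non-SMT CPU simply shares with itself).
         */
        if (per_cpu(cpu_llc_id, cpu) == BAD_APICID)
            per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
    #endif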
On Tue, Jul 22, 2014 at 08:37:19PM -0500, Bruno Wolff III wrote:
> build_sched_domain: cpu: 0 level: SMT cpu_map: 0-3 tl->mask: 0,2
> [0.252441] build_sched_domain: cpu: 0 level: MC cpu_map: 0-3 tl->mask: 0,2
> [0.252526] build_sched_domain: cpu: 0 level: DIE cpu_map: 0-3
On Tue, Jul 22, 2014 at 16:18:55 +0200,
Peter Zijlstra wrote:
You can put this on top of them. I hope that this will make the pr_err()
introduced in the robustify patch go away.
I went to 3.16-rc6 and then reapplied three patches from your previous
email messages. The dmesg output and the d…
On 07/22/2014 06:35 AM, Peter Zijlstra wrote:
> On Tue, Jul 22, 2014 at 03:26:03PM +0200, Peter Zijlstra wrote:
>> On Tue, Jul 22, 2014 at 03:03:43PM +0200, Peter Zijlstra wrote:
>>> Oh, of course we do SMP detection and setup after the cache setup...
>>> lovely.
>>>
>>> /me goes bang head against wall
On Tue, Jul 22, 2014 at 09:09:12AM -0500, Bruno Wolff III wrote:
> On Tue, Jul 22, 2014 at 15:35:14 +0200,
> Peter Zijlstra wrote:
> >On Tue, Jul 22, 2014 at 03:26:03PM +0200, Peter Zijlstra wrote:
> >
> >Something like so.. anything obviously broken?
>
> Do you want me to test this change instead of, or combined with the other
> patch you wanted tested earlier?
On Tue, Jul 22, 2014 at 15:35:14 +0200,
Peter Zijlstra wrote:
On Tue, Jul 22, 2014 at 03:26:03PM +0200, Peter Zijlstra wrote:
Something like so.. anything obviously broken?
Do you want me to test this change instead of, or combined with the other
patch you wanted tested earlier?
---
arc…
On Tue, Jul 22, 2014 at 03:26:03PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 22, 2014 at 03:03:43PM +0200, Peter Zijlstra wrote:
> > Oh, of course we do SMP detection and setup after the cache setup...
> > lovely.
> >
> > /me goes bang head against wall
>
> hpa, could we move the legacy cpuid1/cpuid4 topology detection muck up,
On Tue, Jul 22, 2014 at 03:03:43PM +0200, Peter Zijlstra wrote:
> Oh, of course we do SMP detection and setup after the cache setup...
> lovely.
>
> /me goes bang head against wall
hpa, could we move the legacy cpuid1/cpuid4 topology detection muck up,
preferably right after detect_extended_topology()…
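Spelled out, the proposed reordering amounts to something like the sketch
below; the wrapper function is hypothetical and only illustrates doing all
topology detection, extended and legacy, before the cache setup that consumes
it:

    /*
     * Hypothetical sketch, not a patch from this thread: make sure
     * phys_proc_id and the SMT sibling information are valid before the
     * cache code runs and tries to derive cpu_llc_id from them.
     */
    static void detect_topology_before_caches(struct cpuinfo_x86 *c)
    {
        detect_extended_topology(c);    /* cpuid leaf 0xb, if present */
        detect_ht(c);                   /* legacy cpuid1 SMT detection */
        init_intel_cacheinfo(c);        /* now sees valid topology */
    }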
On Tue, Jul 22, 2014 at 07:10:01AM -0500, Bruno Wolff III wrote:
> On Tue, Jul 22, 2014 at 12:38:57 +0200,
> Peter Zijlstra wrote:
> >
> >Could you provide the output of cpuid and cpuid -r for your machine?
> >This code is magic and I've no idea what your machine is telling it to
> >do :/
>
> I am attaching both sets of output. (I also added copies to the bug report.)
On Tue, Jul 22, 2014 at 11:47:40 +0200,
Peter Zijlstra wrote:
On Mon, Jul 21, 2014 at 06:52:12PM +0200, Peter Zijlstra wrote:
On Mon, Jul 21, 2014 at 11:35:28AM -0500, Bruno Wolff III wrote:
> Is there more I can do to help with this now? Or should I just wait for
> patches to test?
Yeah, sorry, was wiped out today. I'll go stare harder at the P4
topology setup code tomorrow.
On Tue, Jul 22, 2014 at 12:38:57 +0200,
Peter Zijlstra wrote:
Could you provide the output of cpuid and cpuid -r for your machine?
This code is magic and I've no idea what your machine is telling it to
do :/
I am attaching both sets of output. (I also added copies to the bug report.)
CPU 0: …
On 22/07/14 10:47, Peter Zijlstra wrote:
> On Mon, Jul 21, 2014 at 06:52:12PM +0200, Peter Zijlstra wrote:
>> On Mon, Jul 21, 2014 at 11:35:28AM -0500, Bruno Wolff III wrote:
>>> Is there more I can do to help with this now? Or should I just wait for
>>> patches to test?
>>
>> Yeah, sorry, was wiped out today. I'll go stare harder at the P4
>> topology setup code tomorrow.
On Tue, Jul 22, 2014 at 11:47:40AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 21, 2014 at 06:52:12PM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 21, 2014 at 11:35:28AM -0500, Bruno Wolff III wrote:
> > > Is there more I can do to help with this now? Or should I just wait for
> > > patches to test?
On Mon, Jul 21, 2014 at 06:52:12PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 21, 2014 at 11:35:28AM -0500, Bruno Wolff III wrote:
> > Is there more I can do to help with this now? Or should I just wait for
> > patches to test?
>
> Yeah, sorry, was wiped out today. I'll go stare harder at the P4
> topology setup code tomorrow. Something fishy there.
On Mon, Jul 21, 2014 at 11:35:28AM -0500, Bruno Wolff III wrote:
> Is there more I can do to help with this now? Or should I just wait for
> patches to test?
Yeah, sorry, was wiped out today. I'll go stare harder at the P4
topology setup code tomorrow. Something fishy there.
Is there more I can do to help with this now? Or should I just wait for
patches to test?
On Fri, Jul 18, 2014 at 04:50:40PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 18, 2014 at 04:16:48PM +0200, Peter Zijlstra wrote:
> > On Fri, Jul 18, 2014 at 08:01:26AM -0500, Bruno Wolff III wrote:
> > > build_sched_domain: cpu: 0 level: SMT cpu_map: 0-3 tl->mask: 0,2
> > > [0.254433] build_sched_domain: cpu: 0 level: MC cpu_map: 0-3 tl->mask: 0 …
On Fri, Jul 18, 2014 at 04:16:48PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 18, 2014 at 08:01:26AM -0500, Bruno Wolff III wrote:
> > build_sched_domain: cpu: 0 level: SMT cpu_map: 0-3 tl->mask: 0,2
> > [0.254433] build_sched_domain: cpu: 0 level: MC cpu_map: 0-3 tl->mask: 0
> > [0.254516] build_sched_domain: cpu: 0 level: DIE cpu_map: 0-3 tl->mask: 0-3
On 18/07/14 15:01, Bruno Wolff III wrote:
On Fri, Jul 18, 2014 at 12:16:33 +0200,
Peter Zijlstra wrote:
So it looks like the actual domain tree is broken, and not what we
assumed it was.
Could I bother you to run with the below instead? It should also print
out the sched domain masks so we don't need to guess about them.
On Fri, Jul 18, 2014 at 08:01:26AM -0500, Bruno Wolff III wrote:
> build_sched_domain: cpu: 0 level: SMT cpu_map: 0-3 tl->mask: 0,2
> [0.254433] build_sched_domain: cpu: 0 level: MC cpu_map: 0-3 tl->mask: 0
> [0.254516] build_sched_domain: cpu: 0 level: DIE cpu_map: 0-3 tl->mask: 0-3
> …
On Fri, Jul 18, 2014 at 12:16:33 +0200,
Peter Zijlstra wrote:
So it looks like the actual domain tree is broken, and not what we
assumed it was.
Could I bother you to run with the below instead? It should also print
out the sched domain masks so we don't need to guess about them.
The full dmesg…
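For reference, the quoted "build_sched_domain: ..." lines come from a debug
patch; a minimal sketch of a printk that produces that format, assuming the
v3.16-era cpulist_scnprintf() helper (Peter's actual patch may differ):

    /* At the top of build_sched_domain() in kernel/sched/core.c: */
    {
        char map_buf[64], mask_buf[64];

        cpulist_scnprintf(map_buf, sizeof(map_buf), cpu_map);
        cpulist_scnprintf(mask_buf, sizeof(mask_buf), tl->mask(cpu));
        printk(KERN_ERR "build_sched_domain: cpu: %d level: %s cpu_map: %s tl->mask: %s\n",
               cpu, tl->name, map_buf, mask_buf);
    }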
On Fri, Jul 18, 2014 at 11:28:14 +0200,
Dietmar Eggemann wrote:
Didn't see what I was looking for in your dmesg output. Did you use
'earlyprintk=keep sched_debug'?
I was missing a space. I'll get it on the next run.
On Fri, Jul 18, 2014 at 12:34:49AM -0500, Bruno Wolff III wrote:
> On Thu, Jul 17, 2014 at 14:35:02 +0200,
> Peter Zijlstra wrote:
> >
> >In any case, can someone who can trigger this run with the below; it's
> >'clean' for me, but supposedly you'll trigger a FAIL somewhere.
>
> I got a couple of fail messages.
On 18/07/14 07:34, Bruno Wolff III wrote:
On Thu, Jul 17, 2014 at 14:35:02 +0200,
Peter Zijlstra wrote:
In any case, can someone who can trigger this run with the below; it's
'clean' for me, but supposedly you'll trigger a FAIL somewhere.
I got a couple of fail messages.
dmesg output is available in the bug as the following attachment: …
On Thu, Jul 17, 2014 at 14:35:02 +0200,
Peter Zijlstra wrote:
In any case, can someone who can trigger this run with the below; it's
'clean' for me, but supposedly you'll trigger a FAIL somewhere.
I got a couple of fail messages.
dmesg output is available in the bug as the following attachment: …
On Thu, Jul 17, 2014 at 20:43:16 +0200,
Dietmar Eggemann wrote:
If you could apply the patch:
https://lkml.org/lkml/2014/7/17/288
and then run it on your machine, that would give us more details, i.e.
the information on which sched_group(s) and in which sched domain
level (SMT and/or DIE) …
On 17/07/14 18:36, Bruno Wolff III wrote:
I did a few quick boots this morning while taking a bunch of pictures. I have
gone through some of them and found one showing the BUG_ON that was
triggered at line 5850, which is from:
BUG_ON(!cpumask_empty(sched_group_cpus(sg)));
You can see the JPEG at:
https://bugzilla.kernel.org/attachment
I did a few quick boots this morning while taking a bunch of pictures. I have
gone through some of them and found one showing the BUG_ON that was
triggered at line 5850, which is from:
BUG_ON(!cpumask_empty(sched_group_cpus(sg)));
You can see the JPEG at:
https://bugzilla.kernel.org/attachment
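For context, that check sits in the group-building loop of
build_sched_groups() in kernel/sched/core.c, roughly as follows (abridged
from the v3.16-era code, so treat details as approximate):

    for_each_cpu(i, span) {
        struct sched_group *sg;
        int group, j;

        if (cpumask_test_cpu(i, covered))
            continue;

        group = get_group(i, sdd, &sg);

        /* A group picked up here is expected to start out empty; the
         * photographed crash is this check firing. */
        BUG_ON(!cpumask_empty(sched_group_cpus(sg)));

        for_each_cpu(j, span) {
            if (get_group(j, sdd, NULL) != group)
                continue;

            cpumask_set_cpu(j, sched_group_cpus(sg));
            cpumask_set_cpu(j, covered);
        }
    }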
On Thu, Jul 17, 2014 at 01:23:51PM +0200, Dietmar Eggemann wrote:
> On 17/07/14 11:04, Peter Zijlstra wrote:
> >On Thu, Jul 17, 2014 at 10:57:55AM +0200, Dietmar Eggemann wrote:
> >>There is also the possibility that the memory for sched_group sg is not
> >>(completely) zeroed out:
> >>
> >> sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
> >>                   GFP_KERNEL, cpu_to_node(j));
On 17/07/14 11:04, Peter Zijlstra wrote:
On Thu, Jul 17, 2014 at 10:57:55AM +0200, Dietmar Eggemann wrote:
There is also the possibility that the memory for sched_group sg is not
(completely) zeroed out:
sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
GFP_KERNEL, cpu_to_node(j));
On Thu, Jul 17, 2014 at 10:57:55AM +0200, Dietmar Eggemann wrote:
> There is also the possibility that the memory for sched_group sg is not
> (completely) zeroed out:
>
> sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
> GFP_KERNEL, cpu_to_node(j));
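The zeroing point, spelled out: kzalloc_node() returns zeroed memory, and the
group's cpumask lives in the tail of that same allocation, so a freshly
allocated group's mask is empty by construction. The allocation is in the
scheduler's __sdt_alloc() path (v3.16-era layout); the open question in the
thread is whether a *reused* group still satisfies that:

    struct sched_group *sg;

    /* One allocation covers the struct plus its trailing cpumask;
     * kzalloc_node() zeroes both, so sched_group_cpus(sg), which points
     * into that tail, starts out as the empty mask. */
    sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
                      GFP_KERNEL, cpu_to_node(j));
    if (!sg)
        return -ENOMEM;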
On 17/07/14 05:09, Bruno Wolff III wrote:
On Thu, Jul 17, 2014 at 01:18:36 +0200,
Dietmar Eggemann wrote:
So the output of
$ cat /proc/sys/kernel/sched_domain/cpu*/domain*/*
would be handy too.
Thanks, this was helpful.
I see from the sched domain layout that you have SMT (domain0) and DIE (domain1) …
On Wed, Jul 16, 2014 at 21:17:32 +0200,
Dietmar Eggemann wrote:
Could you please share:
cat /proc/cpuinfo and
cat /proc/schedstat (kernel config w/ CONFIG_SCHEDSTATS=y)
/proc/schedstat output is attached.
version 15
timestamp 4294858660
cpu0 12 0 85767 30027 61826 37767 15709950719 562024106 …
Could you also put the two BUG_ON lines into build_sched_groups()
[kernel/sched/core.c] wo/ the cpumask_clear() and setting
sg->sgc->capacity to 0 and share the possible crash output as well?
I can try a new build with this. I can probably get results back tomorrow
before I leave for work. The c…
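Spelled out, the request is to assert the invariants rather than re-establish
them; a sketch, with the exact placement inside build_sched_groups() being my
assumption:

    group = get_group(i, sdd, &sg);

    /* Instead of cpumask_clear(sched_group_cpus(sg)) and
     * sg->sgc->capacity = 0, assert that both already hold: */
    BUG_ON(!cpumask_empty(sched_group_cpus(sg)));
    BUG_ON(sg->sgc->capacity != 0);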
On Thu, Jul 17, 2014 at 01:18:36 +0200,
Dietmar Eggemann wrote:
So the output of
$ cat /proc/sys/kernel/sched_domain/cpu*/domain*/*
would be handy too.
Attached and added to the bug.
Just to make sure, you do have 'CONFIG_X86_32=y' and '# CONFIG_NUMA is
not set' in your build?
Yes.
I p…
On 16/07/14 21:54, Bruno Wolff III wrote:
On Wed, Jul 16, 2014 at 21:17:32 +0200,
Dietmar Eggemann wrote:
Hi Bruno and Josh,
From the issue, I see that the machine causing trouble is a Xeon (2
processors w/ hyper-threading).
Could you please share:
cat /proc/cpuinfo and
cat /proc/schedstat (kernel config w/ CONFIG_SCHEDSTATS=y)
I have attached it to the bug and to this message.
On Wed, Jul 16, 2014 at 21:17:32 +0200,
Dietmar Eggemann wrote:
Hi Bruno and Josh,
From the issue, I see that the machine causing trouble is a Xeon (2
processors w/ hyper-threading).
Could you please share:
cat /proc/cpuinfo and
cat /proc/schedstat (kernel config w/ CONFIG_SCHEDSTATS=y)
I have attached it to the bug and to this message.
cat /p…
Hi Bruno and Josh,
On 16/07/14 17:17, Josh Boyer wrote:
Adding Dietmar in since he is the original author.
josh
On Wed, Jul 16, 2014 at 09:55:46AM -0500, Bruno Wolff III wrote:
caffcdd8d27ba78730d5540396ce72ad022aff2c has been causing crashes
early in the boot process on one of three machines I have been testing the
kernel on.
Adding Dietmar in since he is the original author.
josh
On Wed, Jul 16, 2014 at 09:55:46AM -0500, Bruno Wolff III wrote:
> caffcdd8d27ba78730d5540396ce72ad022aff2c has been causing crashes
> early in the boot process on one of three machines I have been
> testing the kernel on. On that one machine it happens every boot.
caffcdd8d27ba78730d5540396ce72ad022aff2c has been causing crashes early in
the boot process on one of three machines I have been testing the kernel
on. On that one machine it happens every boot. It happens before netconsole
is functional.
A partial revert of the commit fixes the problem. I do…
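Given that caffcdd8 removed the explicit zeroing from build_sched_groups(),
the partial revert presumably re-adds it; a sketch under that assumption,
using the v3.16 field names:

    group = get_group(i, sdd, &sg);

    /* Re-add the zeroing dropped by caffcdd8 so a reused group is reset
     * before being filled in again: */
    cpumask_clear(sched_group_cpus(sg));
    sg->sgc->capacity = 0;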