On Wed, Jul 23, 2014 at 17:11:40 +0200,
Peter Zijlstra wrote:
OK, so that's become the below patch. I'll feed it to Ingo if that's OK
with hpa.
I tested this patch on 3 machines. It continued to fix the one that was
broken, and didn't seem to break anything on the two that weren't broken.
On 07/23/2014 08:11 AM, Peter Zijlstra wrote:
>
> OK, so that's become the below patch. I'll feed it to Ingo if that's OK
> with hpa.
>
I'll grab it directly, it is a bit quicker that way.
-hpa
OK, so that's become the below patch. I'll feed it to Ingo if that's OK
with hpa.
---
Subject: x86: Fix cache topology for early P4-SMT
From: Peter Zijlstra
Date: Tue, 22 Jul 2014 15:35:14 +0200
P4 systems with cpuid level < 4 can have SMT, but the cache topology
description available (cpuid2) does not include SMP information.
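The patch body is cut off in this archive. For context, the idea of the fix,
sketched from memory in v3.16-era terms (a reconstruction, not the verbatim
patch): when the cpuid4 cache enumeration never ran, fall back to the package
id for the last-level-cache id, which is always correct because SMT siblings
share every cache level.

    #ifdef CONFIG_X86_HT
        /*
         * cpuid_level < 4 means the cpuid4 cache enumeration never set
         * cpu_llc_id.  The only sharing possible then is SMT, and SMT
         * siblings share all cache levels, so the package id is a valid
         * LLC id (a non-SMT CPU simply shares with itself).
         */
        if (per_cpu(cpu_llc_id, cpu) == BAD_APICID)
            per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
    #endif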
On Tue, Jul 22, 2014 at 08:37:19PM -0500, Bruno Wolff III wrote:
> build_sched_domain: cpu: 0 level: SMT cpu_map: 0-3 tl->mask: 0,2
> [0.252441] build_sched_domain: cpu: 0 level: MC cpu_map: 0-3 tl->mask: 0,2
> [0.252526] build_sched_domain: cpu: 0 level: DIE cpu_map: 0-3
On Tue, Jul 22, 2014 at 16:18:55 +0200,
Peter Zijlstra wrote:
You can put this on top of them. I hope that this will make the pr_err()
introduced in the robustify patch go away.
I went to 3.16-rc6 and then reapplied three patches from your previous
email messages. The dmesg output and the d…
On 07/22/2014 06:35 AM, Peter Zijlstra wrote:
> On Tue, Jul 22, 2014 at 03:26:03PM +0200, Peter Zijlstra wrote:
>> On Tue, Jul 22, 2014 at 03:03:43PM +0200, Peter Zijlstra wrote:
>>> Oh, of course we do SMP detection and setup after the cache setup...
>>> lovely.
>>>
>>> /me goes bang head against wall
On Tue, Jul 22, 2014 at 09:09:12AM -0500, Bruno Wolff III wrote:
> On Tue, Jul 22, 2014 at 15:35:14 +0200,
> Peter Zijlstra wrote:
> >On Tue, Jul 22, 2014 at 03:26:03PM +0200, Peter Zijlstra wrote:
> >
> >Something like so.. anything obviously broken?
>
> Do you want me to test this change instead of, or combined with the other
> patch you wanted tested earlier?
On Tue, Jul 22, 2014 at 15:35:14 +0200,
Peter Zijlstra wrote:
On Tue, Jul 22, 2014 at 03:26:03PM +0200, Peter Zijlstra wrote:
Something like so.. anything obviously broken?
Do you want me to test this change instead of, or combined with the other
patch you wanted tested earlier?
---
arc…
On Tue, Jul 22, 2014 at 03:26:03PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 22, 2014 at 03:03:43PM +0200, Peter Zijlstra wrote:
> > Oh, of course we do SMP detection and setup after the cache setup...
> > lovely.
> >
> > /me goes bang head against wall
>
> hpa, could we move the legacy cpuid1/cpuid4 topology detection muck up,
On Tue, Jul 22, 2014 at 03:03:43PM +0200, Peter Zijlstra wrote:
> Oh, of course we do SMP detection and setup after the cache setup...
> lovely.
>
> /me goes bang head against wall
hpa, could we move the legacy cpuid1/cpuid4 topology detection muck up,
preferably right after detect_extended_topology()…
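Spelled out, the proposed reordering amounts to something like the sketch
below; the wrapper function is hypothetical and only illustrates doing all
topology detection, extended and legacy, before the cache setup that consumes
it:

    /*
     * Hypothetical sketch, not a patch from this thread: make sure
     * phys_proc_id and the SMT sibling information are valid before the
     * cache code runs and tries to derive cpu_llc_id from them.
     */
    static void detect_topology_before_caches(struct cpuinfo_x86 *c)
    {
        detect_extended_topology(c);    /* cpuid leaf 0xb, if present */
        detect_ht(c);                   /* legacy cpuid1 SMT detection */
        init_intel_cacheinfo(c);        /* now sees valid topology */
    }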
On Tue, Jul 22, 2014 at 07:10:01AM -0500, Bruno Wolff III wrote:
> On Tue, Jul 22, 2014 at 12:38:57 +0200,
> Peter Zijlstra wrote:
> >
> >Could you provide the output of cpuid and cpuid -r for your machine?
> >This code is magic and I've no idea what your machine is telling it to
> >do :/
>
> I am attaching both sets of output. (I also added copies to the bug report.)
On Tue, Jul 22, 2014 at 11:47:40 +0200,
Peter Zijlstra wrote:
On Mon, Jul 21, 2014 at 06:52:12PM +0200, Peter Zijlstra wrote:
On Mon, Jul 21, 2014 at 11:35:28AM -0500, Bruno Wolff III wrote:
> Is there more I can do to help with this now? Or should I just wait for
> patches to test?
Yeah, sorry, was wiped out today. I'll go stare harder at the P4
topology setup code tomorrow.
On Tue, Jul 22, 2014 at 12:38:57 +0200,
Peter Zijlstra wrote:
Could you provide the output of cpuid and cpuid -r for your machine?
This code is magic and I've no idea what your machine is telling it to
do :/
I am attaching both sets of output. (I also added copies to the bug report.)
CPU 0: …
On 22/07/14 10:47, Peter Zijlstra wrote:
> On Mon, Jul 21, 2014 at 06:52:12PM +0200, Peter Zijlstra wrote:
>> On Mon, Jul 21, 2014 at 11:35:28AM -0500, Bruno Wolff III wrote:
>>> Is there more I can do to help with this now? Or should I just wait for
>>> patches to test?
>>
>> Yeah, sorry, was wiped out today. I'll go stare harder at the P4
>> topology setup code tomorrow.
On Tue, Jul 22, 2014 at 11:47:40AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 21, 2014 at 06:52:12PM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 21, 2014 at 11:35:28AM -0500, Bruno Wolff III wrote:
> > > Is there more I can do to help with this now? Or should I just wait for
> > > patches to test?
On Mon, Jul 21, 2014 at 06:52:12PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 21, 2014 at 11:35:28AM -0500, Bruno Wolff III wrote:
> > Is there more I can do to help with this now? Or should I just wait for
> > patches to test?
>
> Yeah, sorry, was wiped out today. I'll go stare harder at the P4
> topology setup code tomorrow. Something fishy there.
On Mon, Jul 21, 2014 at 11:35:28AM -0500, Bruno Wolff III wrote:
> Is there more I can do to help with this now? Or should I just wait for
> patches to test?
Yeah, sorry, was wiped out today. I'll go stare harder at the P4
topology setup code tomorrow. Something fishy there.
Is there more I can do to help with this now? Or should I just wait for
patches to test?
On Fri, Jul 18, 2014 at 04:50:40PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 18, 2014 at 04:16:48PM +0200, Peter Zijlstra wrote:
> > On Fri, Jul 18, 2014 at 08:01:26AM -0500, Bruno Wolff III wrote:
> > > build_sched_domain: cpu: 0 level: SMT cpu_map: 0-3 tl->mask: 0,2
> > > [0.254433] build_sched_domain: cpu: 0 level: MC cpu_map: 0-3 tl->mask: 0 …
On Fri, Jul 18, 2014 at 04:16:48PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 18, 2014 at 08:01:26AM -0500, Bruno Wolff III wrote:
> > build_sched_domain: cpu: 0 level: SMT cpu_map: 0-3 tl->mask: 0,2
> > [0.254433] build_sched_domain: cpu: 0 level: MC cpu_map: 0-3 tl->mask: 0
> > [0.254516] build_sched_domain: cpu: 0 level: DIE cpu_map: 0-3 tl->mask: 0-3
On 18/07/14 15:01, Bruno Wolff III wrote:
On Fri, Jul 18, 2014 at 12:16:33 +0200,
Peter Zijlstra wrote:
So it looks like the actual domain tree is broken, and not what we
assumed it was.
Could I bother you to run with the below instead? It should also print
out the sched domain masks so we don't need to guess about them.
On Fri, Jul 18, 2014 at 08:01:26AM -0500, Bruno Wolff III wrote:
> build_sched_domain: cpu: 0 level: SMT cpu_map: 0-3 tl->mask: 0,2
> [0.254433] build_sched_domain: cpu: 0 level: MC cpu_map: 0-3 tl->mask: 0
> [0.254516] build_sched_domain: cpu: 0 level: DIE cpu_map: 0-3 tl->mask: 0-3
> …
On Fri, Jul 18, 2014 at 12:16:33 +0200,
Peter Zijlstra wrote:
So it looks like the actual domain tree is broken, and not what we
assumed it was.
Could I bother you to run with the below instead? It should also print
out the sched domain masks so we don't need to guess about them.
The full dmesg…
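For reference, the quoted "build_sched_domain: ..." lines come from a debug
patch; a minimal sketch of a printk that produces that format, assuming the
v3.16-era cpulist_scnprintf() helper (Peter's actual patch may differ):

    /* At the top of build_sched_domain() in kernel/sched/core.c: */
    {
        char map_buf[64], mask_buf[64];

        cpulist_scnprintf(map_buf, sizeof(map_buf), cpu_map);
        cpulist_scnprintf(mask_buf, sizeof(mask_buf), tl->mask(cpu));
        printk(KERN_ERR "build_sched_domain: cpu: %d level: %s cpu_map: %s tl->mask: %s\n",
               cpu, tl->name, map_buf, mask_buf);
    }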
On Fri, Jul 18, 2014 at 11:28:14 +0200,
Dietmar Eggemann wrote:
Didn't see what I was looking for in your dmesg output. Did you use
'earlyprintk=keep sched_debug'?
I was missing a space. I'll get it on the next run.
On Fri, Jul 18, 2014 at 12:34:49AM -0500, Bruno Wolff III wrote:
> On Thu, Jul 17, 2014 at 14:35:02 +0200,
> Peter Zijlstra wrote:
> >
> >In any case, can someone who can trigger this run with the below; it's
> >'clean' for me, but supposedly you'll trigger a FAIL somewhere.
>
> I got a couple of fail messages.
On 18/07/14 07:34, Bruno Wolff III wrote:
On Thu, Jul 17, 2014 at 14:35:02 +0200,
Peter Zijlstra wrote:
In any case, can someone who can trigger this run with the below; it's
'clean' for me, but supposedly you'll trigger a FAIL somewhere.
I got a couple of fail messages.
dmesg output is available in the bug as the following attachment: …
On Thu, Jul 17, 2014 at 14:35:02 +0200,
Peter Zijlstra wrote:
In any case, can someone who can trigger this run with the below; it's
'clean' for me, but supposedly you'll trigger a FAIL somewhere.
I got a couple of fail messages.
dmesg output is available in the bug as the following attachment: …
On Thu, Jul 17, 2014 at 20:43:16 +0200,
Dietmar Eggemann wrote:
If you could apply the patch:
https://lkml.org/lkml/2014/7/17/288
and then run it on your machine, that would give us more details, i.e.
the information on which sched_group(s) and in which sched domain
level (SMT and/or DIE) …
On 17/07/14 18:36, Bruno Wolff III wrote:
I did a few quick boots this morning while taking a bunch of pictures. I have
gone through some of them and found one showing the BUG_ON that was
triggered at line 5850, which is from:
BUG_ON(!cpumask_empty(sched_group_cpus(sg)));
You can see the JPEG at:
https://bugzilla.kernel.org/attachment
I did a few quick boots this morning while taking a bunch of pictures. I have
gone through some of them and found one showing the BUG_ON that was
triggered at line 5850, which is from:
BUG_ON(!cpumask_empty(sched_group_cpus(sg)));
You can see the JPEG at:
https://bugzilla.kernel.org/attachment
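For context, that check sits in the group-building loop of
build_sched_groups() in kernel/sched/core.c, roughly as follows (abridged
from the v3.16-era code, so treat details as approximate):

    for_each_cpu(i, span) {
        struct sched_group *sg;
        int group, j;

        if (cpumask_test_cpu(i, covered))
            continue;

        group = get_group(i, sdd, &sg);

        /* A group picked up here is expected to start out empty; the
         * photographed crash is this check firing. */
        BUG_ON(!cpumask_empty(sched_group_cpus(sg)));

        for_each_cpu(j, span) {
            if (get_group(j, sdd, NULL) != group)
                continue;

            cpumask_set_cpu(j, sched_group_cpus(sg));
            cpumask_set_cpu(j, covered);
        }
    }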
On Thu, Jul 17, 2014 at 01:23:51PM +0200, Dietmar Eggemann wrote:
> On 17/07/14 11:04, Peter Zijlstra wrote:
> >On Thu, Jul 17, 2014 at 10:57:55AM +0200, Dietmar Eggemann wrote:
> >>There is also the possibility that the memory for sched_group sg is not
> >>(completely) zeroed out:
> >>
> >> sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
> >>                   GFP_KERNEL, cpu_to_node(j));
On 17/07/14 11:04, Peter Zijlstra wrote:
On Thu, Jul 17, 2014 at 10:57:55AM +0200, Dietmar Eggemann wrote:
There is also the possibility that the memory for sched_group sg is not
(completely) zeroed out:
sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
GFP_KERNEL, cpu_to_node(j));
On Thu, Jul 17, 2014 at 10:57:55AM +0200, Dietmar Eggemann wrote:
> There is also the possibility that the memory for sched_group sg is not
> (completely) zeroed out:
>
> sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
> GFP_KERNEL, cpu_to_node(j));
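The zeroing point, spelled out: kzalloc_node() returns zeroed memory, and the
group's cpumask lives in the tail of that same allocation, so a freshly
allocated group's mask is empty by construction. The allocation is in the
scheduler's __sdt_alloc() path (v3.16-era layout); the open question in the
thread is whether a *reused* group still satisfies that:

    struct sched_group *sg;

    /* One allocation covers the struct plus its trailing cpumask;
     * kzalloc_node() zeroes both, so sched_group_cpus(sg), which points
     * into that tail, starts out as the empty mask. */
    sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
                      GFP_KERNEL, cpu_to_node(j));
    if (!sg)
        return -ENOMEM;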
On 17/07/14 05:09, Bruno Wolff III wrote:
On Thu, Jul 17, 2014 at 01:18:36 +0200,
Dietmar Eggemann wrote:
So the output of
$ cat /proc/sys/kernel/sched_domain/cpu*/domain*/*
would be handy too.
Thanks, this was helpful.
I see from the sched domain layout that you have SMT (domain0) and DIE (domain1) …
On Wed, Jul 16, 2014 at 21:17:32 +0200,
Dietmar Eggemann wrote:
Could you please share:
cat /proc/cpuinfo and
cat /proc/schedstat (kernel config w/ CONFIG_SCHEDSTATS=y)
/proc/schedstat output is attached.
version 15
timestamp 4294858660
cpu0 12 0 85767 30027 61826 37767 15709950719 562024106 …
Could you also put the two BUG_ON lines into build_sched_groups()
[kernel/sched/core.c] wo/ the cpumask_clear() and setting
sg->sgc->capacity to 0 and share the possible crash output as well?
I can try a new build with this. I can probably get results back tomorrow
before I leave for work. The c…
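Spelled out, the request is to assert the invariants rather than re-establish
them; a sketch, with the exact placement inside build_sched_groups() being my
assumption:

    group = get_group(i, sdd, &sg);

    /* Instead of cpumask_clear(sched_group_cpus(sg)) and
     * sg->sgc->capacity = 0, assert that both already hold: */
    BUG_ON(!cpumask_empty(sched_group_cpus(sg)));
    BUG_ON(sg->sgc->capacity != 0);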
On Thu, Jul 17, 2014 at 01:18:36 +0200,
Dietmar Eggemann wrote:
So the output of
$ cat /proc/sys/kernel/sched_domain/cpu*/domain*/*
would be handy too.
Attached and added to the bug.
Just to make sure, you do have 'CONFIG_X86_32=y' and '# CONFIG_NUMA is
not set' in your build?
Yes.
I p…
On 16/07/14 21:54, Bruno Wolff III wrote:
On Wed, Jul 16, 2014 at 21:17:32 +0200,
Dietmar Eggemann wrote:
Hi Bruno and Josh,
From the issue, I see that the machine causing trouble is a Xeon (2
processors w/ hyper-threading).
Could you please share:
cat /proc/cpuinfo and
cat /proc/schedstat (kernel config w/ CONFIG_SCHEDSTATS=y)
I have attached it to the bug and to this message.
On Wed, Jul 16, 2014 at 21:17:32 +0200,
Dietmar Eggemann wrote:
Hi Bruno and Josh,
From the issue, I see that the machine causing trouble is a Xeon (2
processors w/ hyper-threading).
Could you please share:
cat /proc/cpuinfo and
cat /proc/schedstat (kernel config w/ CONFIG_SCHEDSTATS=y)
I have attached it to the bug and to this message.
cat /p…
Hi Bruno and Josh,
On 16/07/14 17:17, Josh Boyer wrote:
Adding Dietmar in since he is the original author.
josh
On Wed, Jul 16, 2014 at 09:55:46AM -0500, Bruno Wolff III wrote:
caffcdd8d27ba78730d5540396ce72ad022aff2c has been causing crashes
early in the boot process on one of three machines I have been testing the
kernel on.
Adding Dietmar in since he is the original author.
josh
On Wed, Jul 16, 2014 at 09:55:46AM -0500, Bruno Wolff III wrote:
> caffcdd8d27ba78730d5540396ce72ad022aff2c has been causing crashes
> early in the boot process on one of three machines I have been
> testing the kernel on. On that one machine it happens every boot.
caffcdd8d27ba78730d5540396ce72ad022aff2c has been causing crashes early in
the boot process on one of three machines I have been testing the kernel
on. On that one machine it happens every boot. It happens before netconsole
is functional.
A partial revert of the commit fixes the problem. I do…
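Given that caffcdd8 removed the explicit zeroing from build_sched_groups(),
the partial revert presumably re-adds it; a sketch under that assumption,
using the v3.16 field names:

    group = get_group(i, sdd, &sg);

    /* Re-add the zeroing dropped by caffcdd8 so a reused group is reset
     * before being filled in again: */
    cpumask_clear(sched_group_cpus(sg));
    sg->sgc->capacity = 0;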