Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Ingo Molnar
* H. Peter Anvin wrote: > CLFLUSH is only (guaranteed to be) ordered with respect to MFENCE > according to the SDM. Yes, I quoted the relevant text a couple of hours ago, CLFLUSH is pretty special, so it's not ordered against atomics or serializing instructions for example - only ordered aga

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread H. Peter Anvin
CLFLUSH is only (guaranteed to be) ordered with respect to MFENCE according to the SDM. Ingo Molnar wrote: > >* Peter Zijlstra wrote: > >> On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote: >> > On Thu, Dec 19, 2013 at 06:07:41PM +0100, Ingo Molnar wrote: >> > > Likewise, having a

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Peter Zijlstra
On Thu, Dec 19, 2013 at 11:22:01AM -0800, H. Peter Anvin wrote: > On 12/19/2013 10:19 AM, Ingo Molnar wrote: > > > > ( It would also be nice to know whether MONITOR loads that cacheline > > into the CPUs cache, and in what state it loads it. ) > > > > I would assume that is implementation-dep

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread H. Peter Anvin
On 12/19/2013 10:19 AM, Ingo Molnar wrote: > > ( It would also be nice to know whether MONITOR loads that cacheline > into the CPUs cache, and in what state it loads it. ) > I would assume that is implementation-dependent. However, one plausible implementation is to load the cache line into

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Ingo Molnar
* Linus Torvalds wrote: > The x86 memory rules are that two loads always execute in order (ie > rmb is a no-op). > > So I see no reason for a memory barrier after the monitor. [...] Yes, I'm leaning towards that interpretation as well, but the reason I'm a bit catious is the somewhat curious

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Ingo Molnar
* H. Peter Anvin wrote: > On 12/19/2013 10:09 AM, H. Peter Anvin wrote: > > On 12/19/2013 09:07 AM, Ingo Molnar wrote: > >> > >> Likewise, having a barrier before the MONITOR looks sensible as well. > >> Having it _after_ monitor looks weird and is probably wrong. [It might > >> have been the

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread H. Peter Anvin
On 12/19/2013 10:09 AM, H. Peter Anvin wrote: > On 12/19/2013 09:07 AM, Ingo Molnar wrote: >> >> Likewise, having a barrier before the MONITOR looks sensible as well. >> Having it _after_ monitor looks weird and is probably wrong. [It might >> have been the effects of someone seeing the spurious

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Ingo Molnar
* H. Peter Anvin wrote: > On 12/19/2013 09:07 AM, Ingo Molnar wrote: > > > > Likewise, having a barrier before the MONITOR looks sensible as > > well. Having it _after_ monitor looks weird and is probably wrong. > > [It might have been the effects of someone seeing the spurious > > wakeup pr

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Ingo Molnar
* Peter Zijlstra wrote: > On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote: > > On Thu, Dec 19, 2013 at 06:07:41PM +0100, Ingo Molnar wrote: > > > Likewise, having a barrier before the MONITOR looks sensible as well. > > > > I again have to disagree, one would expect monitor to f

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Ingo Molnar
* H. Peter Anvin wrote: > On 12/19/2013 09:36 AM, Peter Zijlstra wrote: > > On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote: > >> That said, I would find it very strange indeed if a CLFLUSH doesn't also > >> flush the store buffer. > > > > OK, it explicitly states it does not do

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Ingo Molnar
* Peter Zijlstra wrote: > On Thu, Dec 19, 2013 at 06:07:41PM +0100, Ingo Molnar wrote: > > > > * H. Peter Anvin wrote: > > > > > On 12/19/2013 08:21 AM, Peter Zijlstra wrote: > > > > > > > > What's that mb for? > > > > > > > > > > It already exists in mwait_idle_with_hints(); I just moved

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread H. Peter Anvin
On 12/19/2013 09:07 AM, Ingo Molnar wrote: > > Likewise, having a barrier before the MONITOR looks sensible as well. > Having it _after_ monitor looks weird and is probably wrong. [It might > have been the effects of someone seeing the spurious wakeup problems > with realizing the true source,

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread H. Peter Anvin
On 12/19/2013 09:36 AM, Peter Zijlstra wrote: > On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote: >> That said, I would find it very strange indeed if a CLFLUSH doesn't also >> flush the store buffer. > > OK, it explicitly states it does not do that and you indeed need an > mfence be

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Peter Zijlstra
On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote: > On Thu, Dec 19, 2013 at 06:07:41PM +0100, Ingo Molnar wrote: > > Likewise, having a barrier before the MONITOR looks sensible as well. > > I again have to disagree, one would expect monitor to flush all that is > required to start

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Peter Zijlstra
On Thu, Dec 19, 2013 at 06:25:35PM +0100, Peter Zijlstra wrote: > That said, I would find it very strange indeed if a CLFLUSH doesn't also > flush the store buffer. OK, it explicitly states it does not do that and you indeed need an mfence before the clflush. -- To unsubscribe from this list: se

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Peter Zijlstra
On Thu, Dec 19, 2013 at 06:07:41PM +0100, Ingo Molnar wrote: > > * H. Peter Anvin wrote: > > > On 12/19/2013 08:21 AM, Peter Zijlstra wrote: > > > > > > What's that mb for? > > > > > > > It already exists in mwait_idle_with_hints(); I just moved it into > > this common function. It is a bit

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Ingo Molnar
* H. Peter Anvin wrote: > On 12/19/2013 08:21 AM, Peter Zijlstra wrote: > > > > What's that mb for? > > > > It already exists in mwait_idle_with_hints(); I just moved it into > this common function. It is a bit odd, I have to admit; it seems > like it should be *before* the monitor (and po

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread H. Peter Anvin
On 12/19/2013 08:21 AM, Peter Zijlstra wrote: > > What's that mb for? > It already exists in mwait_idle_with_hints(); I just moved it into this common function. It is a bit odd, I have to admit; it seems like it should be *before* the monitor (and possibly we should have one after the CLFLUSH a

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread H. Peter Anvin
How does this look? Completely untested, of course. I do wonder if we need more memory barriers, though. An alternative would be to move everything into mwait_idle_with_hints(). -hpa diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 7b034a4057f9..6d

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Peter Zijlstra
On Thu, Dec 19, 2013 at 08:13:21AM -0800, H. Peter Anvin wrote: > How does this look? Completely untested, of course. > > I do wonder if we need more memory barriers, though. > > An alternative would be to move everything into mwait_idle_with_hints(). > > -hpa > > diff --git a/arch/x86/

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread H. Peter Anvin
On 12/19/2013 08:02 AM, Ingo Molnar wrote: >> >> __monitor() currently doesn't, which is idiotic. > > Hm, __monitor() seems to take a void *: > > arch/x86/include/asm/processor.h:static inline void __monitor(const void > *eax, unsigned long ecx, > > So writing: > > if (this_cpu_

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Ingo Molnar
* H. Peter Anvin wrote: > On 12/19/2013 04:22 AM, Ingo Molnar wrote: > >> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c > >> index 92d1206..f80b700 100644 > >> --- a/drivers/idle/intel_idle.c > >> +++ b/drivers/idle/intel_idle.c > >> @@ -377,6 +377,9 @@ static int intel_idle

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread H. Peter Anvin
On 12/19/2013 04:22 AM, Ingo Molnar wrote: >> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c >> index 92d1206..f80b700 100644 >> --- a/drivers/idle/intel_idle.c >> +++ b/drivers/idle/intel_idle.c >> @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev, >> >>

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Borislav Petkov
On Thu, Dec 19, 2013 at 06:40:41AM -0800, H. Peter Anvin wrote: > ... or just use static_cpu_has() maybe? Right, if we can get the compiler to generate the shortest CLFLUSH of 3 bytes with register indirect addressing for the operand, then stomping over those 3 bytes with the alternatives would be

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread H. Peter Anvin
... or just use static_cpu_has() maybe? Ingo Molnar wrote: > >* Len Brown wrote: > >> From: Len Brown >> >> Linux 3.10 changed the timing of how thread_info->flags is touched: >> >> x86: Use generic idle loop >> (7d1a941731fabf27e5fb6edbebb79fe856edb4e5) >> >> This caused Intel NHM

Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-19 Thread Ingo Molnar
* Len Brown wrote: > From: Len Brown > > Linux 3.10 changed the timing of how thread_info->flags is touched: > > x86: Use generic idle loop > (7d1a941731fabf27e5fb6edbebb79fe856edb4e5) > > This caused Intel NHM-EX and WSM-EX servers to experience a large number > of immediate MON

[PATCH] x86 idle: repair large-server 50-watt idle-power regression

2013-12-18 Thread Len Brown
From: Len Brown Linux 3.10 changed the timing of how thread_info->flags is touched: x86: Use generic idle loop (7d1a941731fabf27e5fb6edbebb79fe856edb4e5) This caused Intel NHM-EX and WSM-EX servers to experience a large number of immediate MONITOR/MWAIT break wakeups, which caus