Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-17 Thread Thomas Gleixner
On Wed, 17 Dec 2014, Preeti Murthy wrote: > On Tue, Dec 16, 2014 at 6:19 PM, Thomas Gleixner wrote: > > So the possible states are: > > > > ts->inidle ts->tick_stopped > > 0 0 valid > > 0 1 BUG > > 1 0 valid >
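
The state table quoted above can be read as a simple invariant check. The following is a minimal user-space sketch, not kernel code: `struct model_tick_sched` and both helper functions are hypothetical names introduced here for illustration, and only the rows visible in the excerpt are encoded. The excerpt is cut off before the (1, 1) row, so the sketch makes no claim about that row beyond it not being the row marked BUG.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags discussed in the thread. */
struct model_tick_sched {
	bool inidle;        /* models ts->inidle */
	bool tick_stopped;  /* models ts->tick_stopped */
};

/*
 * Returns true only for the combination the quoted table marks as BUG:
 * inidle == 0 together with tick_stopped == 1.
 */
static bool model_state_listed_as_bug(const struct model_tick_sched *ts)
{
	return !ts->inidle && ts->tick_stopped;
}

/* Debug helper: abort if the state matches the row marked BUG. */
static void model_assert_state(const struct model_tick_sched *ts)
{
	assert(!model_state_listed_as_bug(ts));
}
```

A caller would run `model_assert_state()` after every transition of either flag, which is one way to catch the invalid combination at the point where it is created rather than later.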

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-17 Thread Frederic Weisbecker
On Wed, Dec 17, 2014 at 10:11:58AM +0100, Thomas Gleixner wrote: > On Wed, 17 Dec 2014, Frederic Weisbecker wrote: > > On Tue, Dec 16, 2014 at 10:21:27PM +0100, Thomas Gleixner wrote: > > > So instead of evaluating the whole nonsense a gazillion times in a row > > > and firing pointless self ipis w

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-17 Thread Preeti Murthy
Hi Thomas, On Tue, Dec 16, 2014 at 6:19 PM, Thomas Gleixner wrote: > On Tue, 16 Dec 2014, Preeti U Murthy wrote: >> As far as I can see, the primary purpose of tick_nohz_irq_enter()/exit() >> paths was to take care of *tick stopped* cases. >> >> Before handling interrupts we would want jiffies to

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-17 Thread Thomas Gleixner
On Wed, 17 Dec 2014, Frederic Weisbecker wrote: > On Tue, Dec 16, 2014 at 10:21:27PM +0100, Thomas Gleixner wrote: > > So instead of evaluating the whole nonsense a gazillion times in a row > > and firing pointless self ipis why are you not looking at the obvious > > solution of sane state change t

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Frederic Weisbecker
On Tue, Dec 16, 2014 at 11:54:51PM +0100, Thomas Gleixner wrote: > > But yes, that should work just fine.. > > So I'm not the only one who thinks that this needs a proper > reimplementation :) Besides, if this works, we'll have way less IPIs, and people usually like that :-)

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Frederic Weisbecker
On Tue, Dec 16, 2014 at 10:21:27PM +0100, Thomas Gleixner wrote: > Now lets look at the call site of tick_nohz_task_switch(). That's > invoked at the end of finish_task_switch(). Looking at one of the > worst case call chains here: > > finish_task_switch() > tick_nohz_task_switch() > __tick_

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Thomas Gleixner
On Tue, 16 Dec 2014, Peter Zijlstra wrote: > On Tue, Dec 16, 2014 at 10:21:27PM +0100, Thomas Gleixner wrote: > > /* rq->lock is held for evaluating rq->nr_running */ > > static void sched_ttwu_remote_nohz(struct rq *rq) > > { > > if (nohz_full_disabled()) > > return; > > > > i

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Peter Zijlstra
On Tue, Dec 16, 2014 at 10:21:27PM +0100, Thomas Gleixner wrote: > DEFINE_PER_CPU(nohz_full_must_tick, unsigned long); > > enum { > NOHZ_SCHED_NEEDS_TICK, > NOHZ_POSIXTIMER_NEEDS_TICK, > NOHZ_PERF_NEEEDS_TICK, > }; > > /* rq->lock is held for evaluating rq->nr_running */ > static v
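
The quoted `nohz_full_must_tick` word and its enum suggest a per-CPU mask of reasons for keeping the tick. The following is a minimal user-space sketch under that assumption: it treats each enum value as a bit position, which the truncated excerpt does not confirm. The plain array stands in for the per-CPU variable, `MODEL_NR_CPUS` and the `model_*` functions are hypothetical, and the perf entry is spelled `NOHZ_PERF_NEEDS_TICK` here. Real kernel code would need atomic bit operations; this model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4  /* hypothetical; stands in for the per-CPU storage */

enum nohz_tick_reason {
	NOHZ_SCHED_NEEDS_TICK,
	NOHZ_POSIXTIMER_NEEDS_TICK,
	NOHZ_PERF_NEEDS_TICK,
	NOHZ_NR_REASONS,
};

/* One word per CPU; bit N set means reason N requires the tick (assumption). */
static unsigned long nohz_full_must_tick[MODEL_NR_CPUS];

/* Set a reason bit. Returns true only on the empty -> non-empty transition. */
static bool model_set_tick_dep(int cpu, enum nohz_tick_reason reason)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(reason < NOHZ_NR_REASONS);

	unsigned long old = nohz_full_must_tick[cpu];

	nohz_full_must_tick[cpu] = old | (1UL << reason);
	return old == 0;
}

/* Clear a reason bit. Returns true only on the non-empty -> empty transition. */
static bool model_clear_tick_dep(int cpu, enum nohz_tick_reason reason)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(reason < NOHZ_NR_REASONS);

	unsigned long old = nohz_full_must_tick[cpu];

	nohz_full_must_tick[cpu] = old & ~(1UL << reason);
	return old != 0 && nohz_full_must_tick[cpu] == 0;
}

/* The tick is needed as long as any reason bit is set. */
static bool model_cpu_needs_tick(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return nohz_full_must_tick[cpu] != 0;
}
```

In this model a caller acts only when a set or clear call returns true, that is, when the mask changes between empty and non-empty. That is one possible reading of the "sane state change" idea in the thread, where work such as sending an IPI happens on a transition instead of on every re-evaluation.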

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Thomas Gleixner
On Tue, 16 Dec 2014, Thomas Gleixner wrote: > On Tue, 16 Dec 2014, Frederic Weisbecker wrote: > So like we do in tick_nohz_idle_enter() and tick_nohz_idle_exit() we > have a clear state change in the nohz code and not something which is > randomly deduced from async state all over the place. > > S

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Thomas Gleixner
On Tue, 16 Dec 2014, Jacob Pan wrote: > On Tue, 16 Dec 2014 09:48:42 +0530 > Viresh Kumar wrote: > > I really don't know what stuff out of the two patches I posted (The > > above one and the fix I posted yesterday), will possible make the > > synchronization bad .. > > > But since your patch has

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Jacob Pan
On Tue, 16 Dec 2014 09:48:42 +0530 Viresh Kumar wrote: > On 16 December 2014 at 02:54, Pan, Jacob jun > wrote: > > > Looks good to me. You can add my Reviewed-by to the above patch. > > Thanks. > > > I have tested this fix and confirm powerclamp is working properly > > now. > > Oh, nice. >

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Thomas Gleixner
On Tue, 16 Dec 2014, Peter Zijlstra wrote: > On Tue, Dec 16, 2014 at 03:32:28PM +0100, Thomas Gleixner wrote: > So let me try and understand the problem with the emulated idle thing > better (running idle from FIFO threads). > > I suppose the tricky bit is what happens when the cpu was idle; in th

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Peter Zijlstra
On Tue, Dec 16, 2014 at 03:32:28PM +0100, Thomas Gleixner wrote: > @@ -4997,6 +5025,8 @@ pick_next_task_fair(struct rq *rq, struct task_struct > *prev) > struct task_struct *p; > int new_tasks; > > + if (class_fair_disabled()) > + goto idle; We don't want to do new i

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Thomas Gleixner
On Tue, 16 Dec 2014, Frederic Weisbecker wrote: > On Tue, Dec 16, 2014 at 01:49:03PM +0100, Thomas Gleixner wrote: > > And that's where the whole problem starts. The nohz full stuff is > > trying to evaluate everything dynamically which is just insane. > > > > So we want to have functions which do

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Thomas Gleixner
On Tue, 16 Dec 2014, Thomas Gleixner wrote: > Now the powerclamp mess is a different story. > > Calling tick_nohz_idle_enter()/exit() from outside the idle task is > just broken. Period. > > Trying to work around that madness in the core code is just fiddling > with the symptoms and ignoring the

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Frederic Weisbecker
On Tue, Dec 16, 2014 at 01:49:03PM +0100, Thomas Gleixner wrote: > And that's where the whole problem starts. The nohz full stuff is > trying to evaluate everything dynamically which is just insane. > > So we want to have functions which do: > >tick_nohz_full_enter() > ts->infullnohz = t

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Thomas Gleixner
On Tue, 16 Dec 2014, Preeti U Murthy wrote: > As far as I can see, the primary purpose of tick_nohz_irq_enter()/exit() > paths was to take care of *tick stopped* cases. > > Before handling interrupts we would want jiffies to be updated, which is > done in tick_nohz_irq_enter(). And after handling

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-16 Thread Preeti U Murthy
On 12/16/2014 10:23 AM, Viresh Kumar wrote: > + Peter from Jacob's mail .. > > On 16 December 2014 at 05:14, Frederic Weisbecker wrote: >> So to summarize: I see it enqueues a timer then it loops on that timer >> expiration. >> On that loop we stop the CPU and we expect the timer to fire and wak

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-15 Thread Viresh Kumar
+ Peter from Jacob's mail .. On 16 December 2014 at 05:14, Frederic Weisbecker wrote: > So to summarize: I see it enqueues a timer then it loops on that timer > expiration. > On that loop we stop the CPU and we expect the timer to fire and wake the > thread up. > But if the delayed tick fires t

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-15 Thread Viresh Kumar
On 16 December 2014 at 02:54, Pan, Jacob jun wrote: > Looks good to me. You can add my Reviewed-by to the above patch. Thanks. > I have tested this fix and confirm powerclamp is working properly now. Oh, nice. > However, we also have a planned patch for consolidated idle loop. With this > pa

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-15 Thread Frederic Weisbecker
On Mon, Dec 15, 2014 at 03:02:17PM +0530, Viresh Kumar wrote: > On 15 December 2014 at 12:55, Preeti U Murthy > wrote: > > Hi Viresh, > > > > Let me explain why I think this is happening. > > > > 1. tick_nohz_irq_enter/exit() both get called *only if the cpu is idle* > > and receives an interrupt

RE: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-15 Thread Pan, Jacob jun
-----Original Message----- From: Preeti U Murthy [mailto:pre...@linux.vnet.ibm.com] Sent: Monday, December 15, 2014 1:44 AM To: Viresh Kumar; Thomas Gleixner; Wu, Fengguang Cc: Frederic Weisbecker; Pan, Jacob jun; LKML; LKP Subject: Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection On 12

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-15 Thread Preeti U Murthy
On 12/15/2014 03:02 PM, Viresh Kumar wrote: > On 15 December 2014 at 12:55, Preeti U Murthy > wrote: >> Hi Viresh, >> >> Let me explain why I think this is happening. >> >> 1. tick_nohz_irq_enter/exit() both get called *only if the cpu is idle* >> and receives an interrupt. > > Bang on target. Y

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-15 Thread Viresh Kumar
On 15 December 2014 at 12:55, Preeti U Murthy wrote: > Hi Viresh, > > Let me explain why I think this is happening. > > 1. tick_nohz_irq_enter/exit() both get called *only if the cpu is idle* > and receives an interrupt. Bang on target. Yeah that's the part we missed while writing this patch :)

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-14 Thread Preeti U Murthy
Hi Viresh, Let me explain why I think this is happening. 1. tick_nohz_irq_enter/exit() both get called *only if the cpu is idle* and receives an interrupt. 2. Commit 2a16fc93d2c9568e1, cancels programming of tick_sched timer in its handler, assuming that tick_nohz_irq_exit() will take care of pr

Re: [nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-12 Thread Viresh Kumar
Cc'ing Thomas as well.. On 12 December 2014 at 01:12, Fengguang Wu wrote: > Hi Viresh, > > We noticed the below lockup regression on commit 2a16fc93d2c ("nohz: > Avoid tick's double reprogramming in highres mode"). > > testbox/testcase/testparams: ivb42/idle-inject/60s-200%-10cp > > b5e995e671d8e

[nohz] 2a16fc93d2c: kernel lockup on idle injection

2014-12-11 Thread Fengguang Wu
Hi Viresh, We noticed the below lockup regression on commit 2a16fc93d2c ("nohz: Avoid tick's double reprogramming in highres mode"). testbox/testcase/testparams: ivb42/idle-inject/60s-200%-10cp b5e995e671d8e4d7 2a16fc93d2c9568e16d45db77c -- f