Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-21 Thread Davidlohr Bueso
On Sat, 2014-03-22 at 07:57 +0530, Srikar Dronamraju wrote:
> > > So reverting and applying v3 3/4 and 4/4 patches works for me.
> >
> > Ok, I verified that the above ends up resulting in the same tree as
> > the minimal patch I sent out, modulo (a) some comments and (b) an
> > #ifdef CONFIG_SMP

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-21 Thread Srikar Dronamraju
> > So reverting and applying v3 3/4 and 4/4 patches works for me.
>
> Ok, I verified that the above ends up resulting in the same tree as
> the minimal patch I sent out, modulo (a) some comments and (b) an
> #ifdef CONFIG_SMP in futex_get_mm() that doesn't really matter.
>
> So I committed the

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 9:55 PM, Srikar Dronamraju wrote:
>
> I reverted commits 99b60ce6 and b0c29f79. Then applied the patches in
> the above url. The last one had a reject but it was pretty
> straightforward to resolve it. After this, specjbb completes.
>
> So reverting and applying v3 3/4 and

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Srikar Dronamraju
>
> Ok, so a big reason why this patch doesn't apply cleanly after reverting
> is because *most* of the changes were done at the top of the file with
> regards to documenting the ordering guarantees, the actual code changes
> are quite minimal.
>
> I reverted commits 99b60ce6 (documentation) and

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 1:20 PM, Davidlohr Bueso wrote:
>
> I reverted commits 99b60ce6 (documentation) and b0c29f79 (the offending
> commit), and then I cleanly applied the equivalent ones from v3 of the
> series (which was already *tested* and ready for upstream until you
> suggested looking int

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Benjamin Herrenschmidt
On Thu, 2014-03-20 at 09:31 -0700, Davidlohr Bueso wrote:
> hmmm looking at ppc spinlock code, it seems that it doesn't have ticket
> spinlocks -- in fact Torsten Duwe has been trying to get them upstream
> very recently. Since we rely on the counter for detecting waiters, this
> might explain the
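
The point about ticket spinlocks can be illustrated with a minimal C11 sketch. A ticket lock hands out numbered tickets, so the gap between the next ticket and the ticket being served counts every CPU that holds or is queued for the lock; a plain test-and-set lock keeps a single "held" bit and records nothing about spinners. All types and functions below (ticket_lock, ticket_holders, tas_lock, and so on) are hypothetical illustrations, not the kernel's arch spinlock code or its spin_is_locked() implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ticket lock: 'next' is the next ticket to hand out and
 * 'owner' is the ticket currently allowed into the critical section. */
struct ticket_lock {
    atomic_uint next;
    atomic_uint owner;
};

static void ticket_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Taking a ticket is a store other CPUs can observe, even while the
 * taker is still spinning for its turn. */
static unsigned int ticket_take(struct ticket_lock *l)
{
    return atomic_fetch_add(&l->next, 1);
}

static void ticket_wait(struct ticket_lock *l, unsigned int me)
{
    while (atomic_load(&l->owner) != me)
        ; /* spin until our ticket is served */
}

static void ticket_unlock(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);
}

/* CPUs holding a ticket: the owner plus everyone queued behind it.
 * Read 'owner' first so the racy snapshot can never underflow. */
static unsigned int ticket_holders(struct ticket_lock *l)
{
    unsigned int owner = atomic_load(&l->owner);
    unsigned int next = atomic_load(&l->next);
    return next - owner;
}

/* "Locked" in the spin_is_locked() sense: some ticket is outstanding. */
static bool ticket_is_locked(struct ticket_lock *l)
{
    return ticket_holders(l) != 0;
}

/* Hypothetical test-and-set lock: one bit, no record of spinners. */
struct tas_lock {
    atomic_bool held;
};

static void tas_init(struct tas_lock *l) { atomic_init(&l->held, false); }

static bool tas_trylock(struct tas_lock *l)
{
    return !atomic_exchange(&l->held, true);
}

static void tas_unlock(struct tas_lock *l)
{
    assert(atomic_load(&l->held)); /* unlocking a free lock is a bug */
    atomic_store(&l->held, false);
}

static bool tas_is_locked(struct tas_lock *l)
{
    return atomic_load(&l->held);
}
```

When the owner of the ticket lock unlocks while a second ticket is outstanding, ticket_is_locked() still reports true; the test-and-set bit drops to false on unlock no matter how many CPUs are spinning. That difference is one hedged reading of why relying on the lock word to detect waiters could behave differently on an architecture without ticket spinlocks.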

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 12:25 -0700, Linus Torvalds wrote:
> On Thu, Mar 20, 2014 at 12:08 PM, Davidlohr Bueso wrote:
> >
> > Oh, it does. This atomics technique was tested at a customer's site and
> > ready for upstream.
>
> I'm not worried about the *original* patch. I'm worried about the
> incre

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 12:08 PM, Davidlohr Bueso wrote:
>
> Oh, it does. This atomics technique was tested at a customer's site and
> ready for upstream.

I'm not worried about the *original* patch. I'm worried about the incremental one.

Your original patch never applied to my tree - I think it

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 11:36 -0700, Linus Torvalds wrote:
> On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso wrote:
> >
> > Comparing with the patch I sent earlier this morning, looks equivalent,
> > and fwiw, passes my initial qemu bootup, which is the first way of
> > detecting anything stupid g

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso wrote:
>
> Comparing with the patch I sent earlier this morning, looks equivalent,
> and fwiw, passes my initial qemu bootup, which is the first way of
> detecting anything stupid going on.
>
> So, Srikar, please try this patch out, as opposed to m

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 11:03 AM, Davidlohr Bueso wrote:
>
> I still wonder about ppc and spinlocks (no ticketing!!) ... sure the
> "waiters" patch might fix the problem just because we explicitly count
> the members of the plist. And I guess if we cannot rely on all archs
> having an equivalent s
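
The "explicitly count" alternative mentioned above can be sketched with a per-bucket atomic counter that a waiter increments before it takes the bucket lock and reads the futex value, and that a waker checks, after a full barrier, before deciding whether to take the lock at all. This is a hypothetical user-space model under assumed names (struct bucket, bucket_wait, bucket_wake), not the actual patch: a pthread mutex stands in for the hash-bucket spinlock, a plain counter stands in for the plist, and nothing actually sleeps.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one futex hash bucket (names are illustrative). */
struct bucket {
    pthread_mutex_t lock;  /* stands in for the hash-bucket spinlock */
    atomic_int waiters;    /* explicit count of tasks in the wait path */
    int queued;            /* tasks on the wait queue; protected by lock */
};

static void bucket_init(struct bucket *hb)
{
    pthread_mutex_init(&hb->lock, NULL);
    atomic_init(&hb->waiters, 0);
    hb->queued = 0;
}

/* Wait path: count ourselves BEFORE reading the futex word, so a waker
 * that stores a new value cannot miss us. Returns true if the caller
 * was queued (a real implementation would now sleep). */
static bool bucket_wait(struct bucket *hb, atomic_int *uaddr, int expected)
{
    atomic_fetch_add(&hb->waiters, 1);      /* seq_cst RMW: full barrier */
    pthread_mutex_lock(&hb->lock);
    if (atomic_load(uaddr) != expected) {   /* value changed: don't sleep */
        pthread_mutex_unlock(&hb->lock);
        atomic_fetch_sub(&hb->waiters, 1);
        return false;
    }
    hb->queued++;
    pthread_mutex_unlock(&hb->lock);
    return true;
}

/* Wake path: the caller has already stored the new futex value.
 * Returns how many queued tasks were woken (at most nr). */
static int bucket_wake(struct bucket *hb, int nr)
{
    atomic_thread_fence(memory_order_seq_cst);  /* order that store first */
    if (atomic_load(&hb->waiters) == 0)
        return 0;                           /* fast path: skip the lock */

    pthread_mutex_lock(&hb->lock);
    int woken = hb->queued < nr ? hb->queued : nr;
    hb->queued -= woken;
    atomic_fetch_sub(&hb->waiters, woken);
    assert(atomic_load(&hb->waiters) >= 0);
    pthread_mutex_unlock(&hb->lock);
    return woken;
}
```

Because the waker's fast path consults only the counter, it does not depend on how an architecture's spinlock encodes contention; the cost is an extra atomic operation on every wait.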

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 10:42 -0700, Linus Torvalds wrote:
> On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso wrote:
> >> It strikes me that the "spin_is_locked()" test has no barriers wrt the
> >> writing of the new futex value on the wake path. And the read barrier
> >> obviously does nothing wrt

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso wrote:
>> It strikes me that the "spin_is_locked()" test has no barriers wrt the
>> writing of the new futex value on the wake path. And the read barrier
>> obviously does nothing wrt the write either. Or am I missing
>> something?

So the write tha
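
The ordering problem raised here has the shape of the classic store-buffering (Dekker) pattern: the waker stores the new futex value and then loads the "are there waiters?" indication, while the waiter publishes itself and then loads the futex value. If neither store is ordered before the following load, both loads can miss, which is a lost wakeup; a read-only barrier cannot help, since it orders loads only against other loads. The sketch below is a hypothetical toy model, not a model of any particular CPU: each CPU is assumed to have a one-entry store buffer (roughly x86-TSO-like), and the program exhaustively enumerates interleavings, counting executions in which both loads return 0, with a full fence optionally placed between each side's store and load.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of two CPUs, each with a one-entry store buffer.
 * CPU 0 is the waker:  store futex value (var 0), then load waiters (var 1).
 * CPU 1 is the waiter: store waiters (var 1), then load futex value (var 0). */
struct state {
    int pc[2];        /* 0: store next, 1: load next, 2: finished */
    bool pending[2];  /* CPU t's store is still in its store buffer */
    int mem[2];       /* globally visible values of var 0 and var 1 */
    int r[2];         /* value each CPU's load returned */
};

static int lost;      /* finished executions where both loads returned 0 */

static void explore(struct state s, const bool fence[2])
{
    bool moved = false;

    for (int t = 0; t < 2; t++) {
        if (s.pending[t]) {                 /* drain store buffer to memory */
            struct state n = s;
            n.pending[t] = false;
            n.mem[t] = 1;
            explore(n, fence);
            moved = true;
        }
        if (s.pc[t] == 0) {                 /* the store enters the buffer */
            struct state n = s;
            n.pending[t] = true;
            n.pc[t] = 1;
            explore(n, fence);
            moved = true;
        } else if (s.pc[t] == 1 && !(fence[t] && s.pending[t])) {
            /* With a full fence, the load waits until our store drained. */
            struct state n = s;
            n.r[t] = n.mem[1 - t];
            n.pc[t] = 2;
            explore(n, fence);
            moved = true;
        }
    }
    if (!moved) {
        assert(s.pc[0] == 2 && s.pc[1] == 2);
        if (s.r[0] == 0 && s.r[1] == 0)
            lost++;   /* waker saw no waiter, waiter saw the old value */
    }
}

/* Number of interleavings that lose the wakeup for the given fences. */
static int count_lost_wakeups(bool waker_fence, bool waiter_fence)
{
    struct state s = {{0, 0}, {false, false}, {0, 0}, {0, 0}};
    const bool fence[2] = {waker_fence, waiter_fence};
    lost = 0;
    explore(s, fence);
    return lost;
}
```

In this model only count_lost_wakeups(true, true) returns zero: a fence on just one path is not enough, because the unfenced side can still perform its load before its own store becomes visible.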

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 09:41 -0700, Linus Torvalds wrote:
> On Wed, Mar 19, 2014 at 10:56 PM, Davidlohr Bueso wrote:
> >
> > This problem suggests that we missed a wakeup for a task that was adding
> > itself to the queue in a wait path. And the only place that can happen
> > is with the hb spinloc

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Linus Torvalds
On Wed, Mar 19, 2014 at 10:56 PM, Davidlohr Bueso wrote:
>
> This problem suggests that we missed a wakeup for a task that was adding
> itself to the queue in a wait path. And the only place that can happen
> is with the hb spinlock check for any pending waiters.

Ok, so thinking about hb_waiters_

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Wed, 2014-03-19 at 22:56 -0700, Davidlohr Bueso wrote:
> On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote:
> > > > Joy,.. let me look at that with ppc in mind.
> > >
> > > OK; so while pretty much all the comments from that patch are utter
> > > nonsense (what was I thinking), I canno

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 15:38 +0530, Srikar Dronamraju wrote:
> > This problem suggests that we missed a wakeup for a task that was adding
> > itself to the queue in a wait path. And the only place that can happen
> > is with the hb spinlock check for any pending waiters. Just in case we
> > missed s

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Srikar Dronamraju
> This problem suggests that we missed a wakeup for a task that was adding
> itself to the queue in a wait path. And the only place that can happen
> is with the hb spinlock check for any pending waiters. Just in case we
> missed some assumption about checking the hash bucket spinlock as a way
> of

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-20 Thread Peter Zijlstra
On Thu, Mar 20, 2014 at 11:03:50AM +0530, Srikar Dronamraju wrote:
> > > Joy,.. let me look at that with ppc in mind.
> >
> > OK; so while pretty much all the comments from that patch are utter
> > nonsense (what was I thinking), I cannot actually find a real bug.
> >
> > But could you try the be

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Davidlohr Bueso
On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote:
> > > Joy,.. let me look at that with ppc in mind.
> >
> > OK; so while pretty much all the comments from that patch are utter
> > nonsense (what was I thinking), I cannot actually find a real bug.
> >
> > But could you try the below whi

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Srikar Dronamraju
> > Joy,.. let me look at that with ppc in mind.
>
> OK; so while pretty much all the comments from that patch are utter
> nonsense (what was I thinking), I cannot actually find a real bug.
>
> But could you try the below which replaces a control dependency with a
> full barrier. The control flow

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Davidlohr Bueso
On Wed, 2014-03-19 at 18:08 +0100, Peter Zijlstra wrote:
> On Wed, Mar 19, 2014 at 04:47:05PM +0100, Peter Zijlstra wrote:
> > > I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6
> > > and confirmed that
> > > reverting the commit solved the problem.
> >
> > Joy,.. let me lo

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Peter Zijlstra
On Wed, Mar 19, 2014 at 04:47:05PM +0100, Peter Zijlstra wrote:
> > I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 and
> > confirmed that
> > reverting the commit solved the problem.
>
> Joy,.. let me look at that with ppc in mind.

OK; so while pretty much all the commen

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Linus Torvalds
On Wed, Mar 19, 2014 at 8:26 AM, Srikar Dronamraju wrote:
>
> I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 and
> confirmed that
> reverting the commit solved the problem.

Ok. I'll give Peter and Davidlohr a few days to perhaps find something obvious, but I guess we'll

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Srikar Dronamraju
> >
> > In fact I can reproduce this if the java_constraint is either node, socket,
> > system.
> > However I am not able to reproduce if java_constraint is set to core.
>
> What's any of that mean?
>
Using the constraint, one can specify how many jvm instances should participate in the specjbb

Re: Tasks stuck in futex code (in 3.14-rc6)

2014-03-19 Thread Peter Zijlstra
On Wed, Mar 19, 2014 at 08:56:19PM +0530, Srikar Dronamraju wrote:
> There are 332 tasks all stuck in futex_wait_queue_me().
> I am able to reproduce this consistently.
>
> In fact I can reproduce this if the java_constraint is either node, socket,
> system.
> However I am not able to reproduce if