Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Sasha Levin
On 03/06/2015 01:02 PM, Sasha Levin wrote: > I can go redo that again if you suspect that that commit is not the cause. I took a closer look at the logs, and I'm seeing hangs that begin this way as well:

[ 2298.020237] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [trinity-c19:839]
[ 22

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Linus Torvalds
On Fri, Mar 6, 2015 at 11:55 AM, Davidlohr Bueso wrote: >> >> - look up the vma in the vma lookup cache > > But you'd still need mmap_sem there to at least get the VMA's first > value. So my theory was that the vma cache is such a trivial data structure that we could trivially make it be rcu-pro
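
To make the idea concrete, here is a minimal sketch of what an rcu-protected probe of the per-thread vma cache could look like. This is hypothetical code, not anything that was in mainline at the time: vmacache_find_rcu() is an invented name modeled on vmacache_find() in mm/vmacache.c, and it assumes vm_area_structs are freed via RCU, which they were not.

	/*
	 * Hypothetical: probe the per-thread vma cache without mmap_sem.
	 * Caller must hold rcu_read_lock() and must revalidate anything
	 * it derives from the vma before committing the fault.
	 */
	static struct vm_area_struct *vmacache_find_rcu(struct mm_struct *mm,
							unsigned long addr)
	{
		int i;

		for (i = 0; i < VMACACHE_SIZE; i++) {
			struct vm_area_struct *vma = READ_ONCE(current->vmacache[i]);

			if (vma && READ_ONCE(vma->vm_mm) == mm &&
			    vma->vm_start <= addr && addr < vma->vm_end)
				return vma;
		}
		return NULL;	/* miss: fall back to the mmap_sem-protected path */
	}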

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 11:55 -0800, Davidlohr Bueso wrote: > On Fri, 2015-03-06 at 11:32 -0800, Linus Torvalds wrote: > > > IOW, I wonder if we could special-case the common non-IO > > fault-handling path something along the lines of: > > > > - look up the vma in the vma lookup cache > > But you

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 11:32 -0800, Linus Torvalds wrote: > IOW, I wonder if we could special-case the common non-IO > fault-handling path something along the lines of: > > - look up the vma in the vma lookup cache But you'd still need mmap_sem there to at least get the VMA's first value. > -
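
For context, the path being special-cased looked roughly like this at the time (simplified from the x86 fault handler, with error handling and flag setup elided). The vma lookup, and everything derived from the vma, is only stable while mmap_sem is held for read, which is the point being made above:

	struct vm_area_struct *vma;

	down_read(&mm->mmap_sem);		/* the contended lock */
	vma = find_vma(mm, address);		/* rbtree walk; needs mmap_sem */
	if (vma && vma->vm_start <= address)
		fault = handle_mm_fault(mm, vma, address, flags);
	up_read(&mm->mmap_sem);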

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 11:32 -0800, Linus Torvalds wrote: > Basically, to me, the whole "if a lock is so contended that we need to > play locking games, then we should look at why we *use* the lock, > rather than at the lock itself" is a religion. Oh absolutely, I'm only mentioning the locking prim

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Linus Torvalds
On Fri, Mar 6, 2015 at 11:20 AM, Davidlohr Bueso wrote: > > I obviously agree with all those points, however fyi most of the testing > on rwsems I do includes scaling address space ops stressing the > mmap_sem, which is a real world concern. So while it does include > microbenchmarks, it is not gu

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Fri, 2015-03-06 at 11:05 -0800, Linus Torvalds wrote: > On Fri, Mar 6, 2015 at 10:57 AM, Jason Low wrote: > > > > Right, the can_spin_on_owner() was originally added to the mutex > > spinning code for optimization purposes, particularly so that we can > > avoid adding the spinner to the OSQ onl

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 11:05 -0800, Linus Torvalds wrote: > On Fri, Mar 6, 2015 at 10:57 AM, Jason Low wrote: > > > > Right, the can_spin_on_owner() was originally added to the mutex > > spinning code for optimization purposes, particularly so that we can > > avoid adding the spinner to the OSQ onl

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Linus Torvalds
On Fri, Mar 6, 2015 at 10:57 AM, Jason Low wrote: > > Right, the can_spin_on_owner() was originally added to the mutex > spinning code for optimization purposes, particularly so that we can > avoid adding the spinner to the OSQ only to find that it doesn't need to > spin. This function needing to
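
For readers without the source at hand, the gate being discussed has roughly this shape (paraphrased from the mutex spinning code of that era in kernel/locking/mutex.c; the rwsem variant is analogous):

	static inline int mutex_can_spin_on_owner(struct mutex *lock)
	{
		struct task_struct *owner;
		int retval = 1;

		if (need_resched())
			return 0;

		rcu_read_lock();
		owner = READ_ONCE(lock->owner);
		if (owner)
			retval = owner->on_cpu;	/* only worth spinning on a running owner */
		rcu_read_unlock();

		/*
		 * If lock->owner is not set, the owner may have just acquired
		 * the lock and not yet set the field, or may have released it.
		 */
		return retval;
	}

The value of the check is precisely that it is cheap: a task that fails it never touches the OSQ at all.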

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Jason Low
On Fri, 2015-03-06 at 09:19 -0800, Davidlohr Bueso wrote: > On Fri, 2015-03-06 at 13:32 +0100, Ingo Molnar wrote: > > * Sasha Levin wrote: > > > > > I've bisected this to "locking/rwsem: Check for active lock before > > > bailing on spinning". Relevant parties Cc'ed. > > > > That would be: > >

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Sasha Levin
On 03/06/2015 12:19 PM, Davidlohr Bueso wrote:
>> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
>> index 1c0d11e8ce34..e4ad019e23f5 100644
>> --- a/kernel/locking/rwsem-xadd.c
>> +++ b/kernel/locking/rwsem-xadd.c
>> @@ -298,23 +298,30 @@ static inline bool rws

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Davidlohr Bueso
On Fri, 2015-03-06 at 13:32 +0100, Ingo Molnar wrote: > * Sasha Levin wrote: > > > I've bisected this to "locking/rwsem: Check for active lock before bailing > > on spinning". Relevant parties Cc'ed. > > That would be: > > 1a99367023f6 ("locking/rwsem: Check for active lock before bailing on

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Sasha Levin
On 03/06/2015 09:45 AM, Sasha Levin wrote: > On 03/06/2015 09:34 AM, Rafael David Tinoco wrote: >> Are you sure about this? I have a core dump locked on the same place >> (state machine for powering cpu down for the task swap) from a 3.13 (+ >> upstream patches) and this commit wasn't backported y

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Sasha Levin
On 03/06/2015 09:34 AM, Rafael David Tinoco wrote: > Are you sure about this? I have a core dump locked on the same place > (state machine for powering cpu down for the task swap) from a 3.13 (+ > upstream patches) and this commit wasn't backported yet. bisect took me to that same commit twice, a

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Rafael David Tinoco
Are you sure about this? I have a core dump locked on the same place (state machine for powering cpu down for the task swap) from a 3.13 (+ upstream patches) and this commit wasn't backported yet. -> multi_cpu_stop -> do { } while (curstate != MULTI_STOP_EXIT); In my case, curstate is WAY differ

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Ingo Molnar
* Sasha Levin wrote: > I've bisected this to "locking/rwsem: Check for active lock before bailing on > spinning". Relevant parties Cc'ed. That would be: 1a99367023f6 ("locking/rwsem: Check for active lock before bailing on spinning") attached below. Thanks, Ingo ===
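
Since the attached diff is truncated in this listing (the hunk quoted earlier in the thread shows it touches rwsem_can_spin_on_owner() in kernel/locking/rwsem-xadd.c), here is the gist of the change, paraphrased rather than reproduced verbatim; treat it as a sketch. Instead of refusing to spin whenever sem->owner is NULL, the spinner only bails when the count says the lock is actively held, i.e. probably reader-owned:

	static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
	{
		struct task_struct *owner;
		bool ret = true;

		if (need_resched())
			return false;

		rcu_read_lock();
		owner = READ_ONCE(sem->owner);
		if (!owner) {
			/*
			 * No owner recorded.  If the lock is active it may be
			 * held by readers, so bail out; if it looks inactive,
			 * keep spinning -- the new, more aggressive behavior.
			 */
			if (READ_ONCE(sem->count) & RWSEM_ACTIVE_MASK)
				ret = false;
		} else {
			ret = owner->on_cpu;
		}
		rcu_read_unlock();
		return ret;
	}

The net effect is that writers keep optimistically spinning (with preemption disabled) through windows where the lock looks momentarily ownerless, which would fit the symptom of CPUs never running their stopper threads and stalling multi_cpu_stop for everyone else.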

Re: sched: softlockups in multi_cpu_stop

2015-03-06 Thread Sasha Levin
I've bisected this to "locking/rwsem: Check for active lock before bailing on spinning". Relevant parties Cc'ed. Thanks, Sasha On 03/02/2015 02:45 AM, Sasha Levin wrote: > Hi all, > > I'm seeing the following lockup pretty often while fuzzing with trinity: > > [ 880.960250] NMI watchdog: BUG:

Re: sched: softlockups in multi_cpu_stop

2015-03-03 Thread Rafael David Tinoco
Some more info: multi_cpu_stop seems to be spinning inside do { ... } while (curstate != MULTI_STOP_EXIT); so multi_cpu_stop is an offload ([migration]) for the migrate_swap -> stop_two_cpus -> wait_for_completion() sequence, for cross-migrating 2 tasks. Based on task structs from callers' stacks
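
For reference, the loop in question, abridged from kernel/stop_machine.c of that era: every participating CPU's stopper thread lock-steps through the states, and each transition (ultimately reaching MULTI_STOP_EXIT) happens only once every CPU has acked the previous one. One CPU that never acks therefore leaves the others spinning in this loop, which is what the soft lockup watchdog then reports against the migration/N threads.

	/* Simple state machine */
	do {
		/* Chill out and ensure we re-read multi_stop_state. */
		cpu_relax();
		if (msdata->state != curstate) {
			curstate = msdata->state;
			switch (curstate) {
			case MULTI_STOP_DISABLE_IRQ:
				local_irq_disable();
				hard_irq_disable();
				break;
			case MULTI_STOP_RUN:
				if (is_active)
					err = msdata->fn(msdata->data);
				break;
			default:
				break;
			}
			/* last CPU to ack advances msdata->state */
			ack_state(msdata);
		}
	} while (curstate != MULTI_STOP_EXIT);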

sched: softlockups in multi_cpu_stop

2015-03-01 Thread Sasha Levin
Hi all, I'm seeing the following lockup pretty often while fuzzing with trinity:

[ 880.960250] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 447s! [migration/1:14]
[ 880.960700] Modules linked in:
[ 880.960700] irq event stamp: 380954
[ 880.960700] hardirqs last enabled at (380953): resto