Re: [v3.10 regression] deadlock on cpu hotplug

2013-07-10 Thread Michael Wang
On 07/10/2013 10:40 AM, Michael Wang wrote:
> On 07/09/2013 07:51 PM, Bartlomiej Zolnierkiewicz wrote:
> [snip]
>>
>> It doesn't help and unfortunately it just can't help as it only
>> addresses lockdep functionality while the issue is not a lockdep
>> problem but a genuine locking problem. CPU hot

Re: [v3.10 regression] deadlock on cpu hotplug

2013-07-09 Thread Viresh Kumar
On 10 July 2013 11:34, Michael Wang wrote:
> Thanks for the confirm :) seems like the root cause is very likely
> related with the problem Srivatsa discovered.
>
> I think the fix in his mail worth a try, but I need more investigations
> to confirm that's the right way...
It's not a fix really, bu

Re: [v3.10 regression] deadlock on cpu hotplug

2013-07-09 Thread Michael Wang
On 07/10/2013 01:39 PM, Viresh Kumar wrote:
> On 10 July 2013 09:42, Michael Wang wrote:
>> I'm not sure what is supposed after notify CPUFREQ_GOV_STOP event, if it
>> is in order to stop queued work and prevent follow work happen again,
>> then it failed to, and we need some method to stop queue

Re: [v3.10 regression] deadlock on cpu hotplug

2013-07-09 Thread Viresh Kumar
On 10 July 2013 09:42, Michael Wang wrote:
> I'm not sure what is supposed after notify CPUFREQ_GOV_STOP event, if it
> is in order to stop queued work and prevent follow work happen again,
> then it failed to, and we need some method to stop queue work again when
> CPUFREQ_GOV_STOP notified, like

Re: [v3.10 regression] deadlock on cpu hotplug

2013-07-09 Thread Michael Wang
On 07/09/2013 09:07 PM, Srivatsa S. Bhat wrote:
[snip]
>
> But this still doesn't immediately explain how we can end up trying to
> queue work items on offline CPUs (since policy->cpus is supposed to always
> contain online cpus only, and this does look correct in the code as well,
> at a first gl

Re: [v3.10 regression] deadlock on cpu hotplug

2013-07-09 Thread Michael Wang
On 07/09/2013 09:07 PM, Srivatsa S. Bhat wrote:
[snip]
>>
>
> Yeah, exactly!
>
> So I had proposed doing an asynchronous cancel-work or doing the
> synchronous cancel-work in the CPU_POST_DEAD phase, where the
> cpu_hotplug.lock is not held. See this thread:
>
> http://marc.info/?l=linux-kernel&

Re: [v3.10 regression] deadlock on cpu hotplug

2013-07-09 Thread Michael Wang
On 07/09/2013 07:51 PM, Bartlomiej Zolnierkiewicz wrote:
[snip]
>
> It doesn't help and unfortunately it just can't help as it only
> addresses lockdep functionality while the issue is not a lockdep
> problem but a genuine locking problem. CPU hot-unplug invokes
> _cpu_down() which calls cpu_hotpl

Re: [v3.10 regression] deadlock on cpu hotplug

2013-07-09 Thread Srivatsa S. Bhat
On 07/09/2013 05:21 PM, Bartlomiej Zolnierkiewicz wrote:
>
> Hi,
>
> On Tuesday, July 09, 2013 10:15:43 AM Michael Wang wrote:
>> Hi, Bartlomiej
>>
>> On 07/08/2013 11:26 PM, Bartlomiej Zolnierkiewicz wrote:
>> [snip]
>>>
>>> # echo 0 > /sys/devices/system/cpu/cpu3/online
>>> # echo 0 > /sys/devi

Re: Re: [v3.10 regression] deadlock on cpu hotplug

2013-07-09 Thread Bartlomiej Zolnierkiewicz
Hi,
On Tuesday, July 09, 2013 10:15:43 AM Michael Wang wrote:
> Hi, Bartlomiej
>
> On 07/08/2013 11:26 PM, Bartlomiej Zolnierkiewicz wrote:
> [snip]
> >
> > # echo 0 > /sys/devices/system/cpu/cpu3/online
> > # echo 0 > /sys/devices/system/cpu/cpu2/online
> > # echo 0 > /sys/devices/system/cpu/c

Re: [v3.10 regression] deadlock on cpu hotplug

2013-07-08 Thread Michael Wang
Hi, Bartlomiej
On 07/08/2013 11:26 PM, Bartlomiej Zolnierkiewicz wrote:
[snip]
>
> # echo 0 > /sys/devices/system/cpu/cpu3/online
> # echo 0 > /sys/devices/system/cpu/cpu2/online
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> # while true;do echo 1 > /sys/devices/system/cpu/cpu1/online;echo 0

[v3.10 regression] deadlock on cpu hotplug

2013-07-08 Thread Bartlomiej Zolnierkiewicz
Hi,
Commit 2f7021a8 ("cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()") causes the following deadlock for me when using kernel v3.10 on ARM EXYNOS4412:
[ 960.38] INFO: task kworker/0:1:34 blocked for more than 120 seconds.
[ 960.385000] "echo 0 > /proc/sys/kernel/h