Hi, Commit 2f7021a8 ("cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()") causes the following deadlock for me when using kernel v3.10 on ARM EXYNOS4412:
[ 960.380000] INFO: task kworker/0:1:34 blocked for more than 120 seconds. [ 960.385000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 960.390000] kworker/0:1 D c0613d88 0 34 2 0x00000000 [ 960.395000] Workqueue: events od_dbs_timer [ 960.400000] [<c0613d88>] (__schedule+0x39c/0x7ec) from [<c06145e8>] (schedule_preempt_disabled+0x24/0x34) [ 960.410000] [<c06145e8>] (schedule_preempt_disabled+0x24/0x34) from [<c0612fe4>] (__mutex_lock_slowpath+0x15c/0x21c) [ 960.420000] [<c0612fe4>] (__mutex_lock_slowpath+0x15c/0x21c) from [<c06130ec>] (mutex_lock+0x48/0x4c) [ 960.430000] [<c06130ec>] (mutex_lock+0x48/0x4c) from [<c0027cdc>] (get_online_cpus+0x2c/0x48) [ 960.440000] [<c0027cdc>] (get_online_cpus+0x2c/0x48) from [<c03b059c>] (gov_queue_work+0x1c/0xb8) [ 960.450000] [<c03b059c>] (gov_queue_work+0x1c/0xb8) from [<c03afabc>] (od_dbs_timer+0xa8/0x12c) [ 960.455000] [<c03afabc>] (od_dbs_timer+0xa8/0x12c) from [<c003ecdc>] (process_one_work+0x138/0x43c) [ 960.465000] [<c003ecdc>] (process_one_work+0x138/0x43c) from [<c003f3d8>] (worker_thread+0x134/0x3b8) [ 960.475000] [<c003f3d8>] (worker_thread+0x134/0x3b8) from [<c0044df8>] (kthread+0xa4/0xb0) [ 960.485000] [<c0044df8>] (kthread+0xa4/0xb0) from [<c000ead8>] (ret_from_fork+0x14/0x3c) [ 960.490000] INFO: task bash:2497 blocked for more than 120 seconds. [ 960.495000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 960.505000] bash D c0613d88 0 2497 2496 0x00000001 [ 960.510000] [<c0613d88>] (__schedule+0x39c/0x7ec) from [<c061244c>] (schedule_timeout+0x14c/0x20c) [ 960.520000] [<c061244c>] (schedule_timeout+0x14c/0x20c) from [<c0613774>] (wait_for_common+0xac/0x150) [ 960.530000] [<c0613774>] (wait_for_common+0xac/0x150) from [<c003deb8>] (flush_work+0xa4/0x130) [ 960.540000] [<c003deb8>] (flush_work+0xa4/0x130) from [<c003f9d4>] (__cancel_work_timer+0x74/0x100) [ 960.545000] [<c003f9d4>] (__cancel_work_timer+0x74/0x100) from [<c03b0dc8>] (cpufreq_governor_dbs+0x4dc/0x630) [ 960.555000] [<c03b0dc8>] (cpufreq_governor_dbs+0x4dc/0x630) from [<c03ac70c>] (__cpufreq_governor+0x60/0x120) [ 960.565000] [<c03ac70c>] (__cpufreq_governor+0x60/0x120) from [<c03ad1cc>] (cpufreq_add_dev+0x2f4/0x4d8) [ 960.575000] [<c03ad1cc>] (cpufreq_add_dev+0x2f4/0x4d8) from [<c060dcd8>] (cpufreq_cpu_callback+0x54/0x5c) [ 960.585000] [<c060dcd8>] (cpufreq_cpu_callback+0x54/0x5c) from [<c004a280>] (notifier_call_chain+0x44/0x84) [ 960.595000] [<c004a280>] (notifier_call_chain+0x44/0x84) from [<c0027c94>] (__cpu_notify+0x2c/0x48) [ 960.605000] [<c0027c94>] (__cpu_notify+0x2c/0x48) from [<c060b914>] (_cpu_up+0x10c/0x15c) [ 960.615000] [<c060b914>] (_cpu_up+0x10c/0x15c) from [<c060b9c8>] (cpu_up+0x64/0x80) [ 960.620000] [<c060b9c8>] (cpu_up+0x64/0x80) from [<c060a704>] (store_online+0x4c/0x74) [ 960.630000] [<c060a704>] (store_online+0x4c/0x74) from [<c02b88b4>] (dev_attr_store+0x18/0x24) [ 960.635000] [<c02b88b4>] (dev_attr_store+0x18/0x24) from [<c014f43c>] (sysfs_write_file+0xfc/0x164) [ 960.645000] [<c014f43c>] (sysfs_write_file+0xfc/0x164) from [<c00f6e70>] (vfs_write+0xbc/0x184) [ 960.655000] [<c00f6e70>] (vfs_write+0xbc/0x184) from [<c00f7208>] (SyS_write+0x40/0x68) [ 960.665000] [<c00f7208>] (SyS_write+0x40/0x68) from [<c000ea40>] (ret_fast_syscall+0x0/0x30) Tomek has also managed to got a lockdep info for the problem: [ 36.570161] [ 36.570233] ====================================================== [ 36.576376] [ INFO: possible circular locking dependency detected ] [ 36.582637] 3.10.0-rc5-00336-g6905065-dirty #1069 Not tainted [ 36.588355] ------------------------------------------------------- [ 36.594608] bash/2783 is trying to acquire lock: [ 36.599205] ((&(&j_cdbs->work)->work)){+.+.+.}, at: [<c0042138>] flush_work+0x0/0x27c [ 36.607104] [ 36.607104] but task is already holding lock: [ 36.612917] (cpu_hotplug.lock){+.+.+.}, at: [<c0028cb0>] cpu_hotplug_begin+0x2c/0x58 [ 36.620731] [ 36.620731] which lock already depends on the new lock. [ 36.620731] [ 36.628890] [ 36.628890] the existing dependency chain (in reverse order) is: [ 36.636355] -> #2 (cpu_hotplug.lock){+.+.+.}: [ 36.640867] [<c007c828>] lock_acquire+0x9c/0x130 [ 36.646075] [<c0643dc4>] mutex_lock_nested+0x68/0x3ec [ 36.651716] [<c0028db0>] get_online_cpus+0x40/0x60 [ 36.657097] [<c03c2810>] gov_queue_work+0x1c/0xac [ 36.662392] [<c03c18c4>] od_dbs_timer+0xb8/0x16c [ 36.667600] [<c00415f8>] process_one_work+0x198/0x4d8 [ 36.673242] [<c0042998>] worker_thread+0x134/0x3e8 [ 36.678624] [<c0048ad8>] kthread+0xa8/0xb4 [ 36.683311] [<c000ee88>] ret_from_fork+0x14/0x2c [ 36.688522] -> #1 (&j_cdbs->timer_mutex){+.+.+.}: [ 36.693380] [<c007c828>] lock_acquire+0x9c/0x130 [ 36.698588] [<c0643dc4>] mutex_lock_nested+0x68/0x3ec [ 36.704229] [<c03c184c>] od_dbs_timer+0x40/0x16c [ 36.709438] [<c00415f8>] process_one_work+0x198/0x4d8 [ 36.715080] [<c0042998>] worker_thread+0x134/0x3e8 [ 36.720461] [<c0048ad8>] kthread+0xa8/0xb4 [ 36.725148] [<c000ee88>] ret_from_fork+0x14/0x2c [ 36.730360] -> #0 ((&(&j_cdbs->work)->work)){+.+.+.}: [ 36.735565] [<c007bb70>] __lock_acquire+0x17c0/0x1e64 [ 36.741206] [<c007c828>] lock_acquire+0x9c/0x130 [ 36.746414] [<c0042170>] flush_work+0x38/0x27c [ 36.751449] [<c00424e8>] __cancel_work_timer+0x84/0x120 [ 36.757265] [<c03c3040>] cpufreq_governor_dbs+0x4f4/0x674 [ 36.763254] [<c03bdeac>] __cpufreq_governor+0x60/0x120 [ 36.768983] [<c03bfaa0>] __cpufreq_remove_dev.clone.4+0x7c/0x480 [ 36.775579] [<c06421d0>] cpufreq_cpu_callback+0x48/0x5c [ 36.781396] [<c004e688>] notifier_call_chain+0x44/0x84 [ 36.787124] [<c0028c30>] __cpu_notify+0x2c/0x48 [ 36.792245] [<c063de9c>] _cpu_down+0xb4/0x238 [ 36.797193] [<c063e048>] cpu_down+0x28/0x3c [ 36.801966] [<c063e9c0>] store_online+0x30/0x74 [ 36.807088] [<c02c9bbc>] dev_attr_store+0x18/0x24 [ 36.812382] [<c0166e08>] sysfs_write_file+0x104/0x18c [ 36.818025] [<c010b704>] vfs_write+0xc8/0x194 [ 36.822973] [<c010bac0>] SyS_write+0x44/0x70 [ 36.827833] [<c000edc0>] ret_fast_syscall+0x0/0x48 [ 36.833218] [ 36.833218] other info that might help us debug this: [ 36.833218] [ 36.841203] Chain exists of: (&(&j_cdbs->work)->work) --> &j_cdbs->timer_mutex --> cpu_hotplug.lock [ 36.850663] Possible unsafe locking scenario: [ 36.850663] [ 36.856565] CPU0 CPU1 [ 36.861079] ---- ---- [ 36.865591] lock(cpu_hotplug.lock); [ 36.869237] lock(&j_cdbs->timer_mutex); [ 36.875746] lock(cpu_hotplug.lock); [ 36.881909] lock((&(&j_cdbs->work)->work)); [ 36.886250] [ 36.886250] *** DEADLOCK *** [ 36.886250] [ 36.892163] 5 locks held by bash/2783: [ 36.895886] #0: (sb_writers#7){.+.+.+}, at: [<c010b7cc>] vfs_write+0x190/0x194 [ 36.903264] #1: (&buffer->mutex){+.+.+.}, at: [<c0166d2c>] sysfs_write_file+0x28/0x18c [ 36.911336] #2: (s_active#50){.+.+..}, at: [<c0166de4>] sysfs_write_file+0xe0/0x18c [ 36.919148] #3: (cpu_add_remove_lock){+.+.+.}, at: [<c063e02c>] cpu_down+0xc/0x3c [ 36.926787] #4: (cpu_hotplug.lock){+.+.+.}, at: [<c0028cb0>] cpu_hotplug_begin+0x2c/0x58 [ 36.935032] [ 36.935032] stack backtrace: [ 36.939394] CPU: 1 PID: 2783 Comm: bash Not tainted 3.10.0-rc5-00336-g6905065-dirty #1069 [ 36.947591] [<c0015ed0>] (unwind_backtrace+0x0/0x13c) from [<c0012e58>] (show_stack+0x10/0x14) [ 36.956164] [<c0012e58>] (show_stack+0x10/0x14) from [<c0078a9c>] (print_circular_bug+0x1c8/0x304) [ 36.965105] [<c0078a9c>] (print_circular_bug+0x1c8/0x304) from [<c007bb70>] (__lock_acquire+0x17c0/0x1e64) [ 36.974735] [<c007bb70>] (__lock_acquire+0x17c0/0x1e64) from [<c007c828>] (lock_acquire+0x9c/0x130) [ 36.983767] [<c007c828>] (lock_acquire+0x9c/0x130) from [<c0042170>] (flush_work+0x38/0x27c) [ 36.992187] [<c0042170>] (flush_work+0x38/0x27c) from [<c00424e8>] (__cancel_work_timer+0x84/0x120) [ 37.001225] [<c00424e8>] (__cancel_work_timer+0x84/0x120) from [<c03c3040>] (cpufreq_governor_dbs+0x4f4/0x674) [ 37.011196] [<c03c3040>] (cpufreq_governor_dbs+0x4f4/0x674) from [<c03bdeac>] (__cpufreq_governor+0x60/0x120) [ 37.021087] [<c03bdeac>] (__cpufreq_governor+0x60/0x120) from [<c03bfaa0>] (__cpufreq_remove_dev.clone.4+0x7c/0x480) [ 37.031607] [<c03bfaa0>] (__cpufreq_remove_dev.clone.4+0x7c/0x480) from [<c06421d0>] (cpufreq_cpu_callback+0x48/0x5c) [ 37.042207] [<c06421d0>] (cpufreq_cpu_callback+0x48/0x5c) from [<c004e688>] (notifier_call_chain+0x44/0x84) [ 37.051910] [<c004e688>] (notifier_call_chain+0x44/0x84) from [<c0028c30>] (__cpu_notify+0x2c/0x48) [ 37.060931] [<c0028c30>] (__cpu_notify+0x2c/0x48) from [<c063de9c>] (_cpu_down+0xb4/0x238) [ 37.069177] [<c063de9c>] (_cpu_down+0xb4/0x238) from [<c063e048>] (cpu_down+0x28/0x3c) [ 37.077074] [<c063e048>] (cpu_down+0x28/0x3c) from [<c063e9c0>] (store_online+0x30/0x74) [ 37.085147] [<c063e9c0>] (store_online+0x30/0x74) from [<c02c9bbc>] (dev_attr_store+0x18/0x24) [ 37.093740] [<c02c9bbc>] (dev_attr_store+0x18/0x24) from [<c0166e08>] (sysfs_write_file+0x104/0x18c) [ 37.102852] [<c0166e08>] (sysfs_write_file+0x104/0x18c) from [<c010b704>] (vfs_write+0xc8/0x194) [ 37.111616] [<c010b704>] (vfs_write+0xc8/0x194) from [<c010bac0>] (SyS_write+0x44/0x70) [ 37.119613] [<c010bac0>] (SyS_write+0x44/0x70) from [<c000edc0>] (ret_fast_syscall+0x0/0x48) Reproducing the issue is very easy, i.e.: # echo 0 > /sys/devices/system/cpu/cpu3/online # echo 0 > /sys/devices/system/cpu/cpu2/online # echo 0 > /sys/devices/system/cpu/cpu1/online # while true;do echo 1 > /sys/devices/system/cpu/cpu1/online;echo 0 > /sys/devices/system/cpu/cpu1/online;done The commit in question (2f7021a8) was merged in v3.10-rc5 as a fix for commit 031299b ("cpufreq: governors: Avoid unnecessary per cpu timer interrupts") which was causing a kernel warning to show up. Michael/Viresh: do you have some idea how to fix the issue? Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/