Hi, On Wed, May 15, 2019 at 01:15:52PM +0530, Gautham R. Shenoy wrote: > From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com> > > The calls to arch_add_memory()/arch_remove_memory() are always made > with the read-side cpu_hotplug_lock acquired via > memory_hotplug_begin(). On pSeries, > arch_add_memory()/arch_remove_memory() eventually call resize_hpt() > which in turn calls stop_machine() which acquires the read-side > cpu_hotplug_lock again, thereby resulting in the recursive acquisition > of this lock.
A clarification regarding why we hadn't observed this problem earlier. In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system lockup during a memory hotplug operation because cpus_read_lock() is a per-cpu rwsem read, which, in the fast-path (in the absence of the writer, which in our case is a CPU-hotplug operation) simply increments the read_count on the semaphore. Thus a recursive read in the fast-path doesn't cause any problems. However, we can hit this problem in practice if there is a concurrent CPU-Hotplug operation in progress which is waiting to acquire the write-side of the lock. This will cause the second recursive read to block until the writer finishes. While the writer is blocked since the first read holds the lock. Thus both the reader as well as the writers fail to make any progress thereby blocking both CPU-Hotplug as well as Memory Hotplug operations. Memory-Hotplug CPU-Hotplug CPU 0 CPU 1 ------ ------ 1. down_read(cpu_hotplug_lock.rw_sem) [memory_hotplug_begin] 2. down_write(cpu_hotplug_lock.rw_sem) [cpu_up/cpu_down] 3. down_read(cpu_hotplug_lock.rw_sem) [stop_machine()] > > Lockdep complains as follows in these code-paths. > > swapper/0/1 is trying to acquire lock: > (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: stop_machine+0x2c/0x60 > > but task is already holding lock: > (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: > mem_hotplug_begin+0x20/0x50 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(cpu_hotplug_lock.rw_sem); > lock(cpu_hotplug_lock.rw_sem); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 3 locks held by swapper/0/1: > #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x12c/0x1b0 > #1: (____ptrval____) (cpu_hotplug_lock.rw_sem){++++}, at: > mem_hotplug_begin+0x20/0x50 > #2: (____ptrval____) (mem_hotplug_lock.rw_sem){++++}, at: > percpu_down_write+0x54/0x1a0 > > stack backtrace: > CPU: 0 PID: 1 Comm: swapper/0 Not tainted > 5.0.0-rc5-58373-gbc99402235f3-dirty #166 > Call Trace: > [c0000000feb03150] [c000000000e32bd4] dump_stack+0xe8/0x164 (unreliable) > [c0000000feb031a0] [c00000000020d6c0] __lock_acquire+0x1110/0x1c70 > [c0000000feb03320] [c00000000020f080] lock_acquire+0x240/0x290 > [c0000000feb033e0] [c00000000017f554] cpus_read_lock+0x64/0xf0 > [c0000000feb03420] [c00000000029ebac] stop_machine+0x2c/0x60 > [c0000000feb03460] [c0000000000d7f7c] pseries_lpar_resize_hpt+0x19c/0x2c0 > [c0000000feb03500] [c0000000000788d0] resize_hpt_for_hotplug+0x70/0xd0 > [c0000000feb03570] [c000000000e5d278] arch_add_memory+0x58/0xfc > [c0000000feb03610] [c0000000003553a8] devm_memremap_pages+0x5e8/0x8f0 > [c0000000feb036c0] [c0000000009c2394] pmem_attach_disk+0x764/0x830 > [c0000000feb037d0] [c0000000009a7c38] nvdimm_bus_probe+0x118/0x240 > [c0000000feb03860] [c000000000968500] really_probe+0x230/0x4b0 > [c0000000feb038f0] [c000000000968aec] driver_probe_device+0x16c/0x1e0 > [c0000000feb03970] [c000000000968ca8] __driver_attach+0x148/0x1b0 > [c0000000feb039f0] [c0000000009650b0] bus_for_each_dev+0x90/0x130 > [c0000000feb03a50] [c000000000967dd4] driver_attach+0x34/0x50 > [c0000000feb03a70] [c000000000967068] bus_add_driver+0x1a8/0x360 > [c0000000feb03b00] [c00000000096a498] driver_register+0x108/0x170 > [c0000000feb03b70] [c0000000009a7400] __nd_driver_register+0xd0/0xf0 > [c0000000feb03bd0] [c00000000128aa90] nd_pmem_driver_init+0x34/0x48 > [c0000000feb03bf0] [c000000000010a10] do_one_initcall+0x1e0/0x45c > [c0000000feb03cd0] [c00000000122462c] kernel_init_freeable+0x540/0x64c > [c0000000feb03db0] [c00000000001110c] kernel_init+0x2c/0x160 > [c0000000feb03e20] [c00000000000bed4] ret_from_kernel_thread+0x5c/0x68 > > Fix this issue by > 1) Requiring all the calls to pseries_lpar_resize_hpt() be made > with cpu_hotplug_lock held. > > 2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked() > as a consequence of 1) > > 3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt() > with cpu_hotplug_lock held. > > Reported-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com> > Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com> > --- > v2 -> v3 : Updated the comment for pseries_lpar_resize_hpt() > Updated the commit-log with the full backtrace. > v1 -> v2 : Rebased against powerpc/next instead of linux/master > > arch/powerpc/mm/book3s64/hash_utils.c | 9 ++++++++- > arch/powerpc/platforms/pseries/lpar.c | 8 ++++++-- > 2 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c > b/arch/powerpc/mm/book3s64/hash_utils.c > index 919a861..d07fcafd 100644 > --- a/arch/powerpc/mm/book3s64/hash_utils.c > +++ b/arch/powerpc/mm/book3s64/hash_utils.c > @@ -38,6 +38,7 @@ > #include <linux/libfdt.h> > #include <linux/pkeys.h> > #include <linux/hugetlb.h> > +#include <linux/cpu.h> > > #include <asm/debugfs.h> > #include <asm/processor.h> > @@ -1928,10 +1929,16 @@ static int hpt_order_get(void *data, u64 *val) > > static int hpt_order_set(void *data, u64 val) > { > + int ret; > + > if (!mmu_hash_ops.resize_hpt) > return -ENODEV; > > - return mmu_hash_ops.resize_hpt(val); > + cpus_read_lock(); > + ret = mmu_hash_ops.resize_hpt(val); > + cpus_read_unlock(); > + > + return ret; > } > > DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, > "%llu\n"); > diff --git a/arch/powerpc/platforms/pseries/lpar.c > b/arch/powerpc/platforms/pseries/lpar.c > index 1034ef1..557d592 100644 > --- a/arch/powerpc/platforms/pseries/lpar.c > +++ b/arch/powerpc/platforms/pseries/lpar.c > @@ -859,7 +859,10 @@ static int pseries_lpar_resize_hpt_commit(void *data) > return 0; > } > > -/* Must be called in user context */ > +/* > + * Must be called in process context. The caller must hold the > + * cpus_lock. > + */ > static int pseries_lpar_resize_hpt(unsigned long shift) > { > struct hpt_resize_state state = { > @@ -913,7 +916,8 @@ static int pseries_lpar_resize_hpt(unsigned long shift) > > t1 = ktime_get(); > > - rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL); > + rc = stop_machine_cpuslocked(pseries_lpar_resize_hpt_commit, > + &state, NULL); > > t2 = ktime_get(); > > -- > 1.9.4 >