On Mon, Aug 15, 2022 at 04:55:50PM +1000, Michael Ellerman wrote:
> The recent change to get_phb_number() causes a DEBUG_ATOMIC_SLEEP
> warning on some systems:
>
>   BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
>   in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper
>   preempt_count: 1, expected: 0
>   RCU nest depth: 0, expected: 0
>   1 lock held by swapper/1:
>    #0: c157efb0 (hose_spinlock){+.+.}-{2:2}, at: pcibios_alloc_controller+0x64/0x220
>   Preemption disabled at:
>   [<00000000>] 0x0
>   CPU: 0 PID: 1 Comm: swapper Not tainted 5.19.0-yocto-standard+ #1
>   Call Trace:
>   [d101dc90] [c073b264] dump_stack_lvl+0x50/0x8c (unreliable)
>   [d101dcb0] [c0093b70] __might_resched+0x258/0x2a8
>   [d101dcd0] [c0d3e634] __mutex_lock+0x6c/0x6ec
>   [d101dd50] [c0a84174] of_alias_get_id+0x50/0xf4
>   [d101dd80] [c002ec78] pcibios_alloc_controller+0x1b8/0x220
>   [d101ddd0] [c140c9dc] pmac_pci_init+0x198/0x784
>   [d101de50] [c140852c] discover_phbs+0x30/0x4c
>   [d101de60] [c0007fd4] do_one_initcall+0x94/0x344
>   [d101ded0] [c1403b40] kernel_init_freeable+0x1a8/0x22c
>   [d101df10] [c00086e0] kernel_init+0x34/0x160
>   [d101df30] [c001b334] ret_from_kernel_thread+0x5c/0x64
>
> This is because pcibios_alloc_controller() holds hose_spinlock but
> of_alias_get_id() takes of_mutex which can sleep.
>
> The hose_spinlock protects the phb_bitmap, and also the hose_list, but
> it doesn't need to be held while get_phb_number() calls the OF routines,
> because those are only looking up information in the device tree.
>
> So fix it by having get_phb_number() take the hose_spinlock itself, only
> where required, and then dropping the lock before returning.
> pcibios_alloc_controller() then needs to take the lock again before the
> list_add() but that's safe, the order of the list is not important.
>
> Fixes: 0fe1e96fef0a ("powerpc/pci: Prefer PCI domain assignment via DT 'linux,pci-domain' and alias")
> Reported-by: Guenter Roeck <li...@roeck-us.net>
> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>

The problem is no longer seen with this patch applied.

Tested-by: Guenter Roeck <li...@roeck-us.net>

Guenter
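
For readers who want the locking pattern from the quoted commit message in isolation, here is a minimal userspace sketch. It is not the actual patch. It assumes a pthread mutex as a stand-in for hose_spinlock, a plain counter in place of hose_list, and a hypothetical lookup_preferred_id() in place of the device-tree lookups such as of_alias_get_id(). The fallback to the first free bitmap slot is a simplification made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_PHBS 16

/* Stand-in for hose_spinlock (assumption: a mutex, so this runs in userspace). */
static pthread_mutex_t hose_lock = PTHREAD_MUTEX_INITIALIZER;
static bool phb_bitmap[MAX_PHBS];  /* protected by hose_lock */
static int hose_count;             /* stands in for hose_list, protected by hose_lock */

/*
 * Hypothetical stand-in for the device-tree lookups, which may sleep.
 * It checks that the caller is NOT holding hose_lock: trylock only
 * succeeds when the lock is free.
 */
static int lookup_preferred_id(int hint)
{
	int rc = pthread_mutex_trylock(&hose_lock);

	assert(rc == 0);
	pthread_mutex_unlock(&hose_lock);
	return hint;
}

/* Takes the lock itself, only around the bitmap update, and drops it before returning. */
static int get_phb_number(int hint)
{
	int phb_id, i;

	/* Sleepable lookup first, with no lock held. */
	phb_id = lookup_preferred_id(hint);

	pthread_mutex_lock(&hose_lock);
	if (phb_id < 0 || phb_id >= MAX_PHBS || phb_bitmap[phb_id]) {
		/* Preferred id unusable: fall back to the first free slot. */
		phb_id = -1;
		for (i = 0; i < MAX_PHBS; i++) {
			if (!phb_bitmap[i]) {
				phb_id = i;
				break;
			}
		}
	}
	if (phb_id >= 0)
		phb_bitmap[phb_id] = true;
	pthread_mutex_unlock(&hose_lock);

	return phb_id;
}

/* Analogue of pcibios_alloc_controller(): re-takes the lock for the list update. */
static int alloc_controller(int hint)
{
	int id = get_phb_number(hint);

	if (id < 0)
		return -1;

	pthread_mutex_lock(&hose_lock);
	hose_count++;  /* list_add() in the real code; list order does not matter */
	pthread_mutex_unlock(&hose_lock);

	return id;
}
```

Calling alloc_controller(3) twice should hand out id 3 and then fall back to the first free id, and the assert in lookup_preferred_id() would fire if the lookup were ever reached with the lock still held, which is the situation the warning in the quoted trace reports.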