> On 18-Mar-2022, at 9:12 AM, Michael Ellerman <m...@ellerman.id.au> wrote:
>
> To avoid it, we can take a reference to the host_bridge->dev until we're
> done using phb. Then when we drop the reference the phb will be freed.
>
> Fixes: 2dd9c11b9d4d ("powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)")
> Reported-by: David Dai <z...@linux.ibm.com>
> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
> ---

Verified successfully with 5.17.0-rc8-00061-g34e047aa16c0 + patch.

Tested-by: Sachin Sant <sach...@linux.ibm.com>

Without this patch:

# drmgr -c phb -s "PHB 18" -r -d 5
[ 178.107171] rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
[ 178.107429] rpaphp: Slot [U78D8.ND0.FGD003M-P0-C2-C0] registered
[ 178.107578] rpaphp: Slot [U78D8.ND0.FGD003M-P0-C3-C0] registered
[ 178.107721] rpaphp: Slot [U78D8.ND0.FGD003M-P0-C4-C0] registered
[ 178.196960] pci_bus 0012:01: busn_res: [bus 01-ff] is released
[ 178.197040] BUG: Unable to handle kernel data access on read at 0x6b6b6b6b6b6b6ba3
[ 178.197045] Faulting instruction address: 0xc000000000181068
[ 178.197049] Oops: Kernel access of bad area, sig: 11 [#1]
[ 178.197051] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 178.197056] Modules linked in: rpadlpar_io rpaphp dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill bonding ip_set tls nf_tables nfnetlink sunrpc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg ibmvscsi scsi_transport_srp ibmveth xts vmx_crypto fuse
[ 178.197086] CPU: 15 PID: 10565 Comm: drmgr Not tainted 5.17.0-rc8-00061-g34e047aa16c0 #1
[ 178.197091] NIP: c000000000181068 LR: c000000000181060 CTR: 0000000000000000
[ 178.197094] REGS: c00000002af5b700 TRAP: 0380 Not tainted (5.17.0-rc8-00061-g34e047aa16c0)
[ 178.197097] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24088220 XER: 20040003
[ 178.197105] CFAR: c000000000f60564 IRQMASK: 0
[ 178.197105] GPR00: c000000000181060 c00000002af5b9a0 c000000002a78c00 c000000002b79d50
[ 178.197105] GPR04: 0000000000000000 c00c00000015b348 c00c000000030288 0000000000000000
[ 178.197105] GPR08: 00000000000000ff 0000000000000100 6b6b6b6b6b6b6b6b 0000000000008000
[ 178.197105] GPR12: c000000000260940 c000000a5ffedb00 0000000000000000 0000000107a8f0f0
[ 178.197105] GPR16: 0000000000000006 0000000000000003 0000000000000002 0000000000000004
[ 178.197105] GPR20: 0000000000000005 0000000107a8ede8 0000000107a8c5a8 c000000002abae85
[ 178.197105] GPR24: 0000000000000000 0000000000000001 c00000000cabd078 c00000000c50ac00
[ 178.197105] GPR28: 0000000000000000 c000000a851cac00 c00000000c501c00 c00000000c501d38
[ 178.197139] NIP [c000000000181068] release_resource+0x38/0xf0
[ 178.197146] LR [c000000000181060] release_resource+0x30/0xf0
[ 178.197149] Call Trace:
[ 178.197151] [c00000002af5b9a0] [c0000000008c4814] pci_remove_bus+0xf4/0x110 (unreliable)
[ 178.197156] [c00000002af5b9d0] [c00000000011a018] remove_phb_dynamic+0x178/0x190
[ 178.197161] [c00000002af5ba50] [c0080000086f09e8] dlpar_remove_slot+0x1d0/0x250 [rpadlpar_io]
[ 178.197166] [c00000002af5baf0] [c0080000086f0b5c] remove_slot_store+0xa4/0x160 [rpadlpar_io]
[ 178.197170] [c00000002af5bb80] [c00000000088a6dc] kobj_attr_store+0x2c/0x50
[ 178.197174] [c00000002af5bba0] [c0000000006b19e4] sysfs_kf_write+0x64/0x80
[ 178.197179] [c00000002af5bbc0] [c0000000006b0e0c] kernfs_fop_write_iter+0x1bc/0x2b0
[ 178.197183] [c00000002af5bc10] [c0000000005816a4] new_sync_write+0x124/0x1c0
[ 178.197187] [c00000002af5bcb0] [c000000000585954] vfs_write+0x2c4/0x390
[ 178.197190] [c00000002af5bd00] [c000000000585d24] ksys_write+0x84/0x140
[ 178.197194] [c00000002af5bd50] [c000000000037024] system_call_exception+0x254/0x550
[ 178.197198] [c00000002af5be10] [c00000000000bfe8] system_call_vectored_common+0xe8/0x278
[ 178.197203] --- interrupt: 3000 at 0x7fffa7118774
[ 178.197206] NIP: 00007fffa7118774 LR: 0000000000000000 CTR: 0000000000000000
[ 178.197208] REGS: c00000002af5be80 TRAP: 3000 Not tainted (5.17.0-rc8-00061-g34e047aa16c0)
[ 178.197211] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 44084404 XER: 00000000
[ 178.197220] IRQMASK: 0
[ 178.197220] GPR00: 0000000000000004 00007ffff50a4ae0 00007fffa7227100 0000000000000006
[ 178.197220] GPR04: 0000000140744930 0000000000000006 0000000000000000 0000000000000001
[ 178.197220] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 178.197220] GPR12: 0000000000000000 00007fffa733b100 0000000000000000 0000000107a8f0f0
[ 178.197220] GPR16: 0000000000000006 0000000000000003 0000000000000002 0000000000000004
[ 178.197220] GPR20: 0000000000000005 0000000107a8ede8 0000000107a8c5a8 0000000107a8d3a0
[ 178.197220] GPR24: 0000000000000006 0000000107a8b3c0 0000000140744930 0000000000000006
[ 178.197220] GPR28: 0000000000000006 00000001407202a0 0000000140744930 0000000000000006
[ 178.197251] NIP [00007fffa7118774] 0x7fffa7118774
[ 178.197254] LR [0000000000000000] 0x0
[ 178.197255] --- interrupt: 3000
[ 178.197257] Instruction dump:
[ 178.197259] 7c0802a6 60000000 7c0802a6 fbe1fff8 7c7f1b78 3c620010 38631150 f8010010
[ 178.197265] f821ffd1 48ddf4dd 60000000 e95f0028 <e92a0038> 2c290000 41820028 7c3f4840
[ 178.197273] ---[ end trace 0000000000000000 ]---

With the patch applied, both add and remove operations work correctly.

# lsslot -c phb | grep 18
PHB 18 /pci@800000020000012 U78D8.ND0.FGD003M-P0-C2-C0
# drmgr -c phb -s "PHB 18" -r -d 5
########## Mar 20 02:58:53 2022 ##########
drmgr: -c phb -s PHB 18 -r -d 5
Validating PHB DLPAR capability...yes.
Getting node types 0x00000010
……..
Retrieving hotplug nodes
hp adapter status for U78D8.ND0.FGD003M-P0-C2-C0 is 0
setting hp adapter status to UNCONFIG adapter for U78D8.ND0.FGD003M-P0-C2-C0
hp adapter status for U78D8.ND0.FGD003M-P0-C2-C0 is 0
performing kernel op for PHB 18, file is /sys/bus/pci/slots/control/remove_slot
Removing device-tree node /proc/device-tree/pci@800000020000012
Removing device-tree node /proc/device-tree/interrupt-controller@800000025000012
Releasing drc index 0x20000012
get-sensor for 20000012: 0, 1
Setting isolation state to 'isolate'
Setting allocation state to 'alloc unusable'
get-sensor for 20000012: 0, 2
drc_index 20000012 sensor-state: 2
Resource is not available to the partition.
########## Mar 20 02:58:54 2022 ##########
# lsslot -c phb | grep 18
# drmgr -c phb -s "PHB 18" -a -d 5
########## Mar 20 03:00:16 2022 ##########
drmgr: -c phb -s PHB 18 -a -d 5
Validating PHB DLPAR capability...yes.
Getting node types 0x00000010
…..
get-sensor for 22010012: 0, 0
performing kernel op for PHB 18, file is /sys/bus/pci/slots/control/add_slot
########## Mar 20 03:00:19 2022 ##########
# lsslot -c phb | grep 18
PHB 18 /pci@800000020000012 U78D8.ND0.FGD003M-P0-C2-C0
#

- Sachin