On 6/11/24 8:18 AM, Michael Ellerman wrote:
Hi Ganesh,
Ganesh Goudar <ganes...@linux.ibm.com> writes:
If a PCI device is removed during eeh_pe_report_edev(), edev->pdev
will change and can cause a crash. Hold the PCI rescan/remove lock
while taking a copy of edev->pdev.
Signed-off-by: Ganesh Goudar <ganes...@linux.ibm.com>
---
arch/powerpc/kernel/eeh_pe.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index d1030bc52564..49f968733912 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -859,7 +859,9 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
/* Retrieve the parent PCI bus of first (top) PCI device */
edev = list_first_entry_or_null(&pe->edevs, struct eeh_dev, entry);
+ pci_lock_rescan_remove();
pdev = eeh_dev_to_pci_dev(edev);
+ pci_unlock_rescan_remove();
if (pdev)
return pdev->bus;
What prevents pdev being freed/reused immediately after you drop the
rescan/remove lock?
Yeah, I should have released the lock only after getting the bus address.
I will send v2.
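To make the ordering concrete, here is a minimal userspace sketch of "release the lock only after reading the bus". It is a model, not kernel code: every model_* name is a hypothetical stand-in, and a pthread mutex stands in for the PCI rescan/remove lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
struct model_bus  { int number; };
struct model_pdev { struct model_bus *bus; };
struct model_edev { struct model_pdev *pdev; };

/* Stand-in for the PCI rescan/remove lock. */
static pthread_mutex_t model_rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/* Removal path: clears edev->pdev while holding the lock. */
static void model_remove(struct model_edev *edev)
{
	pthread_mutex_lock(&model_rescan_remove_lock);
	edev->pdev = NULL;
	pthread_mutex_unlock(&model_rescan_remove_lock);
}

/*
 * Lookup path: pdev and pdev->bus are both read inside one critical
 * section, so removal cannot clear pdev between the two reads.
 */
static struct model_bus *model_bus_get(struct model_edev *edev)
{
	struct model_bus *bus = NULL;
	struct model_pdev *pdev;

	if (!edev)
		return NULL;

	pthread_mutex_lock(&model_rescan_remove_lock);
	pdev = edev->pdev;
	if (pdev)
		bus = pdev->bus;
	pthread_mutex_unlock(&model_rescan_remove_lock);

	return bus;
}
```

Note that this only closes the window around the pdev dereference. The returned bus pointer is still used after the lock is dropped, so the same lifetime question applies to it.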
AFAICS eeh_dev_to_pci_dev() doesn't take an additional reference to the
pdev or anything.
Yes, I think we have to evaluate the consequences of not taking the
reference in all the cases.
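For comparison, a rough userspace sketch of what taking a reference would buy. In the kernel the real helpers for this are pci_dev_get() and pci_dev_put(); the ref_* names below are hypothetical stand-ins used only to model the idea, with a pthread mutex in place of the rescan/remove lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct pci_dev / eeh_dev. */
struct ref_pdev { int refcount; int released; };
struct ref_edev { struct ref_pdev *pdev; };

static pthread_mutex_t ref_lock = PTHREAD_MUTEX_INITIALIZER;

/* Drop one reference; mark the device released when the last one goes. */
static void ref_pdev_put(struct ref_pdev *pdev)
{
	if (!pdev)
		return;
	pthread_mutex_lock(&ref_lock);
	if (--pdev->refcount == 0)
		pdev->released = 1;
	pthread_mutex_unlock(&ref_lock);
}

/* Look up pdev and take a reference before the lock is dropped. */
static struct ref_pdev *ref_edev_get_pdev(struct ref_edev *edev)
{
	struct ref_pdev *pdev;

	pthread_mutex_lock(&ref_lock);
	pdev = edev->pdev;
	if (pdev)
		pdev->refcount++;
	pthread_mutex_unlock(&ref_lock);

	return pdev;
}

/* Removal path: unlink pdev from edev, then drop the edev's reference. */
static void ref_remove(struct ref_edev *edev)
{
	struct ref_pdev *pdev;

	pthread_mutex_lock(&ref_lock);
	pdev = edev->pdev;
	edev->pdev = NULL;
	pthread_mutex_unlock(&ref_lock);

	ref_pdev_put(pdev);
}
```

In this model a caller that got the pdev before removal can keep using it until its own ref_pdev_put(), while later lookups see NULL. Whether the EEH paths can afford to take a reference everywhere is the open question above.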
But we need this lock here because, if the PCI error is encountered in
the hotplug remove path, we need the PCI rescan lock to avoid a race
between the hotplug remove path and the bottom half of EEH recovery.
This lets the hotplug remove complete, since it is already holding the
lock, and drops the recovery process, as the device is no longer present.