On 18/11/25 2:26 PM, Mahesh J Salgaonkar wrote:
On 2025-11-05 00:40:52 Wed, Narayana Murty N wrote:The recent commit 1010b4c012b0 ("powerpc/eeh: Make EEH driver device hotplug safe") restructured the EEH driver to improve synchronization with the PCI hotplug layer. However, it inadvertently moved pci_lock_rescan_remove() outside its intended scope in eeh_handle_normal_event(), leading to broken PCI error reporting and improper EEH event triggering. Specifically, eeh_handle_normal_event() acquired pci_lock_rescan_remove() before calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to acquire the same lock internally, causing nested locking and disrupting normal EEH event handling paths. This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(), with two public wrappers: eeh_pe_bus_get() with locking enabled. eeh_pe_bus_get_nolock() that skips locking. Callers that already hold pci_lock_rescan_remove() now use eeh_pe_bus_get_nolock() to avoid recursive lock acquisition. Additionally, pci_lock_rescan_remove() calls are restored to the correct position—after eeh_pe_bus_get() and immediately before iterating affected PEs and devices. This ensures EEH-triggered PCI removes occur under proper bus rescan locking without recursive lock contention.[...]diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index ef78ff77cf8f..492fae5e3823 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -812,6 +812,35 @@ static void eeh_clear_slot_attention(struct pci_dev *pdev) ops->set_attention_status(slot->hotplug, 0); }+static const char *eeh_pe_get_loc(struct eeh_pe *pe)+{So it is duplicate of eeh_pe_loc_get() with nolock variant. Instead, can we split original function eeh_pe_loc_get() ? Move the while (bus) loop into another function with name eeh_bus_loc_get(bus) which will be nolock variant and use that here ?
Thanks Mahesh, your suggestion will be taken care in the next version of patch.
https://lore.kernel.org/all/[email protected]/
+ struct pci_bus *bus = eeh_pe_bus_get_nolock(pe); + struct device_node *dn; + const char *location = NULL; + + while (bus) { + dn = pci_bus_to_OF_node(bus); + if (!dn) { + bus = bus->parent; + continue; + } + + if (pci_is_root_bus(bus)) + location = of_get_property(dn, "ibm,io-base-loc-code", + NULL); + else + location = of_get_property(dn, "ibm,slot-location-code", + NULL); + + if (location) + return location; + + bus = bus->parent; + } + + return "N/A"; +} + /** * eeh_handle_normal_event - Handle EEH events on a specific PE * @pe: EEH PE - which should not be used after we return, as it may @@ -846,7 +875,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)pci_lock_rescan_remove(); - bus = eeh_pe_bus_get(pe);+ bus = eeh_pe_bus_get_nolock(pe); if (!bus) { pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n", __func__, pe->phb->global_number, pe->addr); @@ -886,14 +915,14 @@ void eeh_handle_normal_event(struct eeh_pe *pe) /* Log the event */ if (pe->type & EEH_PE_PHB) { pr_err("EEH: Recovering PHB#%x, location: %s\n", - pe->phb->global_number, eeh_pe_loc_get(pe)); + pe->phb->global_number, eeh_pe_get_loc(pe)); } else { struct eeh_pe *phb_pe = eeh_phb_pe_get(pe->phb);pr_err("EEH: Recovering PHB#%x-PE#%x\n",pe->phb->global_number, pe->addr); pr_err("EEH: PE location: %s, PHB location: %s\n", - eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe)); + eeh_pe_get_loc(pe), eeh_pe_get_loc(phb_pe)); }Thanks, -Mahesh.
Regards, -Narayana Murty.
