On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
> Upon failure to recover from a PCIe error through AER, DPC or EDR, a
> uevent is sent to inform user space about disconnection of the bridge
> whose subordinate devices failed to recover.
>
> However the bridge itself is not disconnected. Instead, a uevent should
> be sent for each of the subordinate devices.
>
> Only if the "bridge" happens to be a Root Complex Event Collector or
> Integrated Endpoint does it make sense to send a uevent for it (because
> there are no subordinate devices).
>
> Right now if there is a mix of subordinate devices with and without
> pci_error_handlers, a BEGIN_RECOVERY event is sent for those with
> pci_error_handlers but no FAILED_RECOVERY event is ever sent for them
> afterwards. Fix it.
>
> Fixes: 856e1eb9bdd4 ("PCI/AER: Add uevents in AER and EEH error/resume")
> Signed-off-by: Lukas Wunner <lu...@wunner.de>
> Cc: sta...@vger.kernel.org # v4.16+
> ---

--- snip ---

>
> +static int report_perm_failure_detected(struct pci_dev *dev, void *data)
> +{
> +	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +	return 0;
> +}
> +
>  static int report_mmio_enabled(struct pci_dev *dev, void *data)
>  {
>  	struct pci_driver *pdrv;
> @@ -272,7 +278,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  failed:
>  	pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
>
> -	pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
> +	pci_walk_bridge(bridge, report_perm_failure_detected, NULL);
>
>  	pci_info(bridge, "device recovery failed\n");
>

Thanks for catching this during review of my other error recovery uevent
fix!

Looks good, and I like that you kept the report_*_detected() naming, which
makes the mismatched symmetry of the existing code quite easy to see.

Reviewed-by: Niklas Schnelle <schne...@linux.ibm.com>
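
For anyone reading along, here is a rough user-space sketch of the behaviour
the commit message describes: the failure event goes to each subordinate
device, and to the "bridge" itself only when it has no subordinates (the
RCEC / Integrated Endpoint case). This is not kernel code. struct model_dev
and all the model_*() helpers are hypothetical stand-ins, and the walk only
assumes that pci_walk_bridge() falls back to the bridge when there is no
subordinate bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel definition. */
struct model_dev {
	const char *name;
	struct model_dev **children;	/* NULL if there is no subordinate bus */
	size_t nr_children;
};

typedef int (*model_cb)(struct model_dev *dev, void *data);

/*
 * Visit every device below @parent, depth first. Loosely models
 * pci_walk_bus(); the callback's return value is ignored in this model.
 */
void model_walk_bus(struct model_dev *parent, model_cb cb, void *data)
{
	for (size_t i = 0; i < parent->nr_children; i++) {
		cb(parent->children[i], data);
		model_walk_bus(parent->children[i], cb, data);
	}
}

/*
 * Assumed shape of pci_walk_bridge(): walk the subordinate devices if
 * there are any, otherwise call back on the "bridge" itself.
 */
void model_walk_bridge(struct model_dev *bridge, model_cb cb, void *data)
{
	assert(bridge != NULL && cb != NULL);

	if (bridge->children)
		model_walk_bus(bridge, cb, data);
	else
		cb(bridge, data);
}

/* Stand-in for pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT). */
int model_report_perm_failure_detected(struct model_dev *dev, void *data)
{
	int *nr_events = data;

	printf("%s: FAILED_RECOVERY\n", dev->name);
	(*nr_events)++;
	return 0;
}

/* Returns how many failure events were emitted for @bridge. */
int model_report_failure(struct model_dev *bridge)
{
	int nr_events = 0;

	model_walk_bridge(bridge, model_report_perm_failure_detected,
			  &nr_events);
	return nr_events;
}
```

In this model, calling model_report_failure() on a port with two endpoints
below it reports the two endpoints and not the port, while calling it on a
device without children reports that device itself.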