When a device is hot removed on powernv, the hotplug driver clears the device's state. However, on pseries, if a device is removed by phyp after reaching the error threshold, the kernel remains unaware, leading to the device not being torn down. This prevents necessary remediation actions like failover.
Permanently disable the device if the presence check fails. Signed-off-by: Ganesh Goudar <ganes...@linux.ibm.com> --- arch/powerpc/kernel/eeh.c | 4 +++- arch/powerpc/kernel/eeh_driver.c | 8 +++++++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index ab316e155ea9..8d1606406d3f 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -508,7 +508,9 @@ int eeh_dev_check_failure(struct eeh_dev *edev) * state, PE is in good state. */ if ((ret < 0) || - (ret == EEH_STATE_NOT_SUPPORT) || eeh_state_active(ret)) { + (ret == EEH_STATE_NOT_SUPPORT && + dev->error_state == pci_channel_io_perm_failure) || + eeh_state_active(ret)) { eeh_stats.false_positives++; pe->false_positives++; rc = 0; diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 48773d2d9be3..10317badf471 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -867,7 +867,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe) if (!devices) { pr_debug("EEH: Frozen PHB#%x-PE#%x is empty!\n", pe->phb->global_number, pe->addr); - goto out; /* nothing to recover */ + /* + * The device is removed, Tear down its state, + * On powernv hotplug driver would take care of + * it but not on pseries, Permanently disable the + * card as it is hot removed. + */ + goto recover_failed; } /* Log the event */ -- 2.44.0