On Tue, Oct 13, 2015 at 09:55:53AM +1100, Daniel Axtens wrote:
>> Currently, we rely on the existence of struct pci_driver::err_handler
>> to judge if the corresponding PCI device should be unplugged during
>> EEH recovery (partial hotplug case). However, that isn't precise enough:
>> some device drivers implement only part of the EEH error handlers
>> to collect diag-data. That means the driver still expects a hotplug
>> to recover from the EEH error.
>
>
>> This patch makes the hotplug criterion more relaxed: if the device driver
>> doesn't provide all necessary EEH error handlers, it will experience
>> hotplug during EEH recovery.
>
>Interesting.
>
>My understanding of Documentation/PCI/pci-error-recovery.txt is that a
>driver should be able to just supply an error_detected() callback. If
>the driver just wants to collect diag-data and wants to be hotplugged,
>it should return PCI_ERS_RESULT_NONE.
>
>What drivers did you have in mind?
>
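The change in criterion described in the patch can be compared in a small userspace model. This is a sketch, not kernel code: mock_ers_result, mock_err_handler, mock_driver, hotplug_before_patch(), hotplug_after_patch() and the diag-only driver are hypothetical stand-ins for pci_ers_result_t, the driver's err_handler and the check in eeh_rmv_device(). A return value of true means "unplug the device during recovery".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for pci_ers_result_t, the error handler table
 * and struct pci_driver; only what the patch looks at is modelled. */
typedef enum { MOCK_ERS_NONE, MOCK_ERS_NEED_RESET } mock_ers_result;

struct mock_err_handler {
	mock_ers_result (*error_detected)(void);
	mock_ers_result (*slot_reset)(void);
	void (*resume)(void);
};

struct mock_driver {
	const struct mock_err_handler *err_handler;
};

/* Before the patch: any err_handler at all means "don't unplug". */
static bool hotplug_before_patch(const struct mock_driver *drv)
{
	assert(drv != NULL);
	return drv->err_handler == NULL;
}

/* After the patch: skip the unplug only when error_detected(),
 * slot_reset() and resume() are all provided. */
static bool hotplug_after_patch(const struct mock_driver *drv)
{
	const struct mock_err_handler *h;

	assert(drv != NULL);
	h = drv->err_handler;
	return !(h && h->error_detected && h->slot_reset && h->resume);
}

/* A driver that only collects diag-data and, following the reading of
 * pci-error-recovery.txt quoted above, asks to be hot-plugged. */
static mock_ers_result diag_only_error_detected(void)
{
	/* ... collect diag-data here ... */
	return MOCK_ERS_NONE;
}

static const struct mock_err_handler diag_only_handler = {
	.error_detected = diag_only_error_detected,
};

static const struct mock_driver diag_only_driver = {
	.err_handler = &diag_only_handler,
};
```

For the diag-only driver, hotplug_before_patch() returns false and hotplug_after_patch() returns true; a driver with no err_handler is unplugged under both rules.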
Daniel,

The issue is tracked by IBM's Bugzilla 127612, which was reported against
Nvidia's private GPU drivers. I tried to find the source code in the upstream
kernel, but couldn't.

For example, suppose one PE has two different devices, A and B. A's driver
provides error_detected()/slot_reset()/resume(), and it returns
PCI_ERS_RESULT_NEED_RESET. B's driver provides only error_detected(), which
returns PCI_ERS_RESULT_NONE as you said. The EEH core receives NEED_RESET,
and because B's driver has an err_handler, B isn't hot-plugged during
recovery. B's driver has no slot_reset() or resume() either, so the error
is never recovered on B.

Thanks,
Gavin

>>
>> Signed-off-by: Gavin Shan <gws...@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/eeh_driver.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>> index 3a626ed..32178a4 100644
>> --- a/arch/powerpc/kernel/eeh_driver.c
>> +++ b/arch/powerpc/kernel/eeh_driver.c
>> @@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>  	driver = eeh_pcid_get(dev);
>>  	if (driver) {
>>  		eeh_pcid_put(dev);
>> -		if (driver->err_handler)
>> +		if (driver->err_handler &&
>> +		    driver->err_handler->error_detected &&
>> +		    driver->err_handler->slot_reset &&
>> +		    driver->err_handler->resume)
>>  			return NULL;
>>  	}
>>  
>> -- 
>> 2.1.0
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
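Gavin's A/B example can be modelled the same way. This is a hedged sketch under stated assumptions, not the merging logic in eeh_driver.c: every name is hypothetical; the PE-wide result is assumed to be the strongest answer any driver gives, so NEED_RESET outranks NONE (matching the statement that the EEH core receives NEED_RESET); and a driver is assumed to have an err_handler whenever it provides at least one callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified result codes (hypothetical). ASSUMPTION: a larger value is a
 * stronger request, and the PE-wide result is the strongest one reported. */
typedef enum {
	MOCK_ERS_NONE = 0,       /* driver asks to be hot-plugged */
	MOCK_ERS_NEED_RESET = 1, /* driver asks for a PE reset */
} mock_ers_result;

/* One device in the PE and the callbacks its driver provides. */
struct mock_dev {
	const char *name;
	mock_ers_result detected; /* value returned by error_detected() */
	bool has_error_detected;
	bool has_slot_reset;
	bool has_resume;
};

/* Fold every driver's error_detected() answer into one PE-wide result. */
static mock_ers_result pe_result(const struct mock_dev *devs, size_t n)
{
	mock_ers_result res = MOCK_ERS_NONE;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (devs[i].has_error_detected && devs[i].detected > res)
			res = devs[i].detected;
	}
	return res;
}

/* ASSUMPTION: the driver has an err_handler iff it provides any callback. */
static bool has_err_handler(const struct mock_dev *d)
{
	return d->has_error_detected || d->has_slot_reset || d->has_resume;
}

/* Old rule: any err_handler keeps the device plugged in during recovery. */
static bool unplug_before_patch(const struct mock_dev *d)
{
	return !has_err_handler(d);
}

/* New rule: only a driver with all three callbacks stays plugged in. */
static bool unplug_after_patch(const struct mock_dev *d)
{
	return !(d->has_error_detected && d->has_slot_reset && d->has_resume);
}
```

For a PE holding A (all three callbacks, NEED_RESET) and B (error_detected() only, NONE), pe_result() yields MOCK_ERS_NEED_RESET, so in this model B's request to be hot-plugged is lost. unplug_before_patch() keeps B plugged in, while unplug_after_patch() selects B for removal and leaves A alone.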