On 2021-11-24 22:57:13 Wed, Oliver O'Halloran wrote:
> On Wed, Nov 24, 2021 at 7:45 PM Mahesh J Salgaonkar
> <mah...@linux.ibm.com> wrote:
> >
> > No it doesn't. We will still do a presence check before the recovery
> > process starts. This patch moves the check after notifying the driver to
> > stop active I/O operations. If a presence check finds the device isn't
> > present, we will skip the EEH recovery. However, on a surprise hotplug,
> > the user will see the EEH messages on the console before it finds there
> > is nothing to recover.
>
> Suppressing the spurious EEH messages was part of why I added that
> check in the first place. If you want to defer the presence check
> until later you should move the stack trace printing, etc to after
> we've confirmed there are still devices present. Considering the

That will help suppress the spurious EEH messages.

> motivation for this patch is to avoid spurious warnings from the
> driver I don't think printing spurious EEH messages is much of an
> improvement.

Agree.

>
> The other option would be returning an error from the pseries hotplug
> driver. IIRC that's what pnv_php / OPAL does if the PHB is fenced and
> we can't check the slot presence state.

Yeah. I can change rpaphp_get_sensor_state() to use the
rtas_get_sensor_fast() variant, which will return immediately with an
error on an extended busy error. That way we don't need to move the slot
presence check at all. I did test that and it does fix the problem,
but I wasn't sure if that would have any implications for hotplug driver
behaviour. If pnv_php / OPAL does the same thing then this would be a
cleaner approach to fix this issue.

Let me send out the patch with this other option to fix the pseries
hotplug driver instead.

Thanks,
-Mahesh.

-- 
Mahesh J Salgaonkar
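
For illustration only, the difference between the two sensor-read styles
discussed above can be sketched in plain user-space C. This is not the
kernel code: every name below (sim_firmware, sim_fw_query,
sim_get_sensor_blocking, sim_get_sensor_fast) and the status values are
hypothetical stand-ins, and the choice of -EIO as the error is an
assumption. The sketch only shows the general idea that a blocking read
keeps retrying while firmware reports busy, whereas a fast read makes a
single call and hands the busy condition back to the caller as an error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical status codes for this sketch; not the real RTAS values. */
enum sim_status { SIM_OK = 0, SIM_BUSY = 1 };

/* Simulated firmware that reports busy for the first `busy_calls` queries. */
struct sim_firmware {
    int busy_calls;  /* how many more queries will report busy */
    int present;     /* slot state reported once no longer busy */
    int calls_made;  /* number of firmware queries issued so far */
};

/* One simulated firmware query; writes *state only on success. */
static enum sim_status sim_fw_query(struct sim_firmware *fw, int *state)
{
    fw->calls_made++;
    if (fw->busy_calls > 0) {
        fw->busy_calls--;
        return SIM_BUSY;
    }
    *state = fw->present;
    return SIM_OK;
}

/* Blocking style: keep retrying until firmware stops reporting busy. */
static int sim_get_sensor_blocking(struct sim_firmware *fw, int *state)
{
    assert(fw && state);
    while (sim_fw_query(fw, state) == SIM_BUSY)
        ; /* a real implementation would sleep between retries */
    return 0;
}

/* Fast style: a single query; busy is returned to the caller as an error. */
static int sim_get_sensor_fast(struct sim_firmware *fw, int *state)
{
    assert(fw && state);
    if (sim_fw_query(fw, state) == SIM_BUSY)
        return -EIO; /* assumed error code for this sketch */
    return 0;
}
```

With a simulated firmware that stays busy for three queries, the fast
variant returns an error after one query and leaves the state untouched,
while the blocking variant only returns once the busy period is over. How
a caller should interpret that error is outside the scope of this sketch.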