On 2021-12-09 09:02:51 Thu, Nathan Lynch wrote: > Mahesh Salgaonkar <mah...@linux.ibm.com> writes: > > To avoid this issue, fix the pci hotplug driver (rpaphp) to return an error > > if the slot presence state can not be detected immediately. Current > > implementation uses rtas_get_sensor() API which blocks the slot check state > > until rtas call returns success. Change rpaphp_get_sensor_state() to invoke > > rtas_call(get-sensor-state) directly and take actions based on rtas return > > status. This patch now errors out immediately on busy return status from > > rtas_call. > > > > Please note that, only on certain PHB failures, the slot presence check > > returns BUSY condition. In normal cases it returns immediately with a > > correct presence state value. Hence this change has no impact on normal pci > > dlpar operations. > > I was wondering about this. This seems to be saying -2/990x cannot > happen in other cases. I couldn't find this specified in the > architecture. It seems a bit risky to me to *always* error out on > -2/990x - won't we have intermittent slot enable failures?
Sorry for the late response. So instead of always returning error out how about we error out only if pe is going through EEH recovery ? During get_adapter_status I can check if pe->state is set to EEH_PE_RECOVERING and only then return error on busy else fallback to existing method of rtas_get_sensor. Let me send out another version with this approach. Thanks, -Mahesh. > > > +/* > > + * RTAS call get-sensor-state(DR_ENTITY_SENSE) return values as per PAPR: > > + * -1: Hardware Error > > + * -2: RTAS_BUSY > > + * -3: Invalid sensor. RTAS Parameter Error. > > + * -9000: Need DR entity to be powered up and unisolated before RTAS call > > + * -9001: Need DR entity to be powered up, but not unisolated, before RTAS > > call > > + * -9002: DR entity unusable > > + * 990x: Extended delay - where x is a number in the range of 0-5 > > + */ > > +#define RTAS_HARDWARE_ERROR -1 > > +#define RTAS_INVALID_SENSOR -3 > > +#define SLOT_UNISOLATED -9000 > > +#define SLOT_NOT_UNISOLATED -9001 > > +#define SLOT_NOT_USABLE -9002 > > + > > +static int rtas_to_errno(int rtas_rc) > > +{ > > + int rc; > > + > > + switch (rtas_rc) { > > + case RTAS_HARDWARE_ERROR: > > + rc = -EIO; > > + break; > > + case RTAS_INVALID_SENSOR: > > + rc = -EINVAL; > > + break; > > + case SLOT_UNISOLATED: > > + case SLOT_NOT_UNISOLATED: > > + rc = -EFAULT; > > + break; > > + case SLOT_NOT_USABLE: > > + rc = -ENODEV; > > + break; > > + case RTAS_BUSY: > > + case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX: > > + rc = -EBUSY; > > + break; > > + default: > > + err("%s: unexpected RTAS error %d\n", __func__, rtas_rc); > > + rc = -ERANGE; > > + break; > > + } > > + return rc; > > +} > > These conversions look OK to me. -- Mahesh J Salgaonkar