On Tue, Apr 21, 2020 at 06:33:36PM -0500, Nathan Lynch wrote: > Sam Bobroff <sbobr...@linux.ibm.com> writes: > > If a device is hot unplgged during EEH recovery, it's possible for the > > RTAS call to ibm,configure-pe in pseries_eeh_configure() to return > > parameter error (-3), however negative return values are not checked > > for and this leads to an infinite loop. > > > > Fix this by correctly bailing out on negative values. > > > > Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com> > > --- > > arch/powerpc/platforms/pseries/eeh_pseries.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c > > b/arch/powerpc/platforms/pseries/eeh_pseries.c > > index 893ba3f562c4..c4ef03bec0de 100644 > > --- a/arch/powerpc/platforms/pseries/eeh_pseries.c > > +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c > > @@ -605,7 +605,7 @@ static int pseries_eeh_configure_bridge(struct eeh_pe > > *pe) > > config_addr, BUID_HI(pe->phb->buid), > > BUID_LO(pe->phb->buid)); > > > > - if (!ret) > > + if (ret <= 0) > > return ret; > > Note that this returns the firmware error value (e.g. -3 parameter > error) without converting it to a Linux errno. Nothing checks the error > value of this function as best I can tell, but -EINVAL would be better > than an implicit -ESRCH here.
Right, it's never used but I agree. I'll change it for v3. > And while this will behave correctly, the pr_warn() at the end of > pseries_eeh_configure_bridge() hints that someone had the intention > that this code should log a message on such an error: > > static int pseries_eeh_configure_bridge(struct eeh_pe *pe) > { > int config_addr; > int ret; > /* Waiting 0.2s maximum before skipping configuration */ > int max_wait = 200; > > /* Figure out the PE address */ > config_addr = pe->config_addr; > if (pe->addr) > config_addr = pe->addr; > > while (max_wait > 0) { > ret = rtas_call(ibm_configure_pe, 3, 1, NULL, > config_addr, BUID_HI(pe->phb->buid), > BUID_LO(pe->phb->buid)); > > if (!ret) > return ret; > > /* > * If RTAS returns a delay value that's above 100ms, cut it > * down to 100ms in case firmware made a mistake. For more > * on how these delay values work see rtas_busy_delay_time > */ > if (ret > RTAS_EXTENDED_DELAY_MIN+2 && > ret <= RTAS_EXTENDED_DELAY_MAX) > ret = RTAS_EXTENDED_DELAY_MIN+2; > > max_wait -= rtas_busy_delay_time(ret); > > if (max_wait < 0) > break; > > rtas_busy_delay(ret); > } > > pr_warn("%s: Unable to configure bridge PHB#%x-PE#%x (%d)\n", > __func__, pe->phb->global_number, pe->addr, ret); > return ret; > } > > So perhaps the error path should be made to break out of the loop > instead of returning. Or is the parameter error result simply > uninteresting in this scenario? Sounds reasonable to me, and given that the only way I know to trigger the error path (see the commit message) is not going to be common, I think a message is a good idea. (And, as one of the people likely to debug a future issue here, I'll probably appreciate it.) Cheers, Sam.
signature.asc
Description: PGP signature