On Tue, Apr 21, 2020 at 06:33:36PM -0500, Nathan Lynch wrote:
> Sam Bobroff <sbobr...@linux.ibm.com> writes:
> > If a device is hot unplgged during EEH recovery, it's possible for the
> > RTAS call to ibm,configure-pe in pseries_eeh_configure() to return
> > parameter error (-3), however negative return values are not checked
> > for and this leads to an infinite loop.
> >
> > Fix this by correctly bailing out on negative values.
> >
> > Signed-off-by: Sam Bobroff <sbobr...@linux.ibm.com>
> > ---
> >  arch/powerpc/platforms/pseries/eeh_pseries.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
> > b/arch/powerpc/platforms/pseries/eeh_pseries.c
> > index 893ba3f562c4..c4ef03bec0de 100644
> > --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> > +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> > @@ -605,7 +605,7 @@ static int pseries_eeh_configure_bridge(struct eeh_pe 
> > *pe)
> >                             config_addr, BUID_HI(pe->phb->buid),
> >                             BUID_LO(pe->phb->buid));
> >  
> > -           if (!ret)
> > +           if (ret <= 0)
> >                     return ret;
> 
> Note that this returns the firmware error value (e.g. -3 parameter
> error) without converting it to a Linux errno. Nothing checks the error
> value of this function as best I can tell, but -EINVAL would be better
> than an implicit -ESRCH here.

Right, it's never used but I agree. I'll change it for v3.

> And while this will behave correctly, the pr_warn() at the end of
> pseries_eeh_configure_bridge() hints that someone had the intention
> that this code should log a message on such an error:
> 
> static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
> {
>       int config_addr;
>       int ret;
>       /* Waiting 0.2s maximum before skipping configuration */
>       int max_wait = 200;
> 
>       /* Figure out the PE address */
>       config_addr = pe->config_addr;
>       if (pe->addr)
>               config_addr = pe->addr;
> 
>       while (max_wait > 0) {
>               ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
>                               config_addr, BUID_HI(pe->phb->buid),
>                               BUID_LO(pe->phb->buid));
> 
>               if (!ret)
>                       return ret;
> 
>               /*
>                * If RTAS returns a delay value that's above 100ms, cut it
>                * down to 100ms in case firmware made a mistake.  For more
>                * on how these delay values work see rtas_busy_delay_time
>                */
>               if (ret > RTAS_EXTENDED_DELAY_MIN+2 &&
>                   ret <= RTAS_EXTENDED_DELAY_MAX)
>                       ret = RTAS_EXTENDED_DELAY_MIN+2;
> 
>               max_wait -= rtas_busy_delay_time(ret);
> 
>               if (max_wait < 0)
>                       break;
> 
>               rtas_busy_delay(ret);
>       }
> 
>       pr_warn("%s: Unable to configure bridge PHB#%x-PE#%x (%d)\n",
>               __func__, pe->phb->global_number, pe->addr, ret);
>       return ret;
> }
> 
> So perhaps the error path should be made to break out of the loop
> instead of returning. Or is the parameter error result simply
> uninteresting in this scenario?

Sounds reasonable to me, and given that the only way I know to trigger
the error path (see the commit message) is not going to be common, I
think a message is a good idea. (And, as one of the people likely to
debug a future issue here, I'll probably appreciate it.)

Cheers,
Sam.

Attachment: signature.asc
Description: PGP signature

Reply via email to