On Tue, Jun 25, 2019 at 12:59 AM Vaibhav Jain <vaib...@linux.ibm.com> wrote: > > In some cases initial bind of scm memory for an lpar can fail if > previously it wasn't released using a scm-unbind hcall. This situation > can arise due to panic of the previous kernel or forced lpar reset. In > such cases the H_SCM_BIND_MEM return a H_OVERLAP error.
What is a forced lpar reset? fadump? > To mitigate such cases the patch updates drc_pmem_bind() to force a > call to drc_pmem_unbind() in case the initial bind of scm memory fails > with H_OVERLAP error. In case scm-bind operation again fails after the > forced scm-unbind then we follow the existing error path. > > Signed-off-by: Vaibhav Jain <vaib...@linux.ibm.com> > --- > arch/powerpc/platforms/pseries/papr_scm.c | 23 ++++++++++++++++++++--- > 1 file changed, 20 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > b/arch/powerpc/platforms/pseries/papr_scm.c > index d790e4e4ffb3..049d7927c0a4 100644 > --- a/arch/powerpc/platforms/pseries/papr_scm.c > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > @@ -44,19 +44,26 @@ struct papr_scm_priv { > struct nd_interleave_set nd_set; > }; > +/* Forward declaration */ pointless comment. > +static int drc_pmem_unbind(struct papr_scm_priv *); > > static int drc_pmem_bind(struct papr_scm_priv *p) > { > unsigned long ret[PLPAR_HCALL_BUFSIZE]; > uint64_t rc, token; > - uint64_t saved = 0; > + uint64_t saved; > + bool tried_unbind = false; nit: kernel style uses reverse christmas tree declarations, so this should be: unsigned long ret[PLPAR_HCALL_BUFSIZE]; bool tried_unbind = false; uint64_t rc, token; uint64_t saved; Come to think of it rc should probably be signed since the hcall return codes are negative. I'm surprised that's not causing a warning since we use %lld to print rc rather than %llu. > + dev_dbg(&p->pdev->dev, "bind drc %x\n", p->drc_index); > /* > * When the hypervisor cannot map all the requested memory in a single > * hcall it returns H_BUSY and we call again with the token until > * we get H_SUCCESS. Aborting the retry loop before getting H_SUCCESS > * leave the system in an undefined state, so we wait. > */ > +retry: > token = 0; > + saved = 0; > > do { > rc = plpar_hcall(H_SCM_BIND_MEM, ret, p->drc_index, 0, > @@ -68,8 +75,18 @@ static int drc_pmem_bind(struct papr_scm_priv *p) > } while (rc == H_BUSY); > > if (rc) { > - dev_err(&p->pdev->dev, "bind err: %lld\n", rc); > - return -ENXIO; > + /* retry after unbinding */ > + if (rc == H_OVERLAP && !tried_unbind) { > + dev_warn(&p->pdev->dev, "Un-binding and retrying\n"); > + /* Try unbind and ignore any errors */ > + tried_unbind = true; > + drc_pmem_unbind(p); > + goto retry; I think It'd be cleaner if we put the unbind-and-retry logic into the probe function, e.g: diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 96c53b23e58f..d113779fc27c 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -316,6 +316,14 @@ static int papr_scm_probe(struct platform_device *pdev) /* request the hypervisor to bind this region to somewhere in memory */ rc = drc_pmem_bind(p); + if (rc == -EBUSY) { + /* + * If we kexec()ed the previous kernel might have left the DRC + * bound in memory. Unbind it and try again. + */ + drc_pmem_unbind(p); + rc = drc_pmem_bind(p); + } if (rc) goto err; Up to you though.