On Thu, Jun 27, 2019 at 2:58 AM Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com> wrote:
>
> Vaibhav Jain <vaib...@linux.ibm.com> writes:
>
> *snip*
> > +	/* If phyp says drc memory still bound then force unbound and retry */
> > +	if (rc == -EBUSY) {
> > +		dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
> > +		drc_pmem_unbind(p);
>
> This should only be caused by kexec right?

We should only ever hit this path if there's an unclean shutdown, so
kdump or fadump. For a normal kexec the previous kernel should have torn
down the binding for us.

> And considering kernel nor
> hypervisor won't change device binding details, can you check switching
> this to H_SCM_QUERY_BLOCK_MEM_BINDING?

I thought about using the QUERY_BLOCK_MEM_BINDING call, but I'm not sure
it's a good idea. It bakes in assumptions about what the *previous*
kernel did with the SCM volume that might not be valid. A panic while
unbinding a region would result in a partially-bound region, which might
break the query call. Also, it's possible that we might have SCM drivers
in the future that do something other than just binding the volume in
one contiguous chunk. UNBIND_ALL is robust against all of these, and
robustness is what you want out of an error handling mechanism.

> Will that result in faster boot?

As I said in the comments on v1, do we have any actual numbers on how
long the bind step takes? From memory, you could bind ~32GB in a single
bind h-call before phyp would hit its time limit of 250us and return a
continue token. Assuming that holds, we'll be saving a few dozen
milliseconds at best.

> > +		rc = drc_pmem_bind(p);
> > +	}
> > +
> >  	if (rc)
> >  		goto err;
> >
>
> I am also not sure about the module reference count here. Should we
> increment the module reference count after a bind so that we can track
> failures in ubind and fail the module unload?

I don't really get what you're concerned about here. The error handling
path calls drc_pmem_unbind(), so if there's a bind error we should never
leave probe with memory still bound.

> -aneesh
>
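
For anyone following along, the retry path under discussion can be modelled in plain userspace C roughly as below. This is only a sketch under stated assumptions: struct fake_scm, fake_bind(), fake_unbind() and fake_probe() are hypothetical stand-ins invented for illustration, not the driver's drc_pmem_bind()/drc_pmem_unbind() or the real h-calls. The "bound" flag simulates a binding left behind by a kernel that went down uncleanly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-device state (not the real driver struct). */
struct fake_scm {
	bool bound;		/* simulated hypervisor view: is the region bound? */
	int bind_calls;		/* how many bind attempts were made */
	int unbind_calls;	/* how many unbinds were issued */
};

/* Simulated bind: refuses with -EBUSY if a stale binding is still present. */
static int fake_bind(struct fake_scm *p)
{
	p->bind_calls++;
	if (p->bound)
		return -EBUSY;
	p->bound = true;
	return 0;
}

/* Simulated "unbind everything": clears the binding whatever state it is in. */
static void fake_unbind(struct fake_scm *p)
{
	p->unbind_calls++;
	p->bound = false;
}

/* Probe-like flow: bind, and on -EBUSY force an unbind and retry once. */
static int fake_probe(struct fake_scm *p)
{
	int rc;

	assert(p != NULL);

	rc = fake_bind(p);
	if (rc == -EBUSY) {
		fprintf(stderr, "Retrying bind after unbinding\n");
		fake_unbind(p);
		rc = fake_bind(p);
	}

	if (rc) {
		/* Error path unbinds, so we never return with memory bound. */
		fake_unbind(p);
		return rc;
	}

	return 0;
}
```

Calling fake_probe() on a struct initialised with .bound = true takes the retry branch (two binds, one unbind); a zero-initialised struct binds on the first attempt. Note the sketch never queries the old binding, which is the point made above about not depending on what the previous kernel did.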