On Thu, Aug 29, 2019 at 6:21 PM Aneesh Kumar K.V
<aneesh.ku...@linux.ibm.com> wrote:
>
> On 8/29/19 1:29 PM, Oliver O'Halloran wrote:
> > On Thu, Aug 29, 2019 at 4:34 PM Aneesh Kumar K.V
> > <aneesh.ku...@linux.ibm.com> wrote:
> >>
> >> Right now we force an unbind of SCM memory at drcindex on H_OVERLAP error.
> >> This really slows down operations like kexec where we get the H_OVERLAP
> >> error because we don't go through a full hypervisor re init.
> >
> > Maybe we should be unbinding it on a kexec().
> >
>
> shouldn't ?

I mean in the previous kernel. We don't have a shutdown() method
defined for the driver, so I figured it was leaving the bind intact
across kexec, hence the patch. I was thinking that we should have the
previous kernel unbind it rather than letting the new kernel deal with
the H_OVERLAP error. I suppose we'll have to do that in the kdump case
anyway.

> >> H_OVERLAP error for a H_SCM_BIND_MEM hcall indicates that SCM memory at
> >> drc index is already bound. Since we don't specify a logical memory
> >> address for bind hcall, we can use the H_SCM_QUERY hcall to query
> >> the already bound logical address.
> >
> > This is a little sketchy since we might have crashed during the
> > initial bind. Checking if the last block is bound to where we expect
> > it to be might be a good idea. If it's not where we expect it to be,
> > then an unbind->bind cycle is the only sane thing to do.
> >
>
> I would not have expected hypervisor to not mark the drc index bound if
> we failed the previous BIND request.

I was thinking of the partial-bind case, where the driver started
binding but never exited the loop in drc_pmem_bind() due to a kernel
panic or whatever.

> I can query start block and last block logical address and check whether
> the full blocks is indeed mapped.
>
>
> >> Boot time difference with and without patch is:
> >>
> >> [ 5.583617] IOMMU table initialized, virtual merging enabled
> >> [ 5.603041] papr_scm ibm,persistent-memory:ibm,pmemory@44104001:
> >> Retrying bind after unbinding
> >> [ 301.514221] papr_scm ibm,persistent-memory:ibm,pmemory@44108001:
> >> Retrying bind after unbinding
> >> [ 340.057238] hv-24x7: read 1530 catalog entries, created 537 event attrs
> >> (0 failures), 275 descs
> >
> > Is the unbind significantly slower than a bind? Or is the region here
> > just massive?
> >
>
> on unbind. We go two regions one of 60G and other of 10G

Assuming you mean "it's slow on unbind", then that sounds like a
hypervisor bug tbh.

Oliver
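
For reference, a minimal userspace sketch of the check discussed above.
It is not the papr_scm driver code. On H_OVERLAP it queries the first
and last block and reuses the existing binding only if the last block
is bound exactly where the first block's address says it should be;
anything else falls back to the unbind->bind cycle. The struct,
query_block_addr() and handle_overlap() are hypothetical names that
model the hypervisor state in memory, and the model assumes a
contiguous binding. No real hcall is made.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory model of one SCM region as the hypervisor sees it. */
struct fake_scm_region {
	uint64_t nblocks;      /* total blocks in the region */
	uint64_t bound_blocks; /* blocks bound before the previous kernel stopped */
	uint64_t base_addr;    /* logical address of block 0 */
	uint64_t block_size;   /* bytes per block */
};

/* Sentinel returned by the model for a block that is not bound. */
#define ADDR_UNBOUND UINT64_MAX

/*
 * Hypothetical stand-in for a per-block query (the thread proposes the
 * H_SCM_QUERY hcall for this). Returns the bound logical address of the
 * block, or ADDR_UNBOUND.
 */
static uint64_t query_block_addr(const struct fake_scm_region *r,
				 uint64_t block)
{
	if (block >= r->bound_blocks)
		return ADDR_UNBOUND;
	return r->base_addr + block * r->block_size;
}

enum overlap_action { REUSE_BINDING, UNBIND_AND_REBIND };

/* Decide what to do after a bind attempt reported H_OVERLAP. */
static enum overlap_action handle_overlap(const struct fake_scm_region *r)
{
	uint64_t start, last, expected_last;

	assert(r->nblocks > 0);

	start = query_block_addr(r, 0);
	last = query_block_addr(r, r->nblocks - 1);

	/* A partial bind leaves the tail of the region unbound. */
	if (start == ADDR_UNBOUND || last == ADDR_UNBOUND)
		return UNBIND_AND_REBIND;

	/* The last block must sit where a complete bind would put it. */
	expected_last = start + (r->nblocks - 1) * r->block_size;
	return last == expected_last ? REUSE_BINDING : UNBIND_AND_REBIND;
}
```

In this model, a region with bound_blocks == nblocks yields
REUSE_BINDING, while one that stopped partway through the bind loop
yields UNBIND_AND_REBIND, so the slow unbind is only paid when the
previous bind was incomplete.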