On Thursday 03 September 2015 11:52 AM, Sam Bobroff wrote:
> On Thu, Sep 03, 2015 at 03:05:21PM +1000, David Gibson wrote:
>
> [snip]
>
>> Hm.. so why can't the hypervisor code do the retrying?
>
> Aravinda replied to this earlier in the thread:
>
> "Retrying cannot be done internally in h_report_mc_err hcall: only one
> thread can succeed entering qemu upon parallel hcall and hence retrying
> inside the hcall will not allow the ibm,nmi-interlock from first CPU to
> succeed."
>
> I assume that this means that the big QEMU lock is held while an hcall is
> processed by QEMU, but I haven't checked the code myself. Actually, even if
> the lock is normally held, I don't see why these particular hcalls couldn't
> release the lock. I'll look into this.
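The retry constraint quoted above can be modeled with a minimal, single-threaded sketch. It assumes a simplified model in which one boolean stands in for the big QEMU lock, and all names (`report_mc_err`, `nmi_interlock`, `HC_BUSY`, ...) are hypothetical, not QEMU's code. The point it illustrates: if the hcall returns a busy status instead of waiting while it holds the lock, the retrying moves to the guest, and the first CPU's ibm,nmi-interlock can still get into QEMU to clear the in-progress state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes; real PAPR hcall return values differ. */
enum hcall_ret { HC_SUCCESS = 0, HC_BUSY = 1 };

/* Stand-in for the big QEMU lock: true while a vCPU is inside QEMU. */
static bool big_lock_held;
/* Set by the first vCPU to report a machine check, cleared by the interlock. */
static bool mc_in_progress;

/* In this single-threaded model, a second acquirer would trip the assert. */
static void big_lock(void)   { assert(!big_lock_held); big_lock_held = true; }
static void big_unlock(void) { assert(big_lock_held);  big_lock_held = false; }

/* Hypothetical model of h_report_mc_err: never waits while holding the lock. */
enum hcall_ret report_mc_err(void)
{
    enum hcall_ret ret;

    big_lock();
    if (mc_in_progress) {
        /*
         * Spinning here would keep the lock held and stop the first CPU's
         * ibm,nmi-interlock from ever running, so report "busy" instead.
         */
        ret = HC_BUSY;
    } else {
        mc_in_progress = true;
        ret = HC_SUCCESS;
    }
    big_unlock();
    return ret;
}

/* Hypothetical model of ibm,nmi-interlock, called by the CPU that won. */
void nmi_interlock(void)
{
    big_lock();
    mc_in_progress = false;
    big_unlock();
}
```

In this model the first report succeeds, a second concurrent report gets `HC_BUSY` until the winner calls `nmi_interlock()`, and the guest's retry then succeeds. Releasing the lock inside the hcall, as Sam suggests, would be the alternative to pushing the retry loop into the guest.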

I am not sure whether we can release this lock inside an hcall. I need to
check.

>
>>>> Also, it looks like the vector will need at least one scratch register
>>>> (for the hcall number, if nothing else). Does PAPR specify what SPRGs
>>>> the vector can clobber? Obviously it can't be anything the guest
>>>> kernel uses.
>>>
>>> PAPR only says SPRGs 0 to 3 are for software use, but the kernel (see
>>> arch/powerpc/include/asm/reg.h) defines SPRG2 as an exception scratch
>>> register so it should be the right one to use here.
>>
>> Uh.. no. If 0..3 are for software (i.e. OS) use, then this needs to
>> use a different one, since it's being used as a firmware resource
>> here. Linux might treat SPRG2 as scratch, but another OS would be
>> within its rights to use it for something persistent.
>>
>> Although, as paulus points out, sc 1 will clobber SRR0/1 anyway, and
>> if we use a special illegal instruction, then you no longer need a
>> scratch register.
>>
>>>> Btw, does anyone know what happens with the VPA (and dispatch trace
>>>> log and so forth) on kexec() - it could be subject to the same stale
>>>> address problem, and rewriting vectors won't save us there.
>>>
>>> I asked Michael Ellerman this one and he thinks kexec probably frees and
>>> re-allocates the VPA.
>>
>> Ok. So the question is: if an explicit deregister is good enough for
>> the VPA, is it also good enough for the FWNMI vector, in which case
>> doing it with just a qemu exit and not bouncing through the guest space
>> is back on the table.
>>
>> I guess that's still problematic because there are existing guests
>> that assume a kexec() will magically wipe the fwnmi vectors away.
>
> Yes, but I think we could handle this separately if necessary: even if we
> don't need to write anything to the vector, we could still insert a magic
> value and check for it later. If it's been clobbered by a kexec, go back to
> the old method.
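The magic-value idea in the last paragraph could look roughly like the sketch below, assuming the hypervisor can write a marker word into the guest memory that backs the registered vector. The struct layout, `FWNMI_MAGIC`, `fwnmi_mark()` and `fwnmi_vector_intact()` are hypothetical names for illustration only, not QEMU or PAPR interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker value ("FWNMIVEC" in ASCII); not defined by PAPR. */
#define FWNMI_MAGIC UINT64_C(0x46574E4D49564543)

/* Hypothetical layout of the guest memory holding the FWNMI vector. */
struct fwnmi_vector {
    uint64_t magic;   /* written by the hypervisor at registration time */
    uint8_t code[56]; /* vector instructions, if any; size is arbitrary */
};

/* At ibm,nmi-register time: stamp the area so later reuse can be detected. */
void fwnmi_mark(struct fwnmi_vector *v)
{
    assert(v != NULL);
    v->magic = FWNMI_MAGIC;
}

/*
 * Before delivering a machine check: if the marker is gone, the memory has
 * been overwritten (e.g. by a kernel started with kexec()), so the caller
 * should fall back to the old method instead of using a stale vector.
 */
bool fwnmi_vector_intact(const struct fwnmi_vector *v)
{
    assert(v != NULL);
    return v->magic == FWNMI_MAGIC;
}
```

In this model the check happens lazily, only when the vector is about to be used. A surviving marker is not proof that the vector is still wanted, though: a kexec'd kernel that never touches that memory would leave the marker intact.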

"> check for it later": but is QEMU informed, or does it otherwise find
out, when kexec() is issued?

Regards,
Aravinda

>
> Sam.