On Thu, Sep 03, 2015 at 04:22:22PM +1000, Sam Bobroff wrote:
> On Thu, Sep 03, 2015 at 03:05:21PM +1000, David Gibson wrote:
>
> [snip]
>
> > Hm.. so why can't the hypervisor code do the retrying?
>
> Aravinda replied to this earlier in the thread:
>
> "Retrying cannot be done internally in h_report_mc_err hcall: only one
> thread can succeed entering qemu upon parallel hcall and hence retrying
> inside the hcall will not allow the ibm,nmi-interlock from first CPU to
> succeed."
>
> I assume that this means that the big QEMU lock is held while an hcall is
> processed by QEMU, but I haven't checked the code myself. Actually, even if
> the lock is normally held, I don't see why these particular hcalls couldn't
> release the lock. I'll look into this.

Yes, you should be able to release the BQL in the hcall in order to do
retries internally. Thomas Huth's draft H_RANDOM implementation does
something similar, since it can block.

> > > > Also, it looks like the vector will need at least one scratch register
> > > > (for the hcall number, if nothing else). Does PAPR specify what SPRGs
> > > > the vector can clobber? Obviously it can't be anything the guest
> > > > kernel uses.
> > >
> > > PAPR only says SPRGs 0 to 3 are for software use, but the kernel (see
> > > arch/powerpc/include/asm/reg.h) defines SPRG2 as an exception scratch
> > > register so it should be the right one to use here.
> >
> > Uh.. no. If 0..3 are for software (i.e. OS) use, then this needs to
> > use a different one, since it's being used as a firmware resource
> > here. Linux might treat SPRG2 as scratch, but another OS would be
> > within its rights to use it for something persistent.
> >
> > Although, as paulus points out, sc 1 will clobber SRR0/1 anyway, and
> > if we use a special illegal instruction, then you no longer need a
> > scratch register.
> > > >
> > > > Btw, does anyone know what happens with the VPA (and dispatch trace
> > > > log and so forth) on kexec() - it could be subject to the same stale
> > > > address problem, and rewriting vectors won't save us there.
> > >
> > > I asked Michael Ellerman this one and he thinks kexec probably frees and
> > > re-allocates the VPA.
> >
> > Ok. So the question is: if an explicit deregister is good enough for
> > the VPA, is it also good enough for the FWNMI vector, in which case
> > doing it with just a qemu exit and not bouncing through the guest space
> > is back on the table.
> >
> > I guess that's still problematic because there are existing guests
> > that assume a kexec() will magically wipe the fwnmi vectors away.
>
> Yes, but I think we could handle this separately if necessary: even if we
> don't need to write anything to the vector, we could still insert a magic
> value and check for it later. If it's been clobbered by a kexec, go back to
> the old method.

True. Of course if you're going to do that, it makes sense to make the
value a distinguishable illegal instruction anyway.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
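
As a rough illustration of the "release the BQL in the hcall and retry
internally" idea discussed in this message, here is a minimal sketch. It is
not QEMU code: big_lock is a plain pthread mutex standing in for the BQL, and
nmi_interlock_held, report_mc_err_sketch() and nmi_interlock_sketch() are
hypothetical names invented for this example. The only point it shows is that
the handler drops the lock between attempts, so that another thread could
enter and release the interlock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Stand-in for QEMU's big lock (BQL); an assumption, not the real lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical interlock state: true while one vCPU owns the error log. */
static bool nmi_interlock_held = false;

/*
 * Hypothetical hcall body. The caller holds big_lock on entry and still
 * holds it on return. Returns 0 once the interlock is taken, or -1 if it
 * was still busy after max_tries attempts.
 */
static int report_mc_err_sketch(int max_tries)
{
    assert(max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        if (!nmi_interlock_held) {
            nmi_interlock_held = true;  /* this caller now owns it */
            return 0;
        }
        /* Drop the big lock so the current owner can get in and release. */
        pthread_mutex_unlock(&big_lock);
        sched_yield();
        pthread_mutex_lock(&big_lock);
    }
    return -1;
}

/* Hypothetical release path; the caller holds big_lock. */
static void nmi_interlock_sketch(void)
{
    nmi_interlock_held = false;
}
```

A real implementation would presumably block rather than spin with
sched_yield(), and would have to re-validate any state it read before
dropping the lock.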
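
The "insert a magic value and check for it later" fallback from the exchange
above could look roughly like the sketch below. Everything in it is an
assumption made for illustration: guest memory is modelled as a byte array,
FWNMI_MARKER_INSN is an arbitrary placeholder that has not been checked
against the Power ISA (a real value would need to be a guaranteed-illegal
encoding), and the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: placeholder marker, not a verified illegal instruction. */
#define FWNMI_MARKER_INSN 0x00DEAD00u

enum delivery { DELIVER_VIA_VECTOR, DELIVER_OLD_METHOD };

/* Store a 32-bit word in big-endian byte order. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Load a 32-bit word stored in big-endian byte order. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write the marker at the registered vector offset. */
static void fwnmi_install_marker(uint8_t *mem, size_t size, size_t off)
{
    assert(size >= 4 && off <= size - 4);
    store_be32(mem + off, FWNMI_MARKER_INSN);
}

/* If the marker was clobbered (e.g. by a kexec), use the old method. */
static enum delivery fwnmi_choose_delivery(const uint8_t *mem, size_t size,
                                           size_t off)
{
    assert(size >= 4 && off <= size - 4);
    return load_be32(mem + off) == FWNMI_MARKER_INSN ? DELIVER_VIA_VECTOR
                                                     : DELIVER_OLD_METHOD;
}
```

The check only detects that the word at the vector changed; it cannot tell a
kexec from any other overwrite, which is why the fallback is simply the old
delivery method.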