Benjamin Herrenschmidt <b...@kernel.crashing.org> writes:

> There's a somewhat architectural issue with Radix MMU and KVM.
>
> When coming out of a guest with AIL (ie, MMU enabled), we start
> executing hypervisor code with the PID register still containing
> whatever the guest has been using.
>
> The problem is that the CPU can (and will) then start prefetching
> or speculatively load from whatever host context has that same
> PID (if any), thus bringing translations for that context into
> the TLB, which Linux doesn't know about.
>
> This can cause stale translations and subsequent crashes.
>
> Fixing this in a way that is neither racy nor a huge performance
> impact is difficult. We could just make the host invalidations
> always use broadcast forms but that would hurt single threaded
> programs for example.
>
> We chose to fix it instead by partitioning the PID space between
> guest and host. This is possible because today Linux only uses 19
> out of the 20 bits of PID space, so existing guests will work
> if we make the host use the top half of the 20 bits space.
>
> We additionally add a property to indicate to Linux the size of
> the PID register which will be useful if we eventually have
> processors with a larger PID space available.
>
> There is still an issue with malicious guests purposefully setting
> the PID register to a value in the host range. Hopefully future HW
> can prevent that, but in the meantime, we handle it with a pair of
> kludges:
>
>  - On the way out of a guest, before we clear the current VCPU
>    in the PACA, we check the PID and if it's outside of the permitted
>    range we flush the TLB for that PID.
>
>  - When context switching, if the mm is "new" on that CPU (the
>    corresponding bit was set for the first time in the mm cpumask), we
>    check if any sibling thread is in KVM (has a non-NULL VCPU pointer
>    in the PACA). If that is the case, we also flush the PID for that
>    CPU (core).
>
> This second part is needed to handle the case where a process is
> migrated (or starts a new pthread) on a sibling thread of the CPU
> coming out of KVM, as there's a window where stale translations
> can exist before we detect it and flush them out.
>
> A future optimization could be added by keeping track of whether
> the PID has ever been used and avoid doing that for completely
> fresh PIDs. We could similarly mark PIDs that have been the subject of
> a global invalidation as "fresh". But for now this will do.
>
> Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> ---
>
> v2. Do the check on KVM exit *after* we've restored the host PID
> ....
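
For reference, the PID split described in the changelog can be sketched
in C roughly as follows. This is only an illustration, not code from the
patch: the helper names are hypothetical, and I am assuming that
mmu_base_pid ends up holding the first PID of the host's half, which is
what the `blt` in the exit path suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only; not taken from the patch. */

/* First PID the host may use: the top half of a pid_bits-wide space. */
static inline uint32_t host_base_pid(unsigned int pid_bits)
{
	assert(pid_bits >= 1 && pid_bits <= 32);
	return (uint32_t)1 << (pid_bits - 1);
}

/* Largest PID representable in a pid_bits-wide register. */
static inline uint32_t max_pid(unsigned int pid_bits)
{
	assert(pid_bits >= 1 && pid_bits <= 32);
	return (uint32_t)(((uint64_t)1 << pid_bits) - 1);
}

/* True if a PID value written by a guest lands in the host's half. */
static inline bool pid_in_host_range(uint32_t pid, unsigned int pid_bits)
{
	return pid >= host_base_pid(pid_bits);
}
```

With pid_bits = 20 the host range starts at 0x80000, so a guest that
only ever uses 19 bits of PID (0 to 0x7ffff) never overlaps it.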
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 6ea4b53..e744d11 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1522,6 +1522,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>  	std	r6, VCPU_BESCR(r9)
>  	stw	r7, VCPU_GUEST_PID(r9)
>  	std	r8, VCPU_WORT(r9)
> +
>  BEGIN_FTR_SECTION
>  	mfspr	r5, SPRN_TCSCR
>  	mfspr	r6, SPRN_ACOP
> @@ -1728,6 +1729,19 @@ BEGIN_FTR_SECTION
>  	mtspr	SPRN_PSSCR, r6
>  	mtspr	SPRN_PID, r7
>  	mtspr	SPRN_IAMR, r8
> +
> +	/* Handle the case where the guest used an illegal PID */
> +	LOAD_REG_ADDR(r4, mmu_base_pid)
> +	lwz	r3, VCPU_GUEST_PID(r9)
> +	lwz	r5, 0(r4)
> +	cmpw	cr0,r3,r5
> +	blt	1f
> +
> +	/* Illegal PID, flush the TLB */
> +	isync
> +	bl	radix_flush_pid
> +1:

This needs to be done only for radix, right? Do we need a radix
feature check here?

> +
>  END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>  BEGIN_FTR_SECTION
>  	PPC_INVALIDATE_ERAT

-aneesh
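
To make the question concrete, here is a rough C model of the exit-path
check with a radix guard added. It is a sketch under assumptions, not a
proposed replacement for the assembly: host_is_radix, model_flush_pid()
and flush_count are hypothetical stand-ins, and the value given to
mmu_base_pid is only my reading of the changelog (top half of a 20-bit
space).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: start of the host half of a 20-bit PID space. */
static uint32_t mmu_base_pid = (uint32_t)1 << 19;

/* Hypothetical counter so the model's behaviour can be observed. */
static unsigned int flush_count;

/* Stand-in for radix_flush_pid(); it only records that it was called. */
static void model_flush_pid(uint32_t pid)
{
	(void)pid;
	flush_count++;
}

/*
 * Model of the check on guest exit. Returns true if a flush was issued.
 * The patch compares with cmpw, which is a signed compare; this model
 * uses unsigned values, which gives the same result for PIDs below 2^31.
 */
static bool guest_exit_check_pid(uint32_t guest_pid, bool host_is_radix)
{
	/* The guard being asked about: skip everything when not radix. */
	if (!host_is_radix)
		return false;

	/* PID below the host range: legal for a guest, nothing to do. */
	if (guest_pid < mmu_base_pid)
		return false;

	/* Guest used a PID in the host range: flush it. */
	model_flush_pid(guest_pid);
	return true;
}
```

In this model a guest PID of 0x80000 triggers one flush when
host_is_radix is true and none when it is false. Whether the real exit
path needs such a guard is exactly the open question above.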