On 21/12/2021 11:56 am, Jan Beulich wrote: > On 16.12.2021 10:54, Andrew Cooper wrote: >> @@ -42,4 +45,49 @@ static inline void wrpkru(uint32_t pkru) >> :: "a" (pkru), "d" (0), "c" (0) ); >> } >> >> +/* >> + * Xen does not use PKS. >> + * >> + * Guest kernel use is expected to be one default key, except for tiny >> windows >> + * with a double write to switch to a non-default key in a permitted >> critical >> + * section. >> + * >> + * As such, we want MSR_PKRS un-intercepted. Furthermore, as we only need >> it >> + * in Xen for emulation or migration purposes (i.e. possibly never in a >> + * domain's lifetime), we don't want to re-sync the hardware value on every >> + * vmexit. >> + * >> + * Therefore, we read and cache the guest value in ctxt_switch_from(), in >> the >> + * expectation that we can short-circuit the write in ctxt_switch_to(). >> + * During regular operations in current context, the guest value is in >> + * hardware and the per-cpu cache is stale. >> + */ >> +DECLARE_PER_CPU(uint32_t, pkrs); >> + >> +static inline uint32_t rdpkrs(void) >> +{ >> + uint32_t pkrs, tmp; >> + >> + rdmsr(MSR_PKRS, pkrs, tmp); >> + >> + return pkrs; >> +} >> + >> +static inline uint32_t rdpkrs_and_cache(void) >> +{ >> + return this_cpu(pkrs) = rdpkrs(); >> +} >> + >> +static inline void wrpkrs(uint32_t pkrs) >> +{ >> + uint32_t *this_pkrs = &this_cpu(pkrs); >> + >> + if ( *this_pkrs != pkrs ) > For this to work, I think we need to clear PKRS during CPU init; I > admit I didn't peek ahead in the series to check whether you do so > later on in the series. At least the version of the SDM I'm looking > at doesn't even specify reset state of 0 for this MSR. But even if > it did, it would likely be as for PKRU - unchanged after INIT. Yet > INIT is all that CPUs go through when e.g. parking / unparking them.
While trying to address this, I've noticed that we don't sanitise PKRU during CPU init either. This will explode in a fun way if e.g. we kexec into a new Xen, but leave PKEY 0 with AD/WD, and try building a PV dom0. As soon as we've fully context switched into a vCPU context, we'll pick up the 0 from XSTATE and do the right thing by default. ~Andrew