On 21/12/2021 11:56 am, Jan Beulich wrote:
> On 16.12.2021 10:54, Andrew Cooper wrote:
>> @@ -42,4 +45,49 @@ static inline void wrpkru(uint32_t pkru)
>>                     :: "a" (pkru), "d" (0), "c" (0) );
>>  }
>>  
>> +/*
>> + * Xen does not use PKS.
>> + *
>> + * Guest kernel use is expected to be one default key, except for tiny 
>> windows
>> + * with a double write to switch to a non-default key in a permitted 
>> critical
>> + * section.
>> + *
>> + * As such, we want MSR_PKRS un-intercepted.  Furthermore, as we only need 
>> it
>> + * in Xen for emulation or migration purposes (i.e. possibly never in a
>> + * domain's lifetime), we don't want to re-sync the hardware value on every
>> + * vmexit.
>> + *
>> + * Therefore, we read and cache the guest value in ctxt_switch_from(), in 
>> the
>> + * expectation that we can short-circuit the write in ctxt_switch_to().
>> + * During regular operations in current context, the guest value is in
>> + * hardware and the per-cpu cache is stale.
>> + */
>> +DECLARE_PER_CPU(uint32_t, pkrs);
>> +
>> +static inline uint32_t rdpkrs(void)
>> +{
>> +    uint32_t pkrs, tmp;
>> +
>> +    rdmsr(MSR_PKRS, pkrs, tmp);
>> +
>> +    return pkrs;
>> +}
>> +
>> +static inline uint32_t rdpkrs_and_cache(void)
>> +{
>> +    return this_cpu(pkrs) = rdpkrs();
>> +}
>> +
>> +static inline void wrpkrs(uint32_t pkrs)
>> +{
>> +    uint32_t *this_pkrs = &this_cpu(pkrs);
>> +
>> +    if ( *this_pkrs != pkrs )
> For this to work, I think we need to clear PKRS during CPU init; I
> admit I didn't peek ahead in the series to check whether you do so
> later on in the series. At least the version of the SDM I'm looking
> at doesn't even specify reset state of 0 for this MSR. But even if
> it did, it would likely be as for PKRU - unchanged after INIT. Yet
> INIT is all that CPUs go through when e.g. parking / unparking them.

While trying to address this, I've noticed that we don't sanitise PKRU
during CPU init either.

This will explode in a fun way if e.g. we kexec into a new Xen, but
leave PKEY 0 with AD/WD, and try building a PV dom0.

As soon as we've fully context switched into a vCPU context, we'll pick
up the 0 from XSTATE and do the right thing by default.

~Andrew

Reply via email to