PV: conditionally avoid raising #GP for early guest MSR reads

Jan Beulich Mon, 08 Mar 2021 01:33:25 -0800

On 08.03.2021 09:56, Roger Pau Monné wrote:
> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
>> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
>> handler early enough to cover for example the rdmsrl_safe() of
>> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
>> of MSR_K7_HWCR later in the same function). The respective change
>> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
>> backported to 4.14, but no further - presumably since it wasn't really
>> easy because of other dependencies.
>>
>> Therefore, to prevent our change in the handling of guest MSR accesses
>> to render PV Linux 4.13 and older unusable on at least AMD systems, make
>> the raising of #GP on this paths conditional upon the guest having
>> installed a handler, provided of course the MSR can be read in the first
>> place (we would have raised #GP in that case even before). Producing
>> zero for reads isn't necessarily correct and may trip code trying to
>> detect presence of MSRs early, but since such detection logic won't work
>> without a #GP handler anyway, this ought to be a fair workaround.
>>
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>> ---
>> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
>>     messages (in debug builds). Don't alter WRMSR behavior.
>> ---
>> I'm not convinced we can get away without also making the WRMSR path
>> somewhat more permissive again, e.g. tolerating attempts to set bits
>> which are already set. But of course this would require keeping in sync
>> for which MSRs we "fake" reads, as then a kernel attempt to set a bit
>> may also appear as an attempt to clear others (because of the zero value
>> that we gave it for the read).
> 
> The above approach seems dangerous, as it could allow a guest to
> figure out the value of the underlying MSR by probing whether values
> trigger a #GP?


Perhaps, yes. But what do you do? There's potentially a huge value
range to probe ...

> I think we want to do something similar to what we do on HVM in 4.14
> and previous versions: ignore writes as long as the rdmsr to the
> target MSR succeeds, regardless of the value.

Which, as said elsewhere, has its own downsides - writable MSRs don't
need to also be readable. See e.g. AMD's proposed PARTIAL_{FS,GS}_LOAD
MSRs.

>> --- a/xen/arch/x86/pv/emul-priv-op.c
>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>>      struct vcpu *curr = current;
>>      const struct domain *currd = curr->domain;
>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>> -    bool vpmu_msr = false;
>> +    bool vpmu_msr = false, warn = false;
>>      int ret;
>>  
>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>>          if ( ret == X86EMUL_EXCEPTION )
>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>>  
>> -        return ret;
>> +        goto done;
>>      }
>>  
>>      switch ( reg )
>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>>          }
>>          /* fall through */
>>      default:
>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>> +        warn = true;
>>          break;
>>  
>>      normal:
>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>>          return X86EMUL_OKAY;
>>      }
>>  
>> -    return X86EMUL_UNHANDLEABLE;
>> + done:
> 
> Won't this handling be better placed in the 'default' switch case
> above?

No - see the "goto done" added near the top of the function.

>> +    if ( ret != X86EMUL_OKAY && 
>> !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
>> +         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )
> 
> We didn't used to care about explicitly blocking the reserved MSR
> range, do we really wan to do it now?
> 
> I'm not sure I see an issue with that, but given that we are trying to
> bring back something similar to the previous behavior, I would avoid
> adding new conditions.

What I'm particularly trying to avoid here is to allow
information from an underlying hypervisor to "shine through",
even if it's only information as to whether a certain MSR is
readable. It should be solely our own selection which MSRs in
this range a guest is able to (appear to) read.

Plus of course by avoiding the rdmsr_safe() in this case we
also avoid the unnecessary (debug only) log message associated
with such attempted reads.

Jan

Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads

Reply via email to