On Thu, Mar 07, 2013 at 12:25:26PM +0100, Jan Kiszka wrote:
> On 2013-03-07 12:06, Gleb Natapov wrote:
> > On Thu, Mar 07, 2013 at 11:37:43AM +0100, Jan Kiszka wrote:
> >> On 2013-03-07 09:57, Gleb Natapov wrote:
> >>> On Thu, Mar 07, 2013 at 09:53:49AM +0100, Jan Kiszka wrote:
> >>>> On 2013-03-07 09:43, Gleb Natapov wrote:
> >>>>> On Thu, Mar 07, 2013 at 09:12:19AM +0100, Jan Kiszka wrote:
> >>>>>> On 2013-03-07 08:51, Gleb Natapov wrote:
> >>>>>>> On Mon, Mar 04, 2013 at 08:40:29PM +0100, Jan Kiszka wrote:
> >>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4
> >>>>>>>> was broken (will definitely be visible with nested unrestricted guest
> >>>>>>>> mode support). Also, we performed the check regarding CR0_ALWAYSON
> >>>>>>>> too early when in guest mode.
> >>>>>>>>
> >>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out
> >>>>>>>> L1-owned bits and merge them in from GUEST_CR0/4. In contrast,
> >>>>>>>> arch.cr0/4 and arch.cr0/4_guest_owned_bits contain the mangled L0+L1
> >>>>>>>> state and, thus, are not suited as input.
> >>>>>>>>
> >>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON
> >>>>>>>> and refuse the update if it fails. To be fully consistent, we
> >>>>>>>> implement this check now also for CR4.
> >>>>>>>>
> >>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write
> >>>>>>>> originally.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
> >>>>>>>> ---
> >>>>>>>>
> >>>>>>>> Changes in v2:
> >>>>>>>>  - keep the non-misleading part of the comment in handle_set_cr0
> >>>>>>>>
> >>>>>>>>  arch/x86/kvm/vmx.c |   46 +++++++++++++++++++++++++++++++---------------
> >>>>>>>>  1 files changed, 31 insertions(+), 15 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>>>>> index 7cc566b..832b7b4 100644
> >>>>>>>> --- a/arch/x86/kvm/vmx.c
> >>>>>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>>>>> @@ -4605,37 +4605,53 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
> >>>>>>>>  /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
> >>>>>>>>  static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
> >>>>>>>>  {
> >>>>>>>> -    if (to_vmx(vcpu)->nested.vmxon &&
> >>>>>>>> -        ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
> >>>>>>>> -            return 1;
> >>>>>>>> -
> >>>>>>>>      if (is_guest_mode(vcpu)) {
> >>>>>>>> +            struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> >>>>>>>> +            unsigned long orig_val = val;
> >>>>>>>> +
> >>>>>>>>              /*
> >>>>>>>>               * We get here when L2 changed cr0 in a way that did not change
> >>>>>>>>               * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
> >>>>>>>> -             * but did change L0 shadowed bits. This can currently happen
> >>>>>>>> -             * with the TS bit: L0 may want to leave TS on (for lazy fpu
> >>>>>>>> -             * loading) while pretending to allow the guest to change it.
> >>>>>>>> +             * but did change L0 shadowed bits.
> >>>>>>>>               */
> >>>>>>>> -            if (kvm_set_cr0(vcpu, (val & vcpu->arch.cr0_guest_owned_bits) |
> >>>>>>>> -                     (vcpu->arch.cr0 & ~vcpu->arch.cr0_guest_owned_bits)))
> >>>>>>>> +            val = (val & ~vmcs12->cr0_guest_host_mask) |
> >>>>>>>> +                    (vmcs_read64(GUEST_CR0) & vmcs12->cr0_guest_host_mask);
> >>>>>>> I think using GUEST_CR0 here is incorrect. It contains a combination
> >>>>>>> of bits set by L2, L1 and L0, and here we need only the L2/L1 mix,
> >>>>>>> which is in vcpu->arch.cr0 (almost, but good enough for this case).
> >>>>>>> Why does vcpu->arch.cr0 contain the right L2/L1 mix?
> >>>>>>
> >>>>>> L0/L1. E.g., kvm_set_cr0 unconditionally injects X86_CR0_ET and masks
> >>>>>> out reserved bits. But you are right, GUEST_CR0 isn't much better. And
> >>>>>> maybe that mangling in kvm_set_cr0 is a corner case we can ignore.
> >>>>>>
> >>>>> I think we can. ET is R/O and wired to 1, so no matter what the guest
> >>>>> writes there, it should be treated as 1. As for reserved bits, the spec
> >>>>> says that software should write what it reads there and does not specify
> >>>>> what happens if software does not follow this.
> >>>>>
> >>>>>>> Because it was set to vmcs12->guest_cr0 during
> >>>>>>> L2 #vmentry. While L2 is running, three things may happen to CR0:
> >>>>>>>
> >>>>>>>  1. L2 writes to a bit that is shadowed neither by L1 nor by L0. It
> >>>>>>>     will go straight to GUEST_CR0.
> >>>>>>>  2. L2 writes to a bit shadowed by L1. An L1 #vmexit will be emulated.
> >>>>>>>     On the next #vmentry, vcpu->arch.cr0 will be set to whatever value
> >>>>>>>     L1 calculated.
> >>>>>>>  3. L2 writes to a bit shadowed by L0, but not by L1. This is the case
> >>>>>>>     we are handling here. And if we do it right, vcpu->arch.cr0 will
> >>>>>>>     be up-to-date at the end.
> >>>>>>>
> >>>>>>> The only case in which vcpu->arch.cr0 is not up to date while this
> >>>>>>> code runs is if 1 happened, but since the L2 guest is overwriting cr0
> >>>>>>> here anyway, it does not matter what it previously set in GUEST_CR0.
> >>>>>>> The correct bits are in the new cr0 value provided by val, accessible
> >>>>>>> as (val & ~vmcs12->cr0_guest_host_mask).
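
To make the two options under discussion concrete, here is a minimal sketch
of just the merge step, assuming the handle_set_cr0() context from the quoted
patch; which source supplies the L1-owned bits is exactly what is being
debated, so neither variant is final:

		/* Variant in the posted v2 patch: L1-owned bits from hardware GUEST_CR0 */
		val = (val & ~vmcs12->cr0_guest_host_mask) |
			(vmcs_read64(GUEST_CR0) & vmcs12->cr0_guest_host_mask);

		/* ...or, as suggested above: L1-owned bits from vcpu->arch.cr0 */
		val = (val & ~vmcs12->cr0_guest_host_mask) |
			(vcpu->arch.cr0 & vmcs12->cr0_guest_host_mask);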
> >>>>>>
> >>>>>> I need to think about it again. Maybe vmcs12->guest_cr0 is best, but
> >>>>>> that's just a shot from the hip right now.
> >>>>>>
> >>>>> I do not think it is correct, because case 3 does not update it. So if 3
> >>>>> happens twice without an L1 #vmexit in between, then vmcs12->guest_cr0
> >>>>> will be outdated.
> >>>>
> >>>> Again, the only thing that matters here is L1's view, not L0's, of the
> >>>> "real" CR0 value. So guest_cr0 is never outdated (w.r.t.
> >>>> cr0_guest_host_mask) as it will be updated by L1 in step 2. Even if
> >>>> arch.cr0 vs. guest_cr0 makes no difference in practice, the latter is
> >>>> more consistent, so I will go for it unless you can convince me it is
> >>>> wrong.
> >>>>
> >>> Hmm, yes, you are right that guest_cr0 should be up-to-date w.r.t.
> >>> cr0_guest_host_mask. Please write a big comment about it.
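
A sketch of how the merge and the requested comment might then look, assuming
the rest of the v2 hunk stays as posted:

		/*
		 * vmcs12->guest_cr0 holds L1's view of the L1-shadowed bits:
		 * case 2 above refreshes it via the emulated #vmexit/#vmentry,
		 * while cases 1 and 3 never touch L1-shadowed bits, so it is
		 * always up to date w.r.t. cr0_guest_host_mask.
		 */
		val = (val & ~vmcs12->cr0_guest_host_mask) |
			(vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask);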
> >>
> >> Will do.
> >>
> >>> And what about moving the VMXON_CR0_ALWAYSON check into vmx_set_cr0()?
> >>
> >> That doesn't make much sense for CR0 (due to the differences between
> >> vmxon and guest mode, and the missing return code of set_cr4). But I can
> >> consolidate the CR4 checks.
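
For the CR4 side, the commit message above describes the same merge plus a
VMXON_CR4_ALWAYSON check and restoring the shadow. A rough sketch of the
guest-mode path in handle_set_cr4(), assuming it mirrors the CR0 handler (the
CR4 hunk itself is not quoted here, so this is not verbatim from the patch):

	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		unsigned long orig_val = val;

		/* Merge L2-owned bits with L1's view of the L1-owned bits. */
		val = (val & ~vmcs12->cr4_guest_host_mask) |
			(vmcs12->guest_cr4 & vmcs12->cr4_guest_host_mask);
		if ((val & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)
			return 1;
		if (kvm_set_cr4(vcpu, val))
			return 1;
		/* Let L2 read back the value it originally tried to write. */
		vmcs_writel(CR4_READ_SHADOW, orig_val);
		return 0;
	}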
> >>
> > Isn't the vmxon check implicit in guest mode, i.e. if is_guest_mode() is
> > true then vmxon is on? A return code can be added.
> 
> Ah, sorry, you are not seeing what I'm looking at: the test will change
> for the L2 context once unrestricted guest mode is added. At that point,
> it makes more sense to split it into one version that checks against
> VMXON_CR0_ALWAYSON while in vmxon, targeting L1, and another that does a
> more complex evaluation for L2, depending on nested_cpu_has2(vmcs12,
> SECONDARY_EXEC_UNRESTRICTED_GUEST).
> 
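
For illustration, a rough sketch of such a split, with hypothetical helper
names (nothing here is from the posted patch, and the exact relaxation for
unrestricted guest mode is an assumption):

	static bool l1_cr0_valid(unsigned long val)
	{
		/* L1 in VMX operation: the plain fixed-bits check. */
		return (val & VMXON_CR0_ALWAYSON) == VMXON_CR0_ALWAYSON;
	}

	static bool l2_cr0_valid(struct vmcs12 *vmcs12, unsigned long val)
	{
		unsigned long always_on = VMXON_CR0_ALWAYSON;

		/* With unrestricted guest, L2 may legally run with PE/PG clear. */
		if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
			always_on &= ~(unsigned long)(X86_CR0_PE | X86_CR0_PG);
		return (val & always_on) == always_on;
	}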
Ah, OK. It is hard to argue about whether those checks can be consolidated
without seeing them :) So you want to implement unrestricted L1 on a
restricted L0 and let L0 emulate L2's real mode directly?

--
                        Gleb.