On Wed, Oct 21, 2020 at 04:39:28PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopher...@intel.com> writes:
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index e0fea09a6e42..89019e6476b3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -478,18 +478,13 @@ static int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush
> >                     range->pages);
> >  }
> >  
> > -static inline int hv_remote_flush_eptp(u64 eptp, struct kvm_tlb_range *range)
> > +static inline int hv_remote_flush_pgd(u64 pgd, struct kvm_tlb_range *range)
> >  {
> > -   /*
> > -    * FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE hypercall needs address
> > -    * of the base of EPT PML4 table, strip off EPT configuration
> > -    * information.
> > -    */
> >     if (range)
> > -           return hyperv_flush_guest_mapping_range(eptp & PAGE_MASK,
> > +           return hyperv_flush_guest_mapping_range(pgd,
> >                             kvm_fill_hv_flush_list_func, (void *)range);
> >     else
> > -           return hyperv_flush_guest_mapping(eptp & PAGE_MASK);
> > +           return hyperv_flush_guest_mapping(pgd);
> >  }
> 
> (I'm probably missing something, please bear with me -- this is the last
> patch of the series after all :-) but PGD which comes from
> kvm_mmu_load_pgd() has PCID bits encoded and you're dropping
> '&PAGE_MASK' here ...
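
To make the concern concrete, here is a minimal userspace sketch (not KVM code). make_cr3_like() and flush_address() are hypothetical helpers, and PAGE_MASK is redefined locally for 4 KiB pages. When the PCID is encoded in bits 11:0 of a CR3-style value, dropping the mask hands the flush hypercall a value that is not page-aligned.

```c
#include <stdint.h>

/* Local stand-ins for the x86 4 KiB page definitions. */
#define PAGE_SHIFT 12
#define PAGE_MASK  (~((UINT64_C(1) << PAGE_SHIFT) - 1))

/* Hypothetical helper: page-aligned root plus a 12-bit PCID in bits 11:0,
 * the way a CR3 value carries the PCID when CR4.PCIDE=1. */
static uint64_t make_cr3_like(uint64_t root_hpa, uint16_t pcid)
{
	return (root_hpa & PAGE_MASK) | (pcid & 0xfff);
}

/* What the removed "& PAGE_MASK" did: recover the bare table address
 * that the flush hypercall expects. */
static uint64_t flush_address(uint64_t pgd)
{
	return pgd & PAGE_MASK;
}
```

With a root of 0x123456000 and PCID 5, the unmasked value is 0x123456005, while flush_address() returns 0x123456000.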

...

> > @@ -564,17 +559,17 @@ static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
> >  
> >  #endif /* IS_ENABLED(CONFIG_HYPERV) */
> >  
> > -static void hv_load_mmu_eptp(struct kvm_vcpu *vcpu, u64 eptp)
> > +static void hv_load_mmu_pgd(struct kvm_vcpu *vcpu, u64 pgd)
> >  {
> >  #if IS_ENABLED(CONFIG_HYPERV)
> >     struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
> >  
> >     if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) {
> > -           spin_lock(&kvm_vmx->ept_pointer_lock);
> > -           to_vmx(vcpu)->ept_pointer = eptp;
> > -           if (eptp != kvm_vmx->hv_tlb_eptp)
> > -                   kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
> > -           spin_unlock(&kvm_vmx->ept_pointer_lock);
> > +           spin_lock(&kvm_vmx->hv_tlb_pgd_lock);
> > +           to_vmx(vcpu)->hv_tlb_pgd = pgd;
> > +           if (pgd != kvm_vmx->hv_tlb_pgd)
> > +                   kvm_vmx->hv_tlb_pgd = INVALID_PAGE;
> > +           spin_unlock(&kvm_vmx->hv_tlb_pgd_lock);
> >     }
> >  #endif
> >  }
> > @@ -3059,7 +3054,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
> >             eptp = construct_eptp(vcpu, pgd, pgd_level);
> >             vmcs_write64(EPT_POINTER, eptp);
> >  
> > -           hv_load_mmu_eptp(vcpu, eptp);
> > +           hv_load_mmu_pgd(vcpu, pgd);
> 
> ... and not adding it here. (construct_eptp() seems to drop PCID bits
> but add its own stuff). Is this on purpose?
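
For reference, here is a simplified standalone model of what construct_eptp() does with its root argument. The EPTP layout follows the Intel SDM: memory type in bits 2:0, page-walk length minus one in bits 5:3, and the A/D enable in bit 6. The macro and function names are stand-ins, and the real function's A/D decision depends on vCPU state that is reduced to a bool here.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_MASK (~UINT64_C(0xfff))

/* EPTP control bits (stand-in names, values per the Intel SDM). */
#define EPTP_MT_WB     UINT64_C(0x6)        /* write-back memory type */
#define EPTP_PWL_4     UINT64_C(0x18)       /* (4 - 1) << 3 */
#define EPTP_PWL_5     UINT64_C(0x20)       /* (5 - 1) << 3 */
#define EPTP_AD_ENABLE (UINT64_C(1) << 6)   /* accessed/dirty flags */

/* Simplified model: the root is masked to a page boundary, so any PCID
 * bits in 11:0 are discarded and replaced by EPT control bits. */
static uint64_t model_construct_eptp(uint64_t root, int level, bool ad_bits)
{
	uint64_t eptp = EPTP_MT_WB;

	eptp |= (level == 5) ? EPTP_PWL_5 : EPTP_PWL_4;
	if (ad_bits)
		eptp |= EPTP_AD_ENABLE;
	eptp |= root & PAGE_MASK;
	return eptp;
}
```

The low 12 bits of an EPTP are EPT controls, which is why the old code stripped them with & PAGE_MASK before the hypercall. After this patch, the low bits of the cached pgd are PCID bits instead, and nothing strips them.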

No, I completely forgot KVM crams the PCID bits into pgd.  I think I'll add
a patch to rework .load_mmu_pgd() to move the PCID bits to a separate param,
and change construct_eptp() to do WARN_ON_ONCE(pgd & ~PAGE_MASK).

Actually, I think it makes more sense to have VMX and SVM grab the PCID via
kvm_get_active_pcid(vcpu) when necessary.  For EPTP, getting the PCID bits may
unnecessarily read CR3 from the VMCS.
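
A rough model of that split, assuming vendor code asks for the PCID only on a path that loads the root into a hardware CR3. Every name here is hypothetical; model_read_cr3() stands in for kvm_read_cr3() and counts simulated VMCS reads, and the EPT-style path applies a warn-once alignment check in the spirit of the WARN_ON_ONCE() idea above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_MASK     (~UINT64_C(0xfff))
#define MODEL_CR3_PCID_MASK UINT64_C(0xfff)

/* Hypothetical vCPU state. */
struct model_vcpu {
	uint64_t vmcs_guest_cr3;  /* value "in the VMCS" */
	uint64_t cached_cr3;      /* software copy, like vcpu->arch.cr3 */
	bool cr3_avail;           /* like VCPU_EXREG_CR3 in regs_avail */
	bool pcide;               /* guest CR4.PCIDE */
	unsigned int vmcs_reads;  /* simulated VMREADs of GUEST_CR3 */
};

static unsigned int model_warnings;

/* Read CR3, going to the "VMCS" only on a cache miss. */
static uint64_t model_read_cr3(struct model_vcpu *v)
{
	if (!v->cr3_avail) {
		v->cached_cr3 = v->vmcs_guest_cr3;
		v->cr3_avail = true;
		v->vmcs_reads++;
	}
	return v->cached_cr3;
}

static uint64_t model_active_pcid(struct model_vcpu *v)
{
	if (!v->pcide)
		return 0;
	return model_read_cr3(v) & MODEL_CR3_PCID_MASK;
}

/* EPT-style path: the EPTP has no PCID field, so the PCID is never
 * requested and CR3 is never read. Unaligned roots are flagged once. */
static uint64_t model_ept_root(uint64_t root_hpa)
{
	static bool warned;

	if ((root_hpa & ~MODEL_PAGE_MASK) && !warned) {
		warned = true;
		model_warnings++;
	}
	return root_hpa & MODEL_PAGE_MASK;
}

/* Shadow-paging-style path: the root goes into a hardware CR3, so the
 * vendor code fetches the PCID itself, and only here. */
static uint64_t model_shadow_cr3(struct model_vcpu *v, uint64_t root_hpa)
{
	return (root_hpa & MODEL_PAGE_MASK) | model_active_pcid(v);
}
```

In this model, the EPT-style path leaves vmcs_reads at zero, while the first shadow-style load performs exactly one simulated VMCS read and later loads hit the cache.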

Ugh, which brings up another issue.  I'm pretty sure the "vmcs01.GUEST_CR3 is
already up-to-date" branch is dead code:

                if (!enable_unrestricted_guest && !is_paging(vcpu))
                        guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
                else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
                        guest_cr3 = vcpu->arch.cr3;
                else /* vmcs01.GUEST_CR3 is already up-to-date. */
                        update_guest_cr3 = false;
                vmx_ept_load_pdptrs(vcpu);

The sole caller of .load_mmu_pgd() always invokes kvm_get_active_pcid(), which
in turn always does kvm_read_cr3(), i.e. CR3 will always be available
(VCPU_EXREG_CR3 set in regs_avail) by the time this code runs.
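
A toy model of that argument, covering the paging case only. It assumes, as stated above, that the PCID lookup always reads CR3 first. All names are hypothetical; REG_CR3_AVAIL stands in for VCPU_EXREG_CR3, and pick_guest_cr3() mirrors the quoted if/else chain.

```c
#include <stdbool.h>
#include <stdint.h>

#define REG_CR3_AVAIL (1u << 0)   /* stand-in for VCPU_EXREG_CR3 */

struct model_vcpu {
	unsigned int regs_avail;
	uint64_t cr3;        /* cached copy, like vcpu->arch.cr3 */
	uint64_t vmcs_cr3;   /* value "in the VMCS" */
};

/* Like kvm_read_cr3(): on a miss, read the VMCS and mark CR3 available. */
static uint64_t model_read_cr3(struct model_vcpu *v)
{
	if (!(v->regs_avail & REG_CR3_AVAIL)) {
		v->cr3 = v->vmcs_cr3;
		v->regs_avail |= REG_CR3_AVAIL;
	}
	return v->cr3;
}

enum cr3_source { FROM_IDENTITY_MAP, FROM_CACHE, KEEP_VMCS };

/* Mirrors the quoted chain; KEEP_VMCS is the suspected dead branch. */
static enum cr3_source pick_guest_cr3(const struct model_vcpu *v,
				      bool identity_map)
{
	if (identity_map)
		return FROM_IDENTITY_MAP;
	if (v->regs_avail & REG_CR3_AVAIL)
		return FROM_CACHE;
	return KEEP_VMCS;
}

/* Caller order: the PCID lookup reads CR3 before .load_mmu_pgd() runs. */
static enum cr3_source model_load_pgd(struct model_vcpu *v, bool identity_map)
{
	(void)model_read_cr3(v);   /* via kvm_get_active_pcid() */
	return pick_guest_cr3(v, identity_map);
}
```

pick_guest_cr3() can return KEEP_VMCS only when it is called before any CR3 read. Once it is reached through model_load_pgd(), that branch is unreachable.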

So yeah, I think moving kvm_get_active_pcid() into VMX/SVM is the right approach.
I'll rename "pgd" to "root_hpa" and "pgd_level" to "root_level" so that we
don't end up with inconsistencies, e.g. where pgd may or may not contain PCID
bits.

Nice catch!
