On 6/3/2025 11:00 PM, Paolo Bonzini wrote:
On 6/3/25 13:47, Xiaoyao Li wrote:
On 6/3/2025 3:41 PM, Xiaoyao Li wrote:
On 3/29/2025 4:30 AM, Tom Lendacky wrote:
A page state change is typically followed by an access of the page(s) and results in another VMEXIT in order to map the page into the nested page table. Depending on the size of the page state change request, this can generate a number of additional VMEXITs. For example, under SNP, when Linux is utilizing lazy memory acceptance, memory is typically accepted in 4M chunks. A page state change request is submitted to mark the pages as private, followed by validation of the memory. Since guest_memfd currently only supports 4K pages, each page validation will result in a VMEXIT to map the page, resulting in 1024 additional exits.
When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the size of the page state change in order to pre-map the pages and avoid the additional VMEXITs. This helps speed up boot times.
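For context, the pre-fault on the page-state-changed range amounts to a KVM_PRE_FAULT_MEMORY vCPU ioctl issued by userspace. A rough sketch of that call (not the actual patch; vcpu_fd and pre_fault_range are placeholder names):

#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int pre_fault_range(int vcpu_fd, __u64 gpa, __u64 size)
{
	struct kvm_pre_fault_memory range;
	int ret;

	memset(&range, 0, sizeof(range));
	range.gpa = gpa;
	range.size = size;

	do {
		/* KVM passes back the leftover range in range.gpa/range.size. */
		ret = ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range);
	} while (ret < 0 && (errno == EINTR || errno == EAGAIN) && range.size);

	return ret;
}

This pre-populates the stage-2 mappings for the converted range so the guest's subsequent accesses do not each take a mapping VMEXIT.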
Unfortunately, it breaks TDX guests.

  kvm_hc_map_gpa_range gpa 0x80000000 size 0x200000 attributes 0x0 flags 0x1

For a TDX guest, MAPGPA is used to map the range [0x80000000, +0x200000] to shared. Calling KVM_PRE_FAULT_MEMORY on such a range leads to the TD being marked as bugged:
  [353467.266761] WARNING: CPU: 109 PID: 295970 at arch/x86/kvm/mmu/tdp_mmu.c:674 tdp_mmu_map_handle_target_level+0x301/0x460 [kvm]
It turns out to be a KVM bug.
The gpa passed in via KVM_PRE_FAULT_MEMORY, i.e., range->gpa, carries no indication of shared vs. private. KVM passes range->gpa directly to kvm_tdp_map_page() in kvm_arch_vcpu_pre_fault_memory(), where it is then assigned to fault.addr.
However, for a TDX guest, fault.addr is supposed to be the GPA of a real access, which means it needs to have the shared bit set if the mapping is for shared access. tdp_mmu_get_root_for_fault() uses it to determine which root to use.
In this case, the pre-fault is on shared memory, while fault.addr leads to the mirror_root, which is for private memory. Thus it triggers the KVM_BUG_ON().
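For reference, the root selection keys off the shared bit encoded in the address. kvm_is_addr_direct() is roughly the following (paraphrasing arch/x86/kvm/mmu.h), so on TDX a GPA that never had the shared bit ORed in fails this check and falls through to the mirror (private) root:

static inline bool kvm_is_addr_direct(struct kvm *kvm, gpa_t gpa)
{
	/* For TDX, the direct bits are the shared bit in GPA space. */
	gpa_t gpa_direct_bits = gfn_to_gpa(kvm_gfn_direct_bits(kvm));

	/* "Direct" (shared) if the VM has no direct bits or the GPA has them set. */
	return !gpa_direct_bits || (gpa & gpa_direct_bits);
}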
So this would fix it?
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7b3f1783ab3c..66f96476fade 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4895,6 +4895,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 {
 	u64 error_code = PFERR_GUEST_FINAL_MASK;
 	u8 level = PG_LEVEL_4K;
+	u64 direct_bits;
 	u64 end;
 	int r;
@@ -4909,15 +4910,18 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 	if (r)
 		return r;
 
+	direct_bits = 0;
 	if (kvm_arch_has_private_mem(vcpu->kvm) &&
 	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(range->gpa)))
 		error_code |= PFERR_PRIVATE_ACCESS;
+	else
+		direct_bits = kvm_gfn_direct_bits(vcpu->kvm);
should be

	direct_bits = gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));

since kvm_gfn_direct_bits() returns GFN-level bits while range->gpa is a GPA.
 	/*
 	 * Shadow paging uses GVA for kvm page fault, so restrict to
 	 * two-dimensional paging.
 	 */
-	r = kvm_tdp_map_page(vcpu, range->gpa, error_code, &level);
+	r = kvm_tdp_map_page(vcpu, range->gpa | direct_bits, error_code, &level);
 	if (r < 0)
 		return r;
I'm applying Tom's patch to get it out of his queue, but will delay sending
a pull request until the Linux-side fix is accepted.
With the above-mentioned change, the issue is fixed.
I synced with Yan offline, and our fix is:
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 52acf99d40a0..209103bf0f30 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -48,7 +48,7 @@ static inline enum kvm_tdp_mmu_root_types kvm_gfn_range_filter_to_root_types(str
 static inline struct kvm_mmu_page *tdp_mmu_get_root_for_fault(struct kvm_vcpu *vcpu,
 							       struct kvm_page_fault *fault)
 {
-	if (unlikely(!kvm_is_addr_direct(vcpu->kvm, fault->addr)))
+	if (unlikely(fault->is_private))
 		return root_to_sp(vcpu->arch.mmu->mirror_root_hpa);
 
 	return root_to_sp(vcpu->arch.mmu->root.hpa);
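Keying off fault->is_private should be more robust here: if I read the code right, is_private is derived from PFERR_PRIVATE_ACCESS in the error code, which kvm_arch_vcpu_pre_fault_memory() sets based on kvm_mem_is_private(), so it does not depend on the caller encoding the shared bit into fault->addr.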
Paolo