On 6/3/25 06:47, Xiaoyao Li wrote:
> On 6/3/2025 3:41 PM, Xiaoyao Li wrote:
>> On 3/29/2025 4:30 AM, Tom Lendacky wrote:
>>> A page state change is typically followed by an access of the page(s) and
>>> results in another VMEXIT in order to map the page into the nested page
>>> table. Depending on the size of the page state change request, this can
>>> generate a number of additional VMEXITs. For example, under SNP, when
>>> Linux is utilizing lazy memory acceptance, memory is typically accepted in
>>> 4M chunks. A page state change request is submitted to mark the pages as
>>> private, followed by validation of the memory. Since the guest_memfd
>>> currently only supports 4K pages, each page validation will result in a
>>> VMEXIT to map the page, resulting in 1024 additional exits.
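The 1024 figure is the chunk size divided by the page size. This assumes one map-on-access exit per 4K page and no larger mappings. Here is a trivial sketch of that arithmetic; the helper name `extra_exits` is illustrative only:

```c
#include <stdint.h>

/* Extra map-on-access VMEXITs for one accepted chunk, assuming each
 * backing page faults in separately (one exit per page). */
static uint64_t extra_exits(uint64_t chunk_bytes, uint64_t page_bytes)
{
    return chunk_bytes / page_bytes;
}
```

For example, `extra_exits(4ULL << 20, 4ULL << 10)` (a 4M chunk of 4K pages) evaluates to 1024.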
>>>
>>> When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the
>>> size of the page state change in order to pre-map the pages and avoid the
>>> additional VMEXITs. This helps speed up boot times.
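Below is a minimal sketch of how a VMM might pre-fault a whole page state change range. It assumes the documented KVM_PRE_FAULT_MEMORY semantics: the ioctl may process only part of the range, it updates `gpa`/`size` to the remaining range, and it fails with EINTR if a signal is pending before any page was processed. This is not the code from the patch. The `pre_fault_range` struct, the `pre_fault_fn` hook and the stub are hypothetical stand-ins so the loop can run without /dev/kvm. A real implementation would fill a `struct kvm_pre_fault_memory` and issue the ioctl on the vCPU fd.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the in/out fields of struct kvm_pre_fault_memory. */
struct pre_fault_range {
    uint64_t gpa;   /* in/out: start of the range still to be mapped */
    uint64_t size;  /* in/out: bytes still to be mapped */
    uint64_t flags; /* in: must currently be zero */
};

/* Hypothetical hook: in a real VMM this would wrap
 * ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range) and return -errno on failure. */
typedef int (*pre_fault_fn)(void *ctx, struct pre_fault_range *r);

/* Pre-fault [gpa, gpa + size) for one page state change request. KVM may
 * handle only part of the range per call and rewrites gpa/size to the
 * remainder, so keep calling until nothing is left. */
static int pre_fault_psc_range(pre_fault_fn pre_fault, void *ctx,
                               uint64_t gpa, uint64_t size)
{
    struct pre_fault_range r = { .gpa = gpa, .size = size, .flags = 0 };

    while (r.size > 0) {
        int ret = pre_fault(ctx, &r);

        if (ret == -EINTR)
            continue;   /* signal pending, nothing processed: retry */
        if (ret < 0)
            return ret; /* e.g. -ENOENT or -EOPNOTSUPP: give up */
    }
    return 0;
}

/* Test stub standing in for KVM: maps at most 'step' bytes per call. */
struct stub_ctx {
    uint64_t step;
    int calls;
};

static int stub_pre_fault(void *ctx, struct pre_fault_range *r)
{
    struct stub_ctx *s = ctx;
    uint64_t n = r->size < s->step ? r->size : s->step;

    s->calls++;
    r->gpa += n;
    r->size -= n;
    return 0;
}
```

With a stub that maps 2M per call, a 4M request completes in two calls. Like the patch, this sketch passes the bare GPA. On its own, it does not address the TDX shared-bit issue raised later in this thread.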
>>
>> Unfortunately, it breaks TDX guests.
>>
>>    kvm_hc_map_gpa_range gpa 0x80000000 size 0x200000 attributes 0x0 flags 0x1
>>
>> A TDX guest uses MAPGPA to map the range starting at
>> 0x80000000 (size 0x200000) to shared. Calling KVM_PRE_FAULT_MEMORY on
>> that range leads to the TD being marked as bugged:
>>
>> [353467.266761] WARNING: CPU: 109 PID: 295970 at arch/x86/kvm/mmu/tdp_mmu.c:674 tdp_mmu_map_handle_target_level+0x301/0x460 [kvm]
> 
> It turns out to be a KVM bug.
> 
> The GPA passed in KVM_PRE_FAULT_MEMORY, i.e., range->gpa, carries no
> indication of shared vs. private. KVM passes range->gpa directly to
> kvm_tdp_map_page() in kvm_arch_vcpu_pre_fault_memory(), which then
> assigns it to fault.addr.
> 
> However, for a TDX guest, fault.addr is supposed to be the GPA of a real
> guest access. This means it must have the shared bit set if the mapping
> is for a shared access. tdp_mmu_get_root_for_fault() uses it to
> determine which root to use.
> 
> In this case, the pre-fault is on shared memory, but fault.addr (lacking
> the shared bit) selects mirror_root, which is for private memory. Thus it
> triggers KVM_BUG_ON().
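Below is a simplified model of the root selection described above. It assumes a TD with a 52-bit GPA width, so the shared bit is GPA bit 51 (with a 48-bit width it would be bit 47). All names are hypothetical, and the code only mimics the decision tdp_mmu_get_root_for_fault() makes; it is not KVM code. With a non-zero shared-bit mask, an address without the shared bit selects the mirror (private) root. A zero mask, as for a VM without a shared-bit convention, always selects the direct root.

```c
#include <stdbool.h>
#include <stdint.h>

enum tdp_root { ROOT_DIRECT, ROOT_MIRROR };

/* Assumption: 52-bit GPA width, so GPA bit 51 is the TDX shared bit. */
#define TDX_SHARED_BIT (1ULL << 51)

/* An address is "direct" (shared) if there is no shared-bit convention
 * at all, or if the address carries the shared bit. */
static bool addr_is_direct(uint64_t gpa, uint64_t shared_bits)
{
    return !shared_bits || (gpa & shared_bits);
}

/* Pick the root the way the thread describes: fault_addr alone decides. */
static enum tdp_root root_for_fault(uint64_t fault_addr, uint64_t shared_bits)
{
    return addr_is_direct(fault_addr, shared_bits) ? ROOT_DIRECT : ROOT_MIRROR;
}
```

In this model, `root_for_fault(0x80000000ULL, TDX_SHARED_BIT)` yields ROOT_MIRROR even though the guest converted that range to shared. That is the mismatch the pre-fault path hits.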

Is this something that can be fixed in KVM (determine if the range is
private or shared) or does the call to KVM_PRE_FAULT_MEMORY require
modification in some way that works for both TDX and SNP?

Thanks,
Tom

> 
> 
>> [353472.621399] WARNING: CPU: 109 PID: 295970 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:4281 kvm_vcpu_pre_fault_memory+0x167/0x1a0 [kvm]
>>
>>
>> It seems that pre-mapping the non-MR-backed range has an issue, but I'm
>> still debugging it to understand the root cause.
>>
>>
> 
