On 22/05/2020 10:45, Andrew Cooper wrote:
> On 21/05/2020 22:43, Igor Druzhinin wrote:
>> If a recalculation NPT fault hasn't been handled explicitly in
>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>> US bit has been re-instated in PTE and any real fault would be correctly
>> re-raised next time.
>>
>> This covers a specific case of migration with vGPU assigned on AMD:
>> global log-dirty is enabled and causes immediate recalculation NPT
>> fault in MMIO area upon access. This type of fault isn't described
>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>> EPT misconfig exit on Intel) which results in domain crash.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>> ---
>>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>> index 46a1aac..f0d0bd3 100644
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>>          /* inject #VMEXIT(NPF) into guest. */
>>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>>          return;
>> +    case 0:
>> +        /* If a recalculation page fault hasn't been handled - just retry. 
>> */
>> +        if ( pfec & PFEC_user_mode )
>> +            return;
> 
> This smells like it is a recipe for livelocks.
> 
> Everything should have been handled properly by the call to
> p2m_pt_handle_deferred_changes() which precedes svm_do_nested_pgfault().
> 
> It is legitimate for the MMIO mapping to end up being transiently
> recalculated, but the fact that p2m_pt_handle_deferred_changes() doesn't
> fix it up suggests that the bug is there.
> 
> Do you have the complete NPT walk to the bad mapping? Do we have
> _PAGE_USER in the leaf mapping, or is this perhaps a spurious fault?

It does fix it up. The problem is that currently in SVM we enter
svm_do_nested_pgfault immediately after p2m_pt_handle_deferred_changes
is finished finished. 

Yes, we don't have _PAGE_USER initially and, yes, it's fixed up
correctly in p2m_pt_handle_deferred_changes but svm_do_nested_pgfault
doesn't know about it.

Please read my second email about alternatives that suggest to resolve
the issue you're worrying about.

Igor

Reply via email to