On Monday, May 6, 2019 7:07 PM, Andrew Cooper <andrew.coop...@citrix.com> 
wrote:

> There is a lot in here.

I wanted to gather enough data before making a bug report on such a complicated 
issue.

> As for your BSOD analysis, the first thing to be aware of is that Double 
> Fault is not necessarily precise, which means you can't necessarily trust any 
> of the registers.  That said, most double faults are precise in practice, so 
> if you're seeing it reliably at the same place, then it is likely to be a 
> precise example.

I can reliably reproduce the Double Fault after ~10 tests on Windows 10 with 
KPTI.
The stack trace always shows the beginning of KiSystemCall64ShadowCommon, 
which is executed after the CR3 switch to the kernel page tables.

> Your faulting address isn't immediately after the pagetable switch.  It 
> is one instruction further on, after the stack switch, which means at the 
> very minimum that reading the new rsp out of the per-processor storage 
> succeeded.
>
> The stack switch, combined with `push $0x2b` faulting is a clear sign that 
> the stack is bad.  As the stack pointer looks plausible, it is almost 
> certainly the pagewalk from %rsp which is bad.  Judging by the Windbg guide, 
> you want to use !pte to dump the pagewalk (but I have never used it in anger 
> before).

I checked RSP, and it's mapped in the kernel page tables:
# print kernel and userland page directory physical address
kd> dt _EPROCESS ffffdf8815e15340 ImageFileName Pcb.Directorytablebase Pcb.Userdirectorytablebase
ntdll!_EPROCESS
   +0x000 Pcb                        :
      +0x028 DirectoryTableBase         : 0xcbf10002
      +0x278 UserDirectoryTableBase     : 0xcbe00001
   +0x450 ImageFileName              : [15]  "ctfmon.exe"

# print RSP
kd> r rsp
rsp=fffff800b006cd08

# translate RSP to physical address
kd> !vtop cbf10000 fffff800b006cd08
Amd64VtoP: Virt fffff800b006cd08, pagedir 00000000cbf10000
Amd64VtoP: PML4E 00000000cbf10f80
Amd64VtoP: PDPE 0000000003708010
Amd64VtoP: PDE 0000000003709c00
Amd64VtoP: PTE 000000000371d360
Amd64VtoP: Mapped phys 000000000546cd08
Virtual address fffff800b006cd08 translates to physical address 546cd08.
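
As a sanity check on the walk above, the entry addresses that !vtop prints can be reproduced by hand. This is only a minimal Python sketch, assuming standard 4-level paging with 4 KiB pages; the helper names (table_base, walk_offsets) are made up for illustration and are not part of WinDbg. It also shows why cbf10000 is passed to !vtop rather than the raw DirectoryTableBase value 0xcbf10002: the low 12 bits are not part of the table's physical base address.

```python
# Sketch only: assumes x86-64 4-level paging with 4 KiB pages.
# Helper names are hypothetical, not WinDbg or Windows APIs.

ENTRY_SIZE = 8       # each page-table entry is 8 bytes
INDEX_MASK = 0x1FF   # 9 index bits per paging level
PAGE_MASK = 0xFFF    # low 12 bits: page offset (or non-address bits in CR3)


def table_base(value):
    """Clear the low 12 bits to get a page-aligned physical base."""
    return value & ~PAGE_MASK


def walk_offsets(va):
    """Return the byte offset of the entry used at each paging level."""
    shifts = {"PML4E": 39, "PDPE": 30, "PDE": 21, "PTE": 12}
    return {
        name: ((va >> shift) & INDEX_MASK) * ENTRY_SIZE
        for name, shift in shifts.items()
    }


va = 0xFFFFF800B006CD08              # rsp from the debugger
pml4 = table_base(0xCBF10002)        # DirectoryTableBase with low bits cleared
offsets = walk_offsets(va)

for name, off in offsets.items():
    print(f"{name} offset within its table: {off:#x}")
print(f"PML4E address: {pml4 + offsets['PML4E']:#x}")
print(f"page offset:   {va & PAGE_MASK:#x}")
```

The offsets line up with the low 12 bits of each entry address in the !vtop output (f80, 010, c00, 360), and the page offset d08 matches the tail of the translated physical address.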

> Given how many EPT flushing bugs I've already found in this area, I wouldn't 
> be surprised if there are further ones lurking.  If it is an EPT flushing 
> bug, this delta should make it go away, but it will come with a hefty perf 
> hit.
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 283eb7b..019333d 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -4285,9 +4285,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
>              }
>          }
>
> -        if ( inv )
> -            __invept(inv == 1 ? INVEPT_SINGLE_CONTEXT : INVEPT_ALL_CONTEXT,
> -                     inv == 1 ? single->eptp          : 0);
> +        __invept(INVEPT_ALL_CONTEXT, 0);
>      }
>
>   out:

I can give this a try and see if it resolves the problem!

Thanks Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
