On Monday, May 6, 2019 7:07 PM, Andrew Cooper <andrew.coop...@citrix.com>
wrote:
> There is a lot in here.
I wanted to gather enough data before making a bug report on such a complicated
issue.
> As for your BSOD analysis, the first thing to be aware of is that Double
> Fault is not necessarily precise, which means you can't necessarily trust any
> of the registers. That said, most double faults are precise in practice, so
> if you're seeing it reliably at the same place, then it is likely to be a
> precise example.
I can reliably reproduce the Double Fault after ~10 tests on Windows 10 with
KPTI.
The stack trace always shows the beginning of KiSystemCall64ShadowCommon,
which is executed after the CR3 switch to the kernel page tables.
> Your faulting address isn't immediately after the pagetable switch. It
> is one instruction further on, after the stack switch, which means at the
> very minimum that reading the new rsp out of the per-processor storage
> succeeded.
>
> The stack switch, combined with `push $0x2b` faulting is a clear sign that
> the stack is bad. As the stack pointer looks plausible, it is almost
> certainly the pagewalk from %rsp which is bad. Judging by the Windbg guide,
> you want to use !pte to dump the pagewalk (but I have never used it in anger
> before).
I checked RSP, and it's mapped in the kernel page tables:
# print kernel and userland page directory physical address
kd> dt _EPROCESS ffffdf8815e15340 ImageFileName Pcb.Directorytablebase Pcb.Userdirectorytablebase
ntdll!_EPROCESS
+0x000 Pcb :
+0x028 DirectoryTableBase : 0xcbf10002
+0x278 UserDirectoryTableBase : 0xcbe00001
+0x450 ImageFileName : [15] "ctfmon.exe"
# print RSP
kd> r rsp
rsp=fffff800b006cd08
# translate RSP to physical address
kd> !vtop cbf10000 fffff800b006cd08
Amd64VtoP: Virt fffff800b006cd08, pagedir 00000000cbf10000
Amd64VtoP: PML4E 00000000cbf10f80
Amd64VtoP: PDPE 0000000003708010
Amd64VtoP: PDE 0000000003709c00
Amd64VtoP: PTE 000000000371d360
Amd64VtoP: Mapped phys 000000000546cd08
Virtual address fffff800b006cd08 translates to physical address 546cd08.
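
As a cross-check of the walk above, here is a minimal sketch that recomputes
the entry addresses from the virtual address alone. It assumes standard
4-level x86-64 paging with 4 KiB pages (which is what the PML4E/PDPE/PDE/PTE
output suggests). The helper names are made up for illustration, and the
intermediate table bases are simply the entry addresses printed by !vtop with
the low 12 bits cleared. It also shows why !vtop is given cbf10000 rather than
the raw DirectoryTableBase value 0xcbf10002: the low 12 bits are not part of
the physical address.

```python
INDEX_BITS = 9    # each paging level indexes 512 entries
ENTRY_SIZE = 8    # each page-table entry is 8 bytes


def table_base(value):
    """Clear the low 12 bits, which are not part of the table's address."""
    return value & ~0xFFF


def split_va(va):
    """Return (pml4, pdpt, pd, pt, offset) for a 4-level, 4 KiB-page walk."""
    mask = (1 << INDEX_BITS) - 1
    pml4 = (va >> 39) & mask
    pdpt = (va >> 30) & mask
    pd = (va >> 21) & mask
    pt = (va >> 12) & mask
    return pml4, pdpt, pd, pt, va & 0xFFF


def entry_addr(table, index):
    """Physical address of entry `index` in the table located at `table`."""
    return table_base(table) + index * ENTRY_SIZE


va = 0xFFFFF800B006CD08
pml4, pdpt, pd, pt, offset = split_va(va)

# DirectoryTableBase from the dt output, low bits still set.
pml4e = entry_addr(0xCBF10002, pml4)   # 0xcbf10f80, as in the PML4E line
# Table bases below are inferred from the !vtop output, not read from memory.
pdpe = entry_addr(0x03708000, pdpt)    # 0x3708010, as in the PDPE line
pde = entry_addr(0x03709000, pd)       # 0x3709c00, as in the PDE line
pte = entry_addr(0x0371D000, pt)       # 0x371d360, as in the PTE line
phys = 0x0546C000 + offset             # 0x546cd08, as in the "Mapped phys" line
```

The computed values agree with every line of the !vtop output, so the walk
through the kernel page tables is at least self-consistent for this RSP.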
> Given how many EPT flushing bugs I've already found in this area, I wouldn't
> be surprised if there are further ones lurking. If it is an EPT flushing
> bug, this delta should make it go away, but it will come with a hefty perf
> hit.
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 283eb7b..019333d 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -4285,9 +4285,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
> }
> }
>
> - if ( inv )
> - __invept(inv == 1 ? INVEPT_SINGLE_CONTEXT : INVEPT_ALL_CONTEXT,
> - inv == 1 ? single->eptp : 0);
> + __invept(INVEPT_ALL_CONTEXT, 0);
> }
>
> out:
I can give this a try and see if it resolves the problem!
Thanks, Andrew