On 11/09/18 14:10, Jan Beulich wrote:
> Emulation requiring device model assistance uses a form of instruction
> re-execution, assuming that the second (and any further) pass takes
> exactly the same path. This is a valid assumption as far as use of CPU
> registers goes (as those can't change without any other instruction
> executing in between), but is wrong for memory accesses. In particular
> it has been observed that Windows might page out buffers underneath
> an instruction currently under emulation (hitting between two passes).
> If the first pass translated a linear address successfully, any subsequent
> pass needs to do so too, yielding the exact same translation.
>
> Introduce a cache (used just by guest page table accesses for now, i.e.
> a form of "paging structure cache") to make sure above described
> assumption holds. This is a very simplistic implementation for now: Only
> exact matches are satisfied (no overlaps or partial reads or anything).
>
> There's also some seemingly unrelated cleanup here which was found
> desirable on the way.
>
> 1: x86/mm: add optional cache to GLA->GFN translation
> 2: x86/mm: use optional cache in guest_walk_tables()
> 3: x86/HVM: implement memory read caching
> 4: x86/HVM: prefill cache with PDPTEs when possible
>
> "VMX: correct PDPTE load checks" is omitted from v2, as I can't
> currently find enough time to carry out the requested further
> rework.

Following the x86 call, I've had some thoughts and suggestions about how
to make this work in a reasonable way, without resorting to the full
caching approach.

First and foremost, I'd like to recommend against trying to combine the
fix for repeated PDPTR reading with the fix for repeated PTE reading.
While they are both repeated-reading problems, one really is a knobbly
corner case of 32bit PAE paging, and the other is a general emulation
problem.  Fixing them independently makes the result rather simpler, and
far closer to how real CPUs work.

For naming, how about "access once" in place of cache?  This is the best
description of the purpose I can come up with.

Next, there should be a single hvmemul_read_once(gaddr, bytes, ...)
(name subject to improvement), which does a transparent read-through of
the "access once cache" in terms of a single flat guest physical address
space.  This allows individual callers to opt into the access-once
semantics, without saddling them all with the substantial boilerplate
that would otherwise be the only correct way to use the interface.
Furthermore, this behaviour has the same semantics as the correct
longer-term fix.
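
As a rough illustration only, something along these lines, satisfying
exact matches only (the structure names and the guest_mem/phys_read
stand-ins are placeholders, not real Xen interfaces):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define ACCESS_ONCE_MAX 8          /* small, fixed number of entries */

    struct access_once_entry {
        uint64_t gaddr;                /* guest physical address of the read */
        unsigned int bytes;            /* size of the original access */
        uint8_t data[8];               /* cached contents (<= 8 bytes here) */
    };

    struct access_once_cache {
        struct access_once_entry ent[ACCESS_ONCE_MAX];
        unsigned int nr;
    };

    /* Stand-in for the real guest physical read path: a flat buffer. */
    static uint8_t guest_mem[4096];

    static int phys_read(uint64_t gaddr, void *buf, unsigned int bytes)
    {
        if ( gaddr + bytes > sizeof(guest_mem) )
            return -1;
        memcpy(buf, &guest_mem[gaddr], bytes);
        return 0;
    }

    /*
     * Read 'bytes' at guest physical address 'gaddr'.  The first pass
     * fetches from memory and records the value; a re-execution repeating
     * an identical read is satisfied from the cache, so every pass
     * observes the same value.
     */
    static int hvmemul_read_once(struct access_once_cache *cache,
                                 uint64_t gaddr, void *buf,
                                 unsigned int bytes)
    {
        unsigned int i;
        int rc;

        for ( i = 0; i < cache->nr; i++ )
        {
            const struct access_once_entry *e = &cache->ent[i];

            /* Exact matches only: no overlapping or partial reads. */
            if ( e->gaddr == gaddr && e->bytes == bytes )
            {
                memcpy(buf, e->data, bytes);
                return 0;
            }
        }

        rc = phys_read(gaddr, buf, bytes);
        if ( rc )
            return rc;

        if ( bytes <= sizeof(cache->ent[0].data) &&
             cache->nr < ACCESS_ONCE_MAX )
        {
            struct access_once_entry *e = &cache->ent[cache->nr++];

            e->gaddr = gaddr;
            e->bytes = bytes;
            memcpy(e->data, buf, bytes);
        }

        return 0;
    }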

That alone should fix the Windows issue, because there is no chance that
Windows will ever page out the PDPTRs.

For the PDPTRs, this corner case is special, and should be handled by
the pagewalk code.  I'm still going to go with my previous suggestion of
having top_map point at an array on the caller's stack.  For the VT-x
case, the values can be pulled straight out of the VMCS, while for AMD,
the values can be read through the "access once cache", which matches
the behaviour of being read from memory, but ensures they won't be
re-read on a later pass.
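
Building on the sketch above, and again purely as an illustration (the
helper names are placeholders, and vmx_read_cached_pdpte() is a stub
standing in for however the VMCS copy of the PDPTEs ends up being
exposed), the caller could assemble the PDPTEs on its own stack like
this:

    /* Stub: stands in for reading the guest PDPTE fields out of the VMCS. */
    static uint64_t vmx_read_cached_pdpte(unsigned int i)
    {
        (void)i;
        return 0;
    }

    /*
     * Fill a caller-provided pdpte[4] array, which top_map would then
     * point at.  VT-x pulls the architecturally-cached values from the
     * VMCS; AMD reads them from memory, but through the access-once path
     * so that repeated emulation passes see identical values.
     */
    static int fill_pdptes(struct access_once_cache *cache, bool is_vmx,
                           uint64_t cr3_base, uint64_t pdpte[4])
    {
        unsigned int i;

        for ( i = 0; i < 4; i++ )
        {
            if ( is_vmx )
                pdpte[i] = vmx_read_cached_pdpte(i);
            else
            {
                int rc = hvmemul_read_once(cache, cr3_base + i * 8,
                                           &pdpte[i], sizeof(pdpte[i]));

                if ( rc )
                    return rc;
            }
        }

        return 0;
    }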

Overall, I think this should be fairly architecturally clean, solve the
underlying bug, and move things in the general direction of the longer
term goal, even if it doesn't get all the way there in one step.

~Andrew
