Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs

Andrew Cooper Mon, 06 May 2019 11:32:17 -0700

On 06/05/2019 18:41, Tamas K Lengyel wrote:
> Hi Andrew,
> thanks for helping to brainstorm on this.
>
>> How exactly does DRAKVUF go about injecting silent breakpoints?  It 
>> obviously has to allocate a new gfn from somewhere to begin with.  Do the 
>> bifurcated frames end up in two different altp2ms, or one in the host p2m 
>> and one in an alternative?  Does #VE ever get used?
> I posted a blog entry about it a while ago, and it's still accurate:
> https://xenproject.org/2016/04/13/stealthy-monitoring-with-xen-altp2m.


Talking of which, have we fixed the emulation of `sti`?  I don't recall any
changes, but given our aim to get the emulator complete, we should fix it.

> You can't add new frames to only some of the altp2ms - at least not
> with the current interfaces. All the shadow pages are added to the
> hostp2m and then in the altp2m the GFN is remapped to the mfn of the
> shadow page with execute-only permissions.

Ah - of course.  gfns only make sense in the context of the hostp2m.

> This way the breakpoint
> can be written into the shadow-page and any attempt to read it can be
> safely handled on a per-vCPU basis by switching it back to the hostp2m
> for the duration of a singlestep (with MTF). Setting up the shadow
> pages is only safe to do during the initial setup while the altp2m
> view is not used and the guest is paused. Once altp2m views are being
> used, adding new pages to the hostp2m results in losing all altp2m
> settings. For the most part this limitation is not an issue because
> all supported use-cases add the breakpoints once during the initial
> setup and there are no breakpoints added later during runtime.
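
For readers following the thread, here is a toy Python model of the scheme described above. It is only a sketch and is not DRAKVUF or Xen code: the class and method names, the gfn/mfn values, and the simplification that an altp2m view falls back to the hostp2m for any gfn it does not override are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

HOSTP2M = 0  # view id 0 stands for the hostp2m in this toy model


@dataclass
class View:
    """One p2m view: gfn -> (mfn, permission string)."""
    entries: dict = field(default_factory=dict)


class Domain:
    def __init__(self, host_map):
        # In the hostp2m every gfn is readable, writable and executable.
        host = View({gfn: (mfn, "rwx") for gfn, mfn in host_map.items()})
        self.views = {HOSTP2M: host}
        self.memory = {}       # mfn -> page contents (a string here)
        self.active = HOSTP2M  # view used by the single modelled vCPU

    def add_shadow_page(self, gfn, shadow_gfn, shadow_mfn, view_id, patched):
        """Setup-time step: add the shadow frame to the hostp2m, then remap
        `gfn` in the altp2m view to the shadow mfn as execute-only."""
        self.views[HOSTP2M].entries[shadow_gfn] = (shadow_mfn, "rwx")
        self.memory[shadow_mfn] = patched
        alt = self.views.setdefault(view_id, View())
        alt.entries[gfn] = (shadow_mfn, "x")

    def lookup(self, gfn, view_id):
        # Assumption: gfns not overridden in a view resolve via the hostp2m.
        entry = self.views[view_id].entries.get(gfn)
        if entry is None:
            entry = self.views[HOSTP2M].entries[gfn]
        return entry

    def fetch(self, gfn):
        """Instruction fetch through the active view."""
        mfn, perms = self.lookup(gfn, self.active)
        assert "x" in perms
        return self.memory[mfn]

    def read(self, gfn):
        """Data read. A read of an execute-only page stands in for the EPT
        violation: the vCPU is switched to the hostp2m for one (notionally
        single-stepped) access and then switched back."""
        mfn, perms = self.lookup(gfn, self.active)
        if "r" in perms:
            return self.memory[mfn]
        saved, self.active = self.active, HOSTP2M
        try:
            host_mfn, _ = self.lookup(gfn, HOSTP2M)
            return self.memory[host_mfn]
        finally:
            self.active = saved
```

With a breakpoint written into the shadow page and the vCPU running on the altp2m view, `fetch()` sees the patched contents while `read()` returns the original page, which is the property that keeps the breakpoint hidden from the guest.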

What do the host p2m permissions get set to?  How do you cope with
future reuse of the gfn for a different purpose later?

>
> We've noticed that trapping MOV-TO-CR3 with the latest version of
> Windows 10 has a lot of issues in terms of overhead when KPTI is used,
> so as a band-aid solution it can be disabled to improve performance
> (which Mathieu already did).

Meltdown isn't subtle with its perf problems...  What purpose are you
trapping %cr3 writes for?  Simply auditing the pagetables in use?  If
so, VT-x has (since forever, iirc) had the CR3 target list (of 4
entries) which Xen can use to whitelist "safe" %cr3 values, which bypass
the VMExit.  If all you care about is that the vcpu stays on known-good
pagetables, this interface could be plumbed up to include the kernel and
user pagetables, which will avoid all the vmexits from syscalls due to
Meltdown.
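
To make the CR3 target list idea concrete, here is a small Python sketch of the exit decision as I understand it from the SDM: with CR3-load exiting enabled, a MOV to %cr3 skips the VMExit when the new value equals one of the configured target values. The function name and the pagetable addresses are invented for the example, and this models the decision only, not how Xen would program the VMCS.

```python
MAX_CR3_TARGETS = 4  # the "4 entries" mentioned above


def mov_to_cr3_exits(new_cr3, cr3_load_exiting, cr3_targets):
    """Return True if a guest MOV to %cr3 would cause a VMExit.

    cr3_targets is the list of whitelisted %cr3 values (hypothetical
    representation of the VMCS CR3-target value fields).
    """
    if len(cr3_targets) > MAX_CR3_TARGETS:
        raise ValueError("at most 4 CR3 target values are modelled")
    if not cr3_load_exiting:
        return False  # no interception requested at all
    # A value matching any whitelisted target bypasses the exit.
    return new_cr3 not in cr3_targets


# Hypothetical KPTI guest: one kernel and one user pagetable root.
KERNEL_CR3 = 0x1AA000
USER_CR3 = 0x1AB000
targets = [KERNEL_CR3, USER_CR3]
```

With both roots whitelisted, the kernel/user switch on every syscall no longer exits, while a switch to any other pagetable still does.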

Alternatively, in some copious free time, once I've got the CPUID/MSR
interface in a better state, we could fake up MSR_ARCH_CAPS.RDCL_NO so
the guest doesn't turn on its Meltdown mitigations in the first place.
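
As a rough illustration of what "faking up" the bit means, the sketch below sets RDCL_NO in the value a guest would read from the MSR. The MSR index (0x10A) and bit position (bit 0) are the architectural ones; the function name, the flag, and the host value are made up, and no such interface exists in Xen as of this mail. The guest would also need the ARCH_CAPS CPUID bit advertised before it reads the MSR at all.

```python
MSR_ARCH_CAPABILITIES = 0x10A  # IA32_ARCH_CAPABILITIES
ARCH_CAPS_RDCL_NO = 1 << 0     # "not susceptible to Meltdown"


def guest_arch_caps(host_value, fake_rdcl_no):
    """Compute the value a guest RDMSR of 0x10A would see.

    fake_rdcl_no is a hypothetical per-domain policy knob.
    """
    if fake_rdcl_no:
        return host_value | ARCH_CAPS_RDCL_NO
    return host_value


def guest_enables_kpti(arch_caps):
    # Simplified guest logic: mitigate unless the CPU claims RDCL_NO.
    return not (arch_caps & ARCH_CAPS_RDCL_NO)
```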

>> Given how many EPT flushing bugs I've already found in this area, I wouldn't 
>> be surprised if there are further ones lurking.  If it is an EPT flushing 
>> bug, this delta should make it go away, but it will come with a hefty perf 
>> hit.
> My understanding is that the VPID implementation in Xen is such that
> effectively all VMEXITs will trigger assignment of a new VPID to the
> vCPU - which is likely a performance issue in itself - so flushing the
> EPT is likely not going to make a difference. But it's worth a shot,
> maybe it does :)

Sadly, things are far more complicated than that.  For one, Intel still
owe me a comment/correction to that section of the SDM on INVLPG
emulation for guests.

Xen's use of ASIDs as a common concept started from the AMD side.  AMD
strictly only cache linear => host physical mappings, so after any
change to the p2m, an ASID tick is guaranteed to give you a fully clean
TLB for future pagewalks to populate.

The same is not true for Intel.  VPID and EPT were introduced together,
and have several kinds of mappings which are cached.  The processor may
cache:
1) linear => gpa mappings (tagged with current VPID and PCID values, and
contain no information from EPT)
2) gpa => hpa mappings (tagged with the current EPTP, may contain other
data such as the SPP vector, doesn't contain any data from the guest
pagetables)
3) combined mappings which are linear => hpa mappings.

In particular, ticking the VPID after an EPT modification *does not*
invalidate the gpa=>hpa mappings, so the guest can continue to execute
using stale mappings.  This is why we've got the logic in
vmx_vmenter_helper() to calculate if an INVEPT instruction is necessary.
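
The effect can be shown with a toy model of the three kinds of cached mappings. This is a deliberately simplified sketch, not a description of real hardware or of Xen's code: the class is invented, PCID is left out of the tags, and the model caches every translation it performs, whereas a real processor merely may.

```python
class ToyTLB:
    """Toy model of the Intel translation caches described above."""

    def __init__(self):
        self.linear = {}      # (vpid, linear) -> gpa
        self.guest_phys = {}  # (eptp, gpa) -> hpa
        self.combined = {}    # (vpid, eptp, linear) -> hpa

    def translate(self, vpid, eptp, linear, guest_pt, ept):
        """Translate linear -> hpa, preferring cached mappings over the
        in-memory guest pagetables (guest_pt) and EPT (ept)."""
        key = (vpid, eptp, linear)
        if key in self.combined:
            return self.combined[key]
        gpa = self.linear.get((vpid, linear))
        if gpa is None:
            gpa = guest_pt[linear]             # guest pagewalk
            self.linear[(vpid, linear)] = gpa
        hpa = self.guest_phys.get((eptp, gpa))
        if hpa is None:
            hpa = ept[gpa]                     # EPT walk
            self.guest_phys[(eptp, gpa)] = hpa
        self.combined[key] = hpa
        return hpa

    def invept(self, eptp):
        """Drop gpa => hpa and combined mappings tagged with this EPTP."""
        self.guest_phys = {k: v for k, v in self.guest_phys.items()
                           if k[0] != eptp}
        self.combined = {k: v for k, v in self.combined.items()
                         if k[1] != eptp}


def demo():
    tlb, eptp = ToyTLB(), 0xE000
    guest_pt = {0x4000: 0x1000}   # linear -> gpa
    ept = {0x1000: 0xA000}        # gpa -> hpa
    first = tlb.translate(1, eptp, 0x4000, guest_pt, ept)
    ept[0x1000] = 0xB000          # EPT modification
    # "Tick" the VPID: the vCPU now runs with a fresh tag (2 instead of 1).
    after_tick = tlb.translate(2, eptp, 0x4000, guest_pt, ept)
    tlb.invept(eptp)
    after_invept = tlb.translate(2, eptp, 0x4000, guest_pt, ept)
    return first, after_tick, after_invept
```

In `demo()`, the translation after the VPID tick still returns the old frame, because the gpa => hpa entry is tagged only by EPTP; only after `invept()` does the new frame show up.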

Hence my suggestion for identifying whether it is a real TLB flushing
issue, or a logical error elsewhere. :)

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
