Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs

Andrew Cooper Tue, 28 May 2019 17:51:05 -0700

On 28/05/2019 13:33, Mathieu Tarral wrote:
> Hi Andrew,
>
>>> The bug is still here, so we can exclude a microcode issue.
>> Good - that is one further angle excluded.  Always make sure you are
>> running with up-to-date microcode, but it looks like we back to
>> investigating a logical bug in libvmi or Xen.
>
> I reimplemented a small test, without the Drakvuf/Libvmi layers, that will 
> inject traps on one API in Windows (NtCreateUserProcess),
> in the same way that Drakvuf does.
>
> I did some quick testing yesterday, with a Python script that was repeatedly
> starting the binary to monitor the API, and at the same time starting Ansible
> to run "c:\Windows\system32\reg.exe /?" via WinRM, to trigger some process 
> creation.
>
> The traps are working, I see the software breakpoint hit, switching to the 
> default
> view for singlestepping, and switching back to the execution view, so that's 
> already good.
>
> After a series of tests on 1 or 4 VCPUs, my domain end up in 2 possible 
> states:
> - frozen: the mouse doesn't move: so I would guess the VCPU are blocked.
>
> I'm calling the xc_(un)pause_domain APIs multiple times when I write to the 
> shadow copies,
> but It's always synchronous, so I doubt that they interfered and "paused" the 
> domain.


xc_{,un}pause_domain() calls are reference counted.  Calling unpause too
many times should be safe (from a refcount point of view), and should
fail the hypercall with -EINVAL.

>
> Also, the log output I have before I detect that Ansible failed to execute is 
> that the resume succeded and
> Xen is ready to process VMI events.
>
> - BSOD: that's the second possibility, apparently I'm corrupting critical 
> data structure in the operating system,
> and the Windbg analysis is inconclusive, so I can't tell much.
>
> Either way, I can't execute this test sequentially 10 000 times without a 
> crash.

Ok good - this is a far easier place to start debugging from.

> -> Could you look at the implementation, and tell me if I misused the APIs 
> somewhere ?
> https://gist.github.com/mtarral/d99ce5524cfcfb5290eaa05702c3e8e7

Some observations.

1)  In xen_pause_vm(), you do an xc_domain_getinfo().  First of all, the
API is crazy, so you also need to check "|| info.domid != domid" in your
error condition, but that shouldn't be an issue here as the domid isn't
changing. 

Furthermore, the results of xc_domain_getinfo() are stale before the
hypercall returns, so it is far less useful than it appears.

I'm afraid that the only safe way to issue pause/unpauses is to know
that you've reference counted your own correctly.  All entities in dom0
with privilege can fight over each others references, because there is
nothing Xen can use to distinguish the requests.

2) You allocate a libxl context but do nothing with it.  That can all
go, along with the linkage against libxl.  Also, you don't need to
create a logger like that.  Despite being utterly unacceptable behaviour
for a library, it is the default by passing NULL in xc_interface_open().

3) A malloc()/memset() pair is more commonly spelt calloc()


And some questions.

1) I'm guessing the drakvuf_inject_trap(drakvuf, 0x293e6a0, 0) call is
specific to the exact windows kernel in use?

2) In vmi_init(), what is the purpose of fmask and zero_page_gfn?  You
add one extra gfn to the guest, called zero_page, and fill it with 1's
from fmask.

3) You create two altp2m's, but both have the same default access.  Is
this deliberate, or a bug?  If deliberate, why?

Finally, and probably the source of the memory corruption...

4) When injecting a trap, you allocate a new gfn, memcpy() the contents
and insert a 0xcc (so far so good).  You then remap the executable view
to point at the new gfn with a breakpoint in (fine), and remap the
readable view to point at the zero_page, which is full of 1's (uh-oh).

What is this final step trying to achieve?  It guarantees that
patch-guard will eventually notice and BSOD your VM for critical
structure corruption.  The read-only view needs to point to the original
gfn with only read permissions, so when Windows reads the gfn back, it
sees what it expects.  You also need to prohibit writes to either gfn so
you can spot writes (unlikely in this case but important for general
introspection) so you can propagate the change to both copies.

>
> I used the compat APIs, like Drakvuf does.
>
> @Tamas, if you could check the traps implementation.
>
> You also have stress-test.py, which is the small test suite that I used, and
> the screenshot showing the stdout preceding a test failure,
> when Ansible couldn't contact WinRM service because the domain was frozen.
>
> Note: I stole some code from libvmi, to handle page read/write in Xen.
>
> PS: in the case where the domain is frozen, and I destroy the domain, a 
> (null) entry will remain
> in xl list, despite that my stress-test.py process is already dead.
>
> I have 4 of these entries in my xl list right now.

That's almost certainly a reference not being dropped on a page.  Can
you run `xl debug-keys q` and paste the resulting analysis which will be
visible in `xl dmesg`?

It is probably some missing cleanup in the altp2m code.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs

Reply via email to