Hi Andrew, Tamas,
On Wednesday, May 29, 2019 2:49 AM, Andrew Cooper <andrew.coop...@citrix.com>
wrote:
>> After a series of tests on 1 or 4 VCPUs, my domain ends up in 2 possible
>> states:
>> - frozen: the mouse doesn't move, so I would guess the VCPUs are blocked.
>> I'm calling the xc_(un)pause_domain APIs multiple times when I write to the
>> shadow copies, but it's always synchronous, so I doubt that they interfered
>> and "paused" the domain.
>
> xc_{,un}pause_domain() calls are reference counted. Calling unpause too many
> times should be safe (from a refcount point of view), and should fail the
> hypercall with -EINVAL.
>
>> Also, the log output I have before I detect that Ansible failed to execute
>> shows that the resume succeeded and Xen is ready to process VMI events.
>>
>> - BSOD: that's the second possibility; apparently I'm corrupting a critical
>> data structure in the operating system, and the Windbg analysis is
>> inconclusive, so I can't tell much.
>>
>> Either way, I can't execute this test sequentially 10 000 times without a
>> crash.
>
> Ok good - this is a far easier place to start debugging from.
>
>> -> Could you look at the implementation, and tell me if I misused the APIs
>> somewhere ?
>> https://gist.github.com/mtarral/d99ce5524cfcfb5290eaa05702c3e8e7
>
> Some observations.
>
> 1) In xen_pause_vm(), you do an xc_domain_getinfo(). First of all, the API
> is crazy, so you also need to check "|| info.domid != domid" in your error
> condition, but that shouldn't be an issue here as the domid isn't changing.
"The API is crazy"
Could you elaborate on that?
> Furthermore, the results of xc_domain_getinfo() are stale before the
> hypercall returns, so it is far less useful than it appears.
I looked at libvmi's implementation:
https://github.com/libvmi/libvmi/blob/master/libvmi/driver/xen/xen.c#L2696
I guess I should have implemented the checks too.
They just didn't make sense to me, as I was sure that my calls were
synchronous, one after the other.
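So the check would look roughly like this (my reading of what Andrew
describes and what libvmi does; xch and domid already in scope):

    xc_dominfo_t info;
    /* xc_domain_getinfo() returns the number of domains it found starting
     * at the requested domid, hence both checks */
    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 || info.domid != domid )
    {
        fprintf(stderr, "domain %u not found\n", domid);
        return -1;
    }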
> I'm afraid that the only safe way to issue pause/unpauses is to know that
> you've reference counted your own correctly. All entities in dom0 with
> privilege can fight over each others references, because there is nothing Xen
> can use to distinguish the requests.
-> I removed my calls to xc_pause/resume.
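For reference, if I re-add them later, I'll keep each pause matched with
exactly one unpause and check the return values, roughly:

    if ( xc_domain_pause(xch, domid) )
        return -1;
    /* ... write to the shadow copies ... */
    if ( xc_domain_unpause(xch, domid) )
        return -1;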
> 2) You allocate a libxl context but do nothing with it. That can all go,
> along with the linkage against libxl. Also, you don't need to create a
> logger like that. Despite being utterly unacceptable behaviour for a
> library, it is the default by passing NULL in xc_interface_open().
I followed drakvuf's xen init function:
https://github.com/tklengyel/drakvuf/blob/master/src/xen_helper/xen_helper.c#L140
I thought I was going to need it at some point; same for the xl_logger
initialization.
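So the init can shrink to something like this (sketch):

    /* no libxl needed; passing NULL here makes libxc use its default
     * stderr logger */
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    if ( !xch )
        return -1;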
> 3) A malloc()/memset() pair is more commonly spelt calloc()
True.
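i.e. something like (generic buf/size, not the exact names from the gist):

    /* before */
    buf = malloc(size);
    memset(buf, 0, size);

    /* after */
    buf = calloc(1, size);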
> And some questions.
>
> 1) I'm guessing the drakvuf_inject_trap(drakvuf, 0x293e6a0, 0) call is
> specific to the exact windows kernel in use?
Yes, I used a libvmi Python script to translate the symbol -> virtual address
-> physical address.
Then I replaced that value in my code and recompiled the binary before the test.
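Roughly, this is what the script does, expressed with libvmi's C API (the
symbol name below is just a placeholder, and vmi is an already-initialized
vmi_instance_t):

    addr_t va = 0, pa = 0;
    /* kernel symbol -> guest virtual address -> guest physical address */
    if ( VMI_SUCCESS == vmi_translate_ksym2v(vmi, "SomeKernelSymbol", &va) &&
         VMI_SUCCESS == vmi_translate_kv2p(vmi, va, &pa) )
        printf("trap target: va=0x%" PRIx64 " pa=0x%" PRIx64 "\n", va, pa);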
> 2) In vmi_init(), what is the purpose of fmask and zero_page_gfn? You add
> one extra gfn to the guest, called zero_page, and fill it with 1's from fmask.
>
> 3) You create two altp2m's, but both have the same default access. Is this
> deliberate, or a bug? If deliberate, why?
>
> Finally, and probably the source of the memory corruption...
>
> 4) When injecting a trap, you allocate a new gfn, memcpy() the contents and
> insert a 0xcc (so far so good). You then remap the executable view to point
> at the new gfn with a breakpoint in (fine), and remap the readable view to
> point at the zero_page, which is full of 1's (uh-oh).
>
> What is this final step trying to achieve? It guarantees that patch-guard
> will eventually notice and BSOD your VM for critical structure corruption.
> The read-only view needs to point to the original gfn with only read
> permissions, so when Windows reads the gfn back, it sees what it expects.
> You also need to prohibit writes to either gfn so you can spot writes
> (unlikely in this case but important for general introspection) so you can
> propagate the change to both copies.
Yes, I missed that PatchGuard would eventually check those shadowed pages anyway.
I was so happy to see my breakpoints working that I proceeded to the tests,
hoping for a quick reproduction of the bug.
I have now implemented a basic mem_access event handler: the original GFN being
remapped is restricted to execute-only (--X), and on an access event I switch to
the hostp2m and single-step to get past PatchGuard (rough sketch below).
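Rough sketch of what I have now (libxc names; xch/domid/orig_gfn/max_gpfn
assumed in scope, exec_view created earlier with xc_altp2m_create_view(),
error handling dropped):

    xen_pfn_t shadow_gfn = max_gpfn + 1;  /* first free gfn past the guest's max */
    xc_domain_populate_physmap_exact(xch, domid, 1, 0, 0, &shadow_gfn);
    /* copy the original page into shadow_gfn and write 0xCC at the trap
     * offset (done with xc_map_foreign_range(), not shown) */
    /* in the exec view, orig_gfn now shows the breakpointed copy */
    xc_altp2m_change_gfn(xch, domid, exec_view, orig_gfn, shadow_gfn);
    /* restrict it to execute-only so any read/write raises a mem_access
     * event; the handler switches to the hostp2m and single-steps */
    xc_altp2m_set_mem_access(xch, domid, exec_view, orig_gfn, XENMEM_access_x);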
It works, but Xen eventually fails: after ~90 tests, it cannot populate the
ring pfn anymore:
INFO:root:==== test 92 ====
INFO:root:starting drakvuf
INFO:root:starting Ansible
INIT
xen_init_interface
xc_interface_open
create logger
allocating libxc context
init ring page
xc: error: Failed to populate ring pfn (16 = Device or resource busy): Internal error
fail to enable monitoring: Device or resource busy
fail to init xen interface
CLOSE
Fail to init vmi
(I updated the Gist:
https://gist.github.com/mtarral/d99ce5524cfcfb5290eaa05702c3e8e7#file-xen-drakvuf-c)
What do you think happened?
I have a call to xc_domain_setmaxmem with ~0, so this shouldn't happen?
https://gist.github.com/mtarral/d99ce5524cfcfb5290eaa05702c3e8e7#file-xen-drakvuf-c-L598
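For context, my init does roughly this around the failure point (sketch,
libxc names, error handling dropped):

    /* raise maxmem so extra pages (ring, shadow copies) can be populated */
    xc_domain_setmaxmem(xch, domid, ~0);
    /* map the monitor ring page and get the event channel port */
    uint32_t remote_port = 0;
    void *ring_page = xc_monitor_enable(xch, domid, &remote_port);
    if ( !ring_page )
        return -1;   /* this is the call that fails with EBUSY after ~90 runs */
    /* ... and on teardown the ring must be released again ... */
    xc_monitor_disable(xch, domid);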
>> I used the compat APIs, like Drakvuf does.
>>
>> @Tamas, if you could check the traps implementation.
>>
>> You also have stress-test.py, which is the small test suite that I used, and
>> the screenshot showing the stdout preceding a test failure,
>> when Ansible couldn't contact the WinRM service because the domain was frozen.
>>
>> Note: I stole some code from libvmi, to handle page read/write in Xen.
>>
>> PS: in the case where the domain is frozen, and I destroy the domain, a
>> (null) entry will remain
>> in xl list, even though my stress-test.py process is already dead.
>>
>> I have 4 of these entries in my xl list right now.
>
> That's almost certainly a reference not being dropped on a page. Can you run
> `xl debug-keys q` and paste the resulting analysis which will be visible in
> `xl dmesg`?
>
> It is probably some missing cleanup in the altp2m code.
I just checked, and I had a few "xen-drakvuf" processes still running in the
background.
When the main Python process raised an exception and terminated, they were
reparented to systemd.
Killing them removed the (null) domain entries.
Thanks,
Mathieu
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel