Re: [Xen-devel] [PATCH] x86/HVM: fix hvmemul_rep_outs_set_context()

2017-11-23 Thread Razvan Cojocaru
On 11/23/2017 05:09 PM, Jan Beulich wrote:
> There were two issues with this function: Its use of
> hvmemul_do_pio_buffer() was wrong (the function deals only with
> individual port accesses, not repeated ones, i.e. passing it
> "*reps * bytes_per_rep" does not have the intended effect). And it
> could have processed a larger set of operations in one go than was
> probably intended (limited just by the size that xmalloc() can hand
> back).
> 
> By converting to proper use of hvmemul_do_pio_buffer(), no intermediate
> buffer is needed at all. As a result a preemption check is being added.
> 
> Also drop unused parameters from the function.
> 
> Signed-off-by: Jan Beulich 
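For illustration, a minimal sketch of the per-rep pattern the commit message describes (hypothetical shape, not the actual patch; "p" stands for the context-supplied data and "port"/"bytes_per_rep"/"reps" come from the surrounding emulation context):

    /*
     * Illustrative only: hvmemul_do_pio_buffer() performs a single port
     * access, so repeated OUTS must loop over the reps, with a preemption
     * check so a huge rep count can't monopolise the CPU.
     */
    unsigned long i;

    for ( i = 0; i < *reps; i++ )
    {
        int rc = hvmemul_do_pio_buffer(port, bytes_per_rep, IOREQ_WRITE,
                                       p + i * bytes_per_rep);

        if ( rc != X86EMUL_OKAY )
            return rc;

        if ( i + 1 < *reps && hypercall_preempt_check() )
        {
            *reps = i + 1;   /* report how many reps actually completed */
            break;
        }
    }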

Thank you for the patch!

FWIW, Reviewed-by: Razvan Cojocaru 


Thanks,
Razvan


Re: [Xen-devel] [PATCH v2] x86/hvm: Add MSR old value

2017-12-04 Thread Razvan Cojocaru
> On Fri, Oct 13, 2017 at 03:50:57PM +0300, Alexandru Isaila wrote:
>> This patch adds the old value param and the onchangeonly option
>> to the VM_EVENT_REASON_MOV_TO_MSR event.
>>
>> The param was added to the vm_event_mov_to_msr struct and to the
>> hvm_monitor_msr function. Finally I've changed the bool_t param
>> to a bool for the hvm_msr_write_intercept function.
>>
>> Signed-off-by: Alexandru Isaila 
>> Acked-by: Tamas K Lengyel 
>>
>> ---
>> Changes since V1:
>>  - Removed stray blanks inside the inner parentheses
>>  - Added space after the if statement
>>  - Added * 8 to the set/clear/test_bit statements
>>  - Removed the blank line after monitored_msr.
>> ---
>>  tools/libxc/include/xenctrl.h |  2 +-
>>  tools/libxc/xc_monitor.c  |  3 ++-
> 
> Acked-by: Wei Liu 

Ping - AFAICT this patch has all the required acks?


Thanks,
Razvan


Re: [Xen-devel] [PATCH v3 22/25] x86/HVM: do actual CMPXCHG in hvmemul_cmpxchg()

2017-12-07 Thread Razvan Cojocaru
On 12/07/2017 04:16 PM, Jan Beulich wrote:
> ..., at least as far as currently possible, i.e. when a mapping can be
> obtained.
> 
> Signed-off-by: Jan Beulich 

Thank you for the patch!


Re: [Xen-devel] [PATCH v8] x86/altp2m: support for setting restrictions for an array of pages

2017-12-08 Thread Razvan Cojocaru
On 12/08/2017 02:18 PM, Jan Beulich wrote:
 On 24.10.17 at 12:19,  wrote:
>> HVMOP_altp2m_set_mem_access_multi has been added as an HVMOP (as opposed to a
>> DOMCTL) for consistency with its HVMOP_altp2m_set_mem_access counterpart (and
>> hence with the original altp2m design, where domains are allowed - with the
>> proper altp2m access rights - to alter these settings), in the absence of an
>> official position on the issue from the original altp2m designers.
> 
> I continue to disagree with this reasoning. I'm afraid I'm not really
> willing to allow widening the badness, unless altp2m was formally
> documented security-unsupported.

Going the DOMCTL route here would have been the (much easier) solution,
and in fact, as stated before, there has been an attempt to do so -
however, IIRC Andrew has insisted that we should take care to use
consistent access privilege across altp2m operations.

This was followed by a lengthy xen-devel discussion and several
unsuccessful attempts to obtain an official position from the original
contributors, at which point (after several months), as also discussed
at the Xen Developer Summit in Budapest, we decided to press on in the
direction that had seemed the most compatible with the original altp2m
design. (Please correct me if I'm misremembering or misunderstanding
something.)

So at this point it looks like we're stuck again: we're happy to go in
any direction the maintainers decide is the best, but we do need to
decide on one.

FWIW, Tamas (CC added) has added code to restrict where altp2m calls can
come from (although that's not XSM code).

Please let us know how to proceed.

>> +int xc_altp2m_set_mem_access_multi(xc_interface *xch, uint32_t domid,
>> +   uint16_t view_id, uint8_t *access,
>> +   uint64_t *pages, uint32_t nr)
>> +{
>> +int rc;
>> +
>> +DECLARE_HYPERCALL_BUFFER(xen_hvm_altp2m_op_t, arg);
>> +DECLARE_HYPERCALL_BOUNCE(access, nr, XC_HYPERCALL_BUFFER_BOUNCE_IN);
>> +DECLARE_HYPERCALL_BOUNCE(pages, nr * sizeof(uint64_t),
>> + XC_HYPERCALL_BUFFER_BOUNCE_IN);
> 
> This is confusing: Why "* sizeof()" in the latter expression, but not
> in the former. It would anyway be better to use sizeof(*pages) (and
> then also sizeof(*access)) to clearly make the connection.

I've opted for less clutter here, since of course sizeof(uint8_t) == 1,
so nr == nr * sizeof(uint8_t). But we're happy to make the change; it
will indeed make the code clearer.
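Concretely, the clearer form Jan suggests would be roughly:

    DECLARE_HYPERCALL_BOUNCE(access, nr * sizeof(*access),
                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
    DECLARE_HYPERCALL_BOUNCE(pages, nr * sizeof(*pages),
                             XC_HYPERCALL_BUFFER_BOUNCE_IN);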

>> @@ -4568,6 +4571,37 @@ static int do_altp2m_op(
>>  a.u.set_mem_access.view);
>>  break;
>>  
>> +case HVMOP_altp2m_set_mem_access_multi:
>> +if ( a.u.set_mem_access_multi.pad ||
>> + a.u.set_mem_access_multi.opaque >= a.u.set_mem_access_multi.nr )
>> +{
>> +rc = -EINVAL;
>> +break;
>> +}
> 
> Just like in a number of other recent cases: This operation should not
> fail when .nr is zero. Yet as per above it will.

We'll test that .nr != 0 before the >= comparison.
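One way to express that (a sketch of the adjusted check; the committed wording may differ):

    if ( a.u.set_mem_access_multi.pad ||
         (a.u.set_mem_access_multi.nr &&
          a.u.set_mem_access_multi.opaque >= a.u.set_mem_access_multi.nr) )
    {
        rc = -EINVAL;
        break;
    }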


Thanks,
Razvan


Re: [Xen-devel] [PATCH v8] x86/altp2m: support for setting restrictions for an array of pages

2017-12-11 Thread Razvan Cojocaru
On 12/11/2017 03:36 PM, Jan Beulich wrote:
 On 11.12.17 at 13:50,  wrote:
>> On 12/11/2017 12:12 PM, Jan Beulich wrote:
>> On 11.12.17 at 12:06,  wrote:
 My suggestion was that we don't break usecases.  The Intel usecase
 specifically is for an in-guest entity to have full control of all
 altp2m functionality, and this is fine (security wise) when permitted to
 do so by the toolstack.
>>>
>>> IOW you mean that such guests would be considered "trusted", i.e.
>>> whatever bad they can do is by definition not a security concern.
>>
>> I'm not sure what you mean by "trusted".  If implemented correctly,
>> altp2m and mem_access shouldn't give the guest any more permissions than
>> it has already.  The main risk would be if there were bugs in the
>> functionality that allowed security issues.
> 
> Hmm, maybe I'm mis-reading the code, but
> mem_access.c:set_mem_access() looks to be using the requested
> access rights verbatim, i.e. without applying tool stack imposed
> restrictions (hypervisor ones look to be honored by deriving
> base permissions from the p2m type first).

Quite likely I'm not grasping the full meaning of your objection,
however the added code is merely another interface to already existing
core code - so while admittedly there's room for improvement for the EPT
code below it, this patch really only extends the scope of altp2m's
existing version of set_mem_access() (which currently works on a single
page). In that, it at least doesn't seem to make things worse (it's
really just an optimization: whatever badness this code can cause with
a single call can already be achieved with a sequence of
xc_altp2m_set_mem_access() calls, as sketched below).
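To make the equivalence concrete, a hedged usage sketch (error handling elided; the per-page prototype is the pre-existing xenctrl.h one, the batched form is the one this patch adds):

    /* Per-page form: one hypercall per gfn. */
    uint32_t i;
    int rc = 0;

    for ( i = 0; i < nr && rc == 0; i++ )
        rc = xc_altp2m_set_mem_access(xch, domid, view_id, pages[i],
                                      (xenmem_access_t)access[i]);

    /* Batched form added by this patch: one hypercall for the whole array. */
    rc = xc_altp2m_set_mem_access_multi(xch, domid, view_id, access, pages, nr);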


Thanks,
Razvan


Re: [Xen-devel] [PATCH v8] x86/altp2m: support for setting restrictions for an array of pages

2017-12-11 Thread Razvan Cojocaru
On 12/11/2017 05:54 PM, George Dunlap wrote:
> On 12/11/2017 03:05 PM, Jan Beulich wrote:
>>>>> On 11.12.17 at 15:51,  wrote:
>>> On 12/11/2017 02:46 PM, Razvan Cojocaru wrote:
>>>> On 12/11/2017 03:36 PM, Jan Beulich wrote:
>>>>>>>> On 11.12.17 at 13:50,  wrote:
>>>>>> On 12/11/2017 12:12 PM, Jan Beulich wrote:
>>>>>>>>>> On 11.12.17 at 12:06,  wrote:
>>>>>>>> My suggestion was that we don't break usecases.  The Intel usecase
>>>>>>>> specifically is for an in-guest entity to have full control of all
>>>>>>>> altp2m functionality, and this is fine (security wise) when permitted
>>>>>>>> to do so by the toolstack.
>>>>>>>
>>>>>>> IOW you mean that such guests would be considered "trusted", i.e.
>>>>>>> whatever bad they can do is by definition not a security concern.
>>>>>>
>>>>>> I'm not sure what you mean by "trusted".  If implemented correctly,
>>>>>> altp2m and mem_access shouldn't give the guest any more permissions than
>>>>>> it has already.  The main risk would be if there were bugs in the
>>>>>> functionality that allowed security issues.
>>>>>
>>>>> Hmm, maybe I'm mis-reading the code, but
>>>>> mem_access.c:set_mem_access() looks to be using the requested
>>>>> access rights verbatim, i.e. without applying tool stack imposed
>>>>> restrictions (hypervisor ones look to be honored by deriving
>>>>> base permissions from the p2m type first).
>>>>
>>>> Quite likely I'm not grasping the full meaning of your objection,
>>>> however the added code is merely another interface to already existing
>>>> core code - so while admittedly there's room for improvement for the EPT
>>>> code below it, this patch really only extends the scope of altp2m's
>>>> existing version of set_mem_access() (which currently works on a single
>>>> page). In that, it at least doesn't seem to make things worse (it's
>>>> really just an optimization - whatever badness this code can cause with
>>>> a single call, can already be achieved exactly with a sequence of
>>>> xc_altp2m_set_mem_access() calls).
>>>
>>> I think Jan was saying that he would ideally like to remove *all* guest
>>> access to altp2m functionality, even what's currently there.  The more
>>> extra features we make available to guests, the harder it will be in the
>>> future to argue to remove it all.
>>
>> With one slight correction: all _uncontrolled_ access is what I'd like
>> to see removed. Right now this could arguably indeed mean all
>> access, as it is all uncontrolled (afaict).
> 
> Well at the moment all guest altp2m functionality is disabled unless the
> toolstack has set the appropriate HVM param.  Is that not sufficient?

Furthermore, the parameters allow setting dom0-access only (courtesy of
Tamas' aforementioned patches).


Thanks,
Razvan


Re: [Xen-devel] [PATCH v8] x86/altp2m: support for setting restrictions for an array of pages

2017-12-11 Thread Razvan Cojocaru
On 12/11/2017 06:00 PM, Tamas K Lengyel wrote:
>>> I think Jan was saying that he would ideally like to remove *all* guest
>>> access to altp2m functionality, even what's currently there.  The more
>>> extra features we make available to guests, the harder it will be in the
>>> future to argue to remove it all.
>>
>> With one slight correction: all _uncontrolled_ access is what I'd like
>> to see removed. Right now this could arguably indeed mean all
>> access, as it is all uncontrolled (afaict).
>>
> 
> But it is controlled. Unless you specifically allow the guest access
> to the interface (ie altp2m=1 in the xl config) the guest can't do
> anything with it. And if you do that, it is likely because you have an
> in-guest agent that works in tandem with an out-of-guest agent
> coordinating what to protect and how. You use the in-guest agent for
> performance-sensitive monitoring while you can use the out-of-guest
> agent to protect the in-guest agent. Of course, this is not a
> requirement but what I *think* the setup was that the interface was
> designed for as there is specific ability to decide which page
> permission violation goes to the guest (with #VE) and which goes to
> the toolstack. Plus even if the interface is enabled, it is only
> available to the guest kernel. If it's a malicious guest kernel the
> worst it should be able to do is crash itself (for example by
> allocating a ton of altp2m tables and running out of shadow memory).
> But I don't think you need the altp2m interface for a guest kernel to
> crash itself.

That's precisely how we envision possible use cases as well.
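For reference, the gating discussed above lives in the guest config; a hedged example (mode names per the xl.cfg documentation of the era, so verify against your Xen version):

    # altp2m = "mixed"     # both the guest kernel and the toolstack may use altp2m
    altp2m = "external"    # dom0-only: the guest cannot touch the interface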


Thanks,
Razvan


Re: [Xen-devel] [PATCH v9] x86/emulate: Send vm_event from emulate

2019-09-11 Thread Razvan Cojocaru
On 9/11/19 1:39 PM, Alexandru Stefan ISAILA wrote:
> 
> 
> On 11.09.2019 12:57, Jan Beulich wrote:
>> On 09.09.2019 17:35, Alexandru Stefan ISAILA wrote:
>>> +/*
>>> + * Send memory access vm_events based on pfec. Returns true if the event was
>>> + * sent and false for p2m_get_mem_access() error, no violation and event send
>>> + * error. Assumes the caller will check arch.vm_event->send_event.
>>> + *
>>> + * NOTE: p2m_get_mem_access() can fail if the entry was not found in the EPT
>>> + * (in which case access to it is unrestricted, so no violations can occur).
>>> + * In this case it is fine to continue the emulation.
>>> + */
>>> +bool hvm_monitor_check_ept(unsigned long gla, gfn_t gfn, uint32_t pfec,
>>> +   uint16_t kind)
>>
>> Why did you choose to have "ept" in the name and also mention it
>> in the commit? Is there anything in here which isn't generic p2m?
> 
> The name was suggested by Razvan Cojocaru. I have no preference in the
> name. Maybe Tamas can suggest a good one.

I've suggested "ept" in the name because "regular" emulation ignores it, 
and this function takes it into account, hence the "check_ept" (which I 
thought would be read together). It's fine to change it to something else.


Thanks,
Razvan

Re: [Xen-devel] [PATCH v9] x86/emulate: Send vm_event from emulate

2019-09-11 Thread Razvan Cojocaru
On 9/11/19 2:41 PM, Jan Beulich wrote:
> On 11.09.2019 13:21, Razvan Cojocaru wrote:
>> On 9/11/19 1:39 PM, Alexandru Stefan ISAILA wrote:
>>>
>>>
>>> On 11.09.2019 12:57, Jan Beulich wrote:
>>>> On 09.09.2019 17:35, Alexandru Stefan ISAILA wrote:
>>>>> +/*
>>>>> + * Send memory access vm_events based on pfec. Returns true if the event was
>>>>> + * sent and false for p2m_get_mem_access() error, no violation and event send
>>>>> + * error. Assumes the caller will check arch.vm_event->send_event.
>>>>> + *
>>>>> + * NOTE: p2m_get_mem_access() can fail if the entry was not found in the EPT
>>>>> + * (in which case access to it is unrestricted, so no violations can occur).
>>>>> + * In this case it is fine to continue the emulation.
>>>>> + */
>>>>> +bool hvm_monitor_check_ept(unsigned long gla, gfn_t gfn, uint32_t pfec,
>>>>> +   uint16_t kind)
>>>>
>>>> Why did you choose to have "ept" in the name and also mention it
>>>> in the commit? Is there anything in here which isn't generic p2m?
>>>
>>> The name was suggested by Razvan Cojocaru. I have no preference in the
>>> name. Maybe Tamas can suggest a good one.
>>
>> I've suggested "ept" in the name because "regular" emulation ignores it,
>> and this function takes it into account, hence the "check_ept" (which I
>> thought would be read together). It's fine to change it to something else.
> 
> Then "check_p2m" perhaps?

Sounds good to me.


Thanks,
Razvan

Re: [Xen-devel] [PATCH v10] x86/emulate: Send vm_event from emulate

2019-09-17 Thread Razvan Cojocaru
On 9/17/19 5:11 PM, Alexandru Stefan ISAILA wrote:
> +bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
> +   uint16_t kind)
> +{
> +xenmem_access_t access;
> +vm_event_request_t req = {};
> +paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
> +
> +ASSERT(current->arch.vm_event->send_event);
> +
> +current->arch.vm_event->send_event = false;
> +
> +if ( p2m_get_mem_access(current->domain, gfn, &access,
> +altp2m_vcpu_idx(current)) != 0 )
> +return false;
 ... next to the call here (but the maintainers of the file would
 have to judge in the end). That said, I continue to not understand
 why a not found entry means unrestricted access. Isn't it
 ->default_access which controls what such a "virtual" entry would
 permit?
>>> I'm sorry for this misleading comment. The code states that if the entry
>>> was not found, the access will be default_access and 0 will be returned.
>>> So in this case the default_access will be checked.
>>>
>>> /* If request to get default access. */
>>> if ( gfn_eq(gfn, INVALID_GFN) )
>>> {
>>>*access = memaccess[p2m->default_access];
>>>return 0;
>>> }
>>>
>>> If this clears things up I can remove the "NOTE" part of the comment.
>> I'm afraid it doesn't clear things up: I'm still lost as to why
>> "entry not found" implies "full access". And I'm further lost as
>> to what the code fragment above (dealing with INVALID_GFN, but
>> not really the "entry not found" case, which would be INVALID_MFN
>> coming back from a translation) is supposed to tell me.
>>
> It is safe enough to consider an invalid mfn from the hostp2m to be a
> violation. There is still a small problem with the altp2m view not
> having the page propagated from the hostp2m. In this case we have to use
> altp2m_get_effective_entry().

In the absence of clear guidance from the Intel SDM on what the hardware 
default is for a page not present in the p2m, we should probably follow 
Jan's advice and check violations against default_access for such pages.

There are indeed - as discussed privately - two cases for an altp2m though:

1. Page not found in the altp2m, but set in the hostp2m - in which case
I suggest that we take the hostp2m value into account (with or without
propagation to the altp2m; I see no harm in propagating the entry, but
others may see something I'm missing).

2. Page not found in both altp2m and hostp2m - in which case we should
probably check against default_access (see the sketch below).
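A rough sketch of that decision order (illustrative shape only; p2m_get_mem_access() is the real helper, while get_default_access() is a hypothetical stand-in for fetching the view's default_access):

    xenmem_access_t access;
    int rc = p2m_get_mem_access(current->domain, gfn, &access,
                                altp2m_vcpu_idx(current));

    if ( rc == -ESRCH )
        /* Case 2 above: no entry in either p2m. Don't treat the page as
         * unrestricted; judge the fault against default_access instead. */
        access = get_default_access(current->domain);   /* hypothetical */
    else if ( rc )
        return false;   /* genuine lookup error: don't send an event */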


Thanks,
Razvan

Re: [Xen-devel] [PATCH v10] x86/emulate: Send vm_event from emulate

2019-09-17 Thread Razvan COJOCARU
On 9/17/19 6:09 PM, Tamas K Lengyel wrote:
> On Tue, Sep 17, 2019 at 8:24 AM Razvan Cojocaru
>  wrote:
>>
>> On 9/17/19 5:11 PM, Alexandru Stefan ISAILA wrote:
>>>>>>> +bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
>>>>>>> +   uint16_t kind)
>>>>>>> +{
>>>>>>> +xenmem_access_t access;
>>>>>>> +vm_event_request_t req = {};
>>>>>>> +paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
>>>>>>> +
>>>>>>> +ASSERT(current->arch.vm_event->send_event);
>>>>>>> +
>>>>>>> +current->arch.vm_event->send_event = false;
>>>>>>> +
>>>>>>> +if ( p2m_get_mem_access(current->domain, gfn, &access,
>>>>>>> +altp2m_vcpu_idx(current)) != 0 )
>>>>>>> +return false;
>>>>>> ... next to the call here (but the maintainers of the file would
>>>>>> have to judge in the end). That said, I continue to not understand
>>>>>> why a not found entry means unrestricted access. Isn't it
>>>>>> ->default_access which controls what such a "virtual" entry would
>>>>>> permit?
>>>>> I'm sorry for this misleading comment. The code states that if the entry
>>>>> was not found, the access will be default_access and 0 will be returned.
>>>>> So in this case the default_access will be checked.
>>>>>
>>>>> /* If request to get default access. */
>>>>> if ( gfn_eq(gfn, INVALID_GFN) )
>>>>> {
>>>>>*access = memaccess[p2m->default_access];
>>>>>return 0;
>>>>> }
>>>>>
>>>>> If this clears things up I can remove the "NOTE" part of the comment.
>>>> I'm afraid it doesn't clear things up: I'm still lost as to why
>>>> "entry not found" implies "full access". And I'm further lost as
>>>> to what the code fragment above (dealing with INVALID_GFN, but
>>>> not really the "entry not found" case, which would be INVALID_MFN
>>>> coming back from a translation) is supposed to tell me.
>>>>
>>> It is safe enough to consider an invalid mfn from the hostp2m to be a
>>> violation. There is still a small problem with the altp2m view not
>>> having the page propagated from the hostp2m. In this case we have to use
>>> altp2m_get_effective_entry().
>>
>> In the absence of clear guidance from the Intel SDM on what the hardware
>> default is for a page not present in the p2m, we should probably follow
>> Jan's advice and check violations against default_access for such pages.
> 
> The SDM is very clear that pages that are not present in the EPT are a
> violation:
> 
> 28.2.2
> An EPT paging-structure entry is present if any of bits 2:0 is 1;
> otherwise, the entry is not present. The processor ignores bits 62:3 and
> uses the entry neither to reference another EPT paging-structure entry
> nor to produce a physical address. A reference using a guest-physical
> address whose translation encounters an EPT paging-structure that is not
> present causes an EPT violation (see Section 28.2.3.2).
> 
> 28.2.3.2 EPT Violations
> An EPT violation may occur during an access using a guest-physical
> address whose translation does not cause an EPT misconfiguration. An EPT
> violation occurs in any of the following situations:
> • Translation of the guest-physical address encounters an EPT
>   paging-structure entry that is not present (see Section 28.2.2).

Mystery solved. Thanks!

Re: [Xen-devel] [PATCH] SVM: correct CPUID event processing

2019-09-19 Thread Razvan COJOCARU
On 9/19/19 11:07 PM, Boris Ostrovsky wrote:
> On 9/19/19 6:37 AM, Jan Beulich wrote:
>> hvm_monitor_cpuid() expects the input registers, not two of the outputs.
>>
>> However, once having made the necessary adjustment, the SVM and VMX
>> functions are so similar that they should be folded (thus avoiding
>> further similar asymmetries to get introduced). Use the best of both
>> worlds by e.g. using "curr" consistently. This then being the only
>> caller of hvm_check_cpuid_faulting(), fold in that function as well.
>>
>> Signed-off-by: Jan Beulich 
> 
> Reviewed-by: Boris Ostrovsky 

Acked-by: Razvan Cojocaru 


Thanks,
Razvan

Re: [Xen-devel] [PATCH v13] x86/emulate: Send vm_event from emulate

2019-09-23 Thread Razvan COJOCARU
On 9/23/19 3:05 PM, Alexandru Stefan ISAILA wrote:
> A/D bit writes (on page walks) can be considered benign by an
> introspection agent, so receiving vm_events for them is a pessimization.
> We try here to optimize by filtering these events out.
> 
> Currently, we are fully emulating the instruction at RIP when the
> hardware sees an EPT fault with npfec.kind != npfec_kind_with_gla. This
> is, however, incorrect, because the instruction at RIP might
> legitimately cause an EPT fault of its own while accessing a _different_
> page from the original one, where A/D were set.
> 
> The solution is to perform the whole emulation, while ignoring EPT
> restrictions for the walk part, and taking them into account for the
> "actual" emulating of the instruction at RIP. When we send out a
> vm_event, we don't want the emulation to complete, since in that case we
> won't be able to veto whatever it is doing. That would mean that we
> can't actually prevent any malicious activity; instead we'd only be able
> to report on it. When we see a "send-vm_event" case while emulating, we
> need to first send the event out and then suspend the emulation (return
> X86EMUL_RETRY). After the emulation stops, we'll call
> hvm_vm_event_do_resume() again once the introspection agent treats the
> event and resumes the guest. There, the instruction at RIP will be fully
> emulated (with the EPT ignored) if the introspection application allows
> it, and the guest will continue to run past the instruction.
> 
> A common example: the hardware exits because of an EPT fault caused by a
> page walk, and p2m_mem_access_check() decides whether it is going to
> send a vm_event. If the vm_event is sent, and handling it results in
> running the instruction at RIP, that instruction might also hit a
> protected page and provoke a vm_event of its own.
> 
> Now, if npfec.kind == npfec_kind_in_gpt and
> d->arch.monitor.inguest_pagefault_disabled is true, then we are in the
> page walk case and can apply this optimization: emulate the page walk
> while ignoring the EPT, but don't ignore the EPT for the emulation of
> the actual instruction.
> 
> In the first case we would have 2 EPT events; in the second, only 1 EPT
> event, if the instruction at the RIP triggers an EPT event.
> 
> We use hvmemul_map_linear_addr() to intercept write access and
> __hvm_copy() to intercept exec, read and write access.
> 
> A new return type was added, HVMTRANS_need_retry, in order to have all
> the places that consume HVMTRANS* return X86EMUL_RETRY.
> 
> hvm_emulate_send_vm_event() can return false if there was no violation,
> or if there was an error from monitor_traps() or p2m_get_mem_access().
> -ESRCH from p2m_get_mem_access() is treated as restricted access.
> 
> NOTE: hvm_emulate_send_vm_event() assumes the caller will enable/disable
> arch.vm_event->send_event
> 
> Signed-off-by: Alexandru Isaila

FWIW, Acked-by: Razvan Cojocaru 
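As a sketch of the plumbing those last paragraphs describe (illustrative, not the committed hunks; hvm_copy_from_guest_linear() is an existing helper):

    enum hvm_translation_result res =
        hvm_copy_from_guest_linear(buf, addr, size, pfec, &pfinfo);

    switch ( res )
    {
    case HVMTRANS_okay:
        break;

    case HVMTRANS_need_retry:
        /* vm_event already sent; suspend emulation until the agent replies. */
        return X86EMUL_RETRY;

    default:
        /* Other HVMTRANS_* values are handled as before. */
        break;
    }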

Re: [Xen-devel] [PATCH] x86/vm_event: correctly gather gs_shadow value

2019-05-01 Thread Razvan Cojocaru
On 5/1/19 6:01 PM, Tamas K Lengyel wrote:
> On Wed, May 1, 2019 at 8:53 AM Tamas K Lengyel  wrote:
>>
>> On Wed, May 1, 2019 at 8:20 AM Razvan Cojocaru
>>  wrote:
>>>
>>> On 5/1/19 4:58 PM, Tamas K Lengyel wrote:
>>>>>> It might be worth introducing a "sync state from hw" hook which collects
>>>>>> all the data we intend to pass to the introspection agent.
>>>>>
>>>>> You mean adding another hvm hook?
>>>>
>>>> Actually, instead of another hook I think what would make sense it to
>>>> just update vmx_save_vmcs_ctxt to automatically refresh the cached
>>>> register values when it's called with "v == current". Thoughts?
>>>
>>> That's probably the better way to go about it, since otherwise the
>>> xc_hvm_getcontext_partial() hypercall will suffer from the same problem.
>>> (there are two ways of getting guest state: one is via the vm_event
>>> cached values, the other is via the aforementioned hypercall).
>>
>> True, although issuing the hypercall in the vm_event callback is
>> actually fine - that's how I found the issue to begin with, since the
>> vCPU will be scheduled out with the cached registers refreshed and
>> thus be different than what the vm_event itself had. But other callers
>> of the hypercall can run into the problem if the guest/vcpu is not
>> paused.
> 
> Actually, doing the "v == current" check wouldn't really do anything
> for the hypercall since it's not the domain issuing the hypercall on
> itself. The only way to ensure that hypercall is returning correct
> values would be to pause the vCPU.

I've discussed this with Andrew in the meantime and he's kindly pointed
out that for the hypercall we pause from remote context, which forces a
de-schedule. So the hypercall _should_ be fine. But we write data to the
vm_event ring from the current context, where state might actually be in
hardware.

He'll probably chime in with additional suggestions / comments.


Thanks,
Razvan


Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU

2019-05-01 Thread Razvan Cojocaru
On 5/2/19 2:52 AM, Tamas K Lengyel wrote:
> Currently the gs_shadow value is only cached when the vCPU is being scheduled
> out by Xen. Reporting this (usually) stale value through vm_event is 
> incorrect,
> since it doesn't represent the actual state of the vCPU at the time the event
> was recorded. This prevents vm_event subscribers from correctly finding kernel
> structures in the guest when it is trapped while in ring3.
> 
> Refresh shadow_gs value when the context being saved is for the current vCPU.
> 
> Signed-off-by: Tamas K Lengyel 
> Cc: Razvan Cojocaru 
> Cc: Jun Nakajima 
> Cc: Kevin Tian 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: Roger Pau Monne 
> ---
> v2: move fix to hvm so vm_event doesn't have to know specifics
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 283eb7b34d..5154ecc2a8 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
>  
>  static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
>  {
> +if ( v == current )
> +vmx_save_guest_msrs(v);

vmx_save_guest_msrs() is simple enough that the if is probably not
necessary here (we can just call the function directly, regardless of
what v holds).

But that's not technically an issue, so if nobody else minds:

Acked-by: Razvan Cojocaru 


Thanks,
Razvan


Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure

2019-05-02 Thread Razvan Cojocaru

On 5/2/19 2:57 AM, Tamas K Lengyel wrote:

Receiving this register is useful for introspecting 32-bit Windows when the
event being trapped happened while in ring3.

Signed-off-by: Tamas K Lengyel 
Cc: Razvan Cojocaru 
Cc: Tamas K Lengyel 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
---
  xen/arch/x86/vm_event.c   | 5 +
  xen/include/public/vm_event.h | 3 ++-
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 51c3493b1d..873788e32f 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum x86_segment segment,
  reg->es_sel = seg.sel;
  break;
  
+case x86_seg_gdtr:
+reg->gdtr_base = seg.base;
+break;


Please also add limit, ar, sel, like the others do. In addition to 
making this modification look less strange (since, in contrast to the 
function name, nothing is packed for gdtr_base), it will also save 
future work for applications wanting to use gdtr which also require 
backwards compatibility with previous vm_event versions.


As you know, for each such modification we need to have a separate
vm_event_vX header in our applications so that we're ready to create a
ring buffer using requests and replies of the right size, and also to be
able to properly interpret the ring buffer data; so the fewer changes to
the vm_event struct, the better.


Petre is currently working on the vm_event changes that will hopefully 
enable us to just cache everything that the getcontext_partial hypercall 
retrieves.



Thanks,
Razvan


Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU

2019-05-02 Thread Razvan Cojocaru

On 5/2/19 11:36 AM, Jan Beulich wrote:

On 02.05.19 at 08:20,  wrote:

On 5/2/19 2:52 AM, Tamas K Lengyel wrote:

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
  
  static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
  {
+if ( v == current )
+vmx_save_guest_msrs(v);


vmx_save_guest_msrs() is simple enough that the if is probably not
necessary here (we can just call the function directly, regardless of
what v holds).


Avoiding an MSR access is perhaps worth the conditional.


Fair enough.


Thanks,
Razvan


Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU

2019-05-02 Thread Razvan Cojocaru

On 5/2/19 11:45 AM, Andrew Cooper wrote:

On 02/05/2019 07:20, Razvan Cojocaru wrote:

On 5/2/19 2:52 AM, Tamas K Lengyel wrote:

Currently the gs_shadow value is only cached when the vCPU is being scheduled
out by Xen. Reporting this (usually) stale value through vm_event is incorrect,
since it doesn't represent the actual state of the vCPU at the time the event
was recorded. This prevents vm_event subscribers from correctly finding kernel
structures in the guest when it is trapped while in ring3.

Refresh shadow_gs value when the context being saved is for the current vCPU.

Signed-off-by: Tamas K Lengyel 
Cc: Razvan Cojocaru 
Cc: Jun Nakajima 
Cc: Kevin Tian 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
---
v2: move fix to hvm so vm_event doesn't have to know specifics
---
  xen/arch/x86/hvm/vmx/vmx.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 283eb7b34d..5154ecc2a8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
  
  static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
  {
+if ( v == current )
+vmx_save_guest_msrs(v);

vmx_save_guest_msrs() is simple enough that the if is probably not
necessary here (we can just call the function directly, regardless of
what v holds).


Why?  Doing that would fully corrupt v's state when called in remote
context, as you'd clobber v's gs_shadow with whatever the value was
from the current vcpu.


Good point, I've missed that.


Thanks,
Razvan


Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure

2019-05-02 Thread Razvan Cojocaru



On 5/2/19 11:52 AM, Andrew Cooper wrote:

On 02/05/2019 09:25, Razvan Cojocaru wrote:

On 5/2/19 2:57 AM, Tamas K Lengyel wrote:

Receiving this register is useful for introspecting 32-bit Windows
when the
event being trapped happened while in ring3.

Signed-off-by: Tamas K Lengyel 
Cc: Razvan Cojocaru 
Cc: Tamas K Lengyel 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
---
   xen/arch/x86/vm_event.c   | 5 +
   xen/include/public/vm_event.h | 3 ++-
   2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 51c3493b1d..873788e32f 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum x86_segment segment,
   reg->es_sel = seg.sel;
   break;
   +    case x86_seg_gdtr:
+    reg->gdtr_base = seg.base;
+    break;


Please also add limit, ar, sel, like the others do.


In Xen, we model GDTR/IDTR just like all other segments in the segment
cache for convenience/consistency, including faking up of some default
attributes.

However, there is no such thing as a selector or access rights for them,
and the VMCS lacks the fields, while the VMCB marks the fields as
reserved.   It is almost certainly not worth wasting the space in the ring.


Right, I got carried away there: I saw gdtr_limit and didn't check
whether the rest is available.



Thanks,
Razvan


Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure

2019-05-02 Thread Razvan Cojocaru

On 5/2/19 4:09 PM, Tamas K Lengyel wrote:

On Thu, May 2, 2019 at 2:57 AM Jan Beulich  wrote:



On 02.05.19 at 10:25,  wrote:

On 5/2/19 2:57 AM, Tamas K Lengyel wrote:

Receiving this register is useful for introspecting 32-bit Windows when the
event being trapped happened while in ring3.

Signed-off-by: Tamas K Lengyel 
Cc: Razvan Cojocaru 
Cc: Tamas K Lengyel 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
---
   xen/arch/x86/vm_event.c   | 5 +
   xen/include/public/vm_event.h | 3 ++-
   2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 51c3493b1d..873788e32f 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum x86_segment segment,

   reg->es_sel = seg.sel;
   break;

+case x86_seg_gdtr:
+reg->gdtr_base = seg.base;
+break;


Please also add limit, ar, sel, like the others do.


There's no ar or sel for GDT (and IDT). Instead, because of ...


In addition to
making this modification look less strange (since, in contrast to the
function name, nothing is packed for gdtr_base), it will also save
future work for applications wanting to use gdtr which also require
backwards compatibility with previous vm_event versions.

As you know, for each such modification we need to have a separate
vm_event_vX header in our applications so that we're ready to create a
ring buffer using requests and replies of the right size, and also to be
able to properly interpret the ring buffer data; so the fewer changes to
the vm_event struct, the better.


... this I wonder whether the IDT details shouldn't get added at
the same time.


The churn of the header is not that big of a deal - I do the same in
LibVMI and it's a largely copy-paste job that takes perhaps ~5m to add
(see 
https://github.com/tklengyel/libvmi/commit/7509ce56d0408dbec4e374b8780da69a7bd212e8).
So that should not be a factor in deciding whether we add a needed
extra piece or not. Since the vm_event ABI is changing for Xen 4.13
now the ABI version will only be bumped once for 4.13. So we should be
able to add/remove/shuffle whatever we want while 4.13 merge window is
open.

That said I don't have a use for idt and gdtr_limit that warrants
having to receive it via the vm_event structure - those pieces are
always just a hypercall away if needed. So in the vm_event structure I
tend to just put the registers needed often enough to warrant avoiding
that extra hypercall. But if Razvan says he has a use for them, I can
add them here.


We do use them both - idtr and gdtr, both base and limit, but we are 
indeed getting them via hypercall now. Considering that, since we did 
add gdtr_base I think it would be great if we could also add gdtr_limit.


Adding idtr as well would _theoretically_ be a nice speed optimization 
(removing the need for a hypercall), but it might also be a speed 
pessimization generally applicable to increasing the vm_event size 
(since a VCPU with no more space left in the vm_event ring buffer will 
be blocked more and go through more locking logic).


My point is, at the moment it's fine to skip idtr if it's not required 
by Jan, but if we do add either then let's please add both _base and _limit.


In a hopefully near future we'll be able to use as much memory as 
necessary for storing vm_event data and just cache everything in the 
ring buffer, avoiding all the "get context" hypercalls.



Thanks,
Razvan


Re: [Xen-devel] [PATCH v2] x86/vm_event: add gdtr_base to the vm_event structure

2019-05-02 Thread Razvan Cojocaru
On 5/3/19 12:42 AM, Tamas K Lengyel wrote:
> Receiving this register is useful for introspecting 32-bit Windows when the
> event being trapped happened while in ring3.
> 
> Signed-off-by: Tamas K Lengyel 
> Cc: Razvan Cojocaru 
> Cc: Tamas K Lengyel 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: Roger Pau Monne 
> ---
> v2: add gdtr limit

Acked-by: Razvan Cojocaru 


Thanks,
Razvan


Re: [Xen-devel] [PATCH v3] x86/vm_event: add gdtr_base to the vm_event structure

2019-05-03 Thread Razvan Cojocaru

On 5/3/19 11:09 AM, Jan Beulich wrote:

On 03.05.19 at 00:54,  wrote:

Receiving this register is useful for introspecting 32-bit Windows when the
event being trapped happened while in ring3.

Signed-off-by: Tamas K Lengyel 
Cc: Razvan Cojocaru 
Cc: Tamas K Lengyel 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
---
v2: add gdtr limit
v3: use uint32_t to fit the 20 bits


As per Andrew's response I think v2 is it.


Yes, please. This will also allow us to reuse the existing (remaining) 
pad bits in the future for another limit (for idtr, perhaps).



Thanks,
Razvan


Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?

2019-05-09 Thread Razvan Cojocaru
On 5/9/19 11:57 PM, Mathieu Tarral wrote:
> Hi,
> 
> following a previous conversation, I would like to catch MOV-TO-DRx VMI
> events to prevent the guest from disabling my hardware breakpoints.
> 
> @Tamas pointed me to this header:
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/vm_event.h;h=b2bafc0d77f9758e42b0d53c05a7e6bb86c86686;hb=HEAD#l154
> 
> And, as far as I can tell, I have to
> - add a new REASON
> #define VM_EVENT_REASON_WRITE_DEBUGREG  15
> 
> - add a new struct
> struct vm_event_write_debugreg {
> uint32_t index;
> uint32_t _pad;
> uint64_t new_value;
> uint64_t old_value;
> };
> 
> - insert it into the vm_event_st union
> 
> Can you give me more pointers and guidance on how to implement this in Xen?

You probably want to change the write_debugreg() macro into a function
that does what's currently being done + send out the vm_event. You also
probably need to think about whether you want the write to be
preemptable or not (I'm guessing you don't, in which case it's all simpler).
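A hedged sketch of that conversion (VM_EVENT_REASON_WRITE_DEBUGREG and
struct vm_event_write_debugreg are your proposed additions, not existing
Xen code; the monitor-enable flag and the cached-register field name are
likewise illustrative):

    static void hvm_write_debugreg_monitored(struct vcpu *v, unsigned int reg,
                                             unsigned long new_value)
    {
        unsigned long old_value = v->arch.debugreg[reg]; /* assumes a cached copy */

        /* ... perform the actual %drN write exactly as write_debugreg() does ... */

        if ( v->domain->arch.monitor.write_debugreg_enabled ) /* hypothetical flag */
        {
            vm_event_request_t req = {
                .reason = VM_EVENT_REASON_WRITE_DEBUGREG,
                .u.write_debugreg.index = reg,
                .u.write_debugreg.new_value = new_value,
                .u.write_debugreg.old_value = old_value,
            };

            /* Synchronous, non-preemptable notification. */
            monitor_traps(v, true, &req);
        }
    }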

> I have never submitted a patch, nor looked into Xen source code.

https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches

> Should we create a single event for MOV-TO-REGx VMI events?
> It would avoid copy-pasting and duplicating code.

I don't understand this question.


Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?

2019-05-09 Thread Razvan Cojocaru
On 5/10/19 12:31 AM, Andrew Cooper wrote:
> What we'll have to do is end up in a position where we can have some
> real %dr settings given by the VMI agent, and some shadow %dr settings
> which the guest interacts with.  Also I should warn you at this point
> that, because of how the registers work, it will not be possible to have
> guest-shadowed %dr functioning at the same time as VMI-provided %dr
> settings.
> 
> I guess the main usecase here is simply hiding from the guest kernel
> that debugging activities are in use, and we are ok to break the real
> use of gdb/other inside the guest?  Razvan/Tamas: As you're the
> maintainers, it is your call, ultimately.

What worries me here is that in that case it becomes easier for a rogue
application inside the guest to figure out that the guest's being
monitored, if I understand things correctly.

Of course, a dom0 introspection agent may choose to simply not subscribe
to DR events, and thus not alter the current flow at all, which makes
things better.


Thanks,
Razvan


Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?

2019-05-09 Thread Razvan Cojocaru
On 5/10/19 2:03 AM, Tamas K Lengyel wrote:
>> Either way, everything comes down to what behaviour is wanted to start with.
> As I said, I think adding that monitoring capability is fine as long
> as its limitation is clearly documented.

Right. My thoughts as well.


Thanks,
Razvan


Re: [Xen-devel] Altp2m use with PML can deadlock Xen

2019-05-10 Thread Razvan Cojocaru

On 5/10/19 5:42 PM, Tamas K Lengyel wrote:

On Thu, May 9, 2019 at 10:19 AM Andrew Cooper  wrote:


On 09/05/2019 14:38, Tamas K Lengyel wrote:

Hi all,
I'm investigating an issue with altp2m that can easily be reproduced
and leads to a hypervisor deadlock when PML is available in hardware.
I haven't been able to trace down where the actual deadlock occurs.

The problem seems to stem from hvm/vmx/vmcs.c:vmx_vcpu_flush_pml_buffer,
which calls p2m_change_type_one on all gfns that were recorded in the PML
buffer. The problem occurs when the PML-buffer-full vmexit happens
while the active p2m is an altp2m. Switching p2m_change_type_one to
work with the altp2m instead of the hostp2m however results in EPT
misconfiguration crashes.

Adding to the issue is that it seems to only occur when the altp2m has
remapped GFNs. Since PML records entries based on GFN leads me to
question whether it is safe at all to use PML when altp2m is used with
GFN remapping. However, AFAICT the GFNs in the PML buffer are not the
remapped GFNs and my understanding is that it should be safe as long
as the GFNs being tracked by PML are never the remapped GFNs.

Booting Xen with ept=pml=0 resolves the issue.

If anyone has any insight into what might be happening, please let me know.



I could have sworn that George spotted a problem here and fixed it.  I
shouldn't be surprised if we have more.

The problem that PML introduced (and this is mostly my fault, as I
suggested the buggy solution) is that the vmexit handler from one vcpu
pauses others to drain the PML queue into the dirty bitmap.  Overall I
wasn't happy with the design and I've got some ideas to improve it, but
within the scope of how altp2m was engineered, I proposed
domain_pause_except_self().

As it turns out, that is vulnerable to deadlocks when you get two vcpus
trying to pause each other and waiting for each other to become
de-scheduled.


Makes sense.



I see this has been reused by the altp2m code, but it *should* be safe
from deadlocks now that it takes the hypercall_deadlock_mutex.


Is that already in staging or your x86-next branch? I would like to
verify that the problem is still present or not with that change. I
tested with Xen 4.12 release and that definitely still deadlocks.


I don't know if Andrew is talking about this patch (probably not, but it 
looks at least related):


http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=24d5282527f4647907b3572820b5335c15cd0356;hp=29d28b29190ba09d53ae7e475108def84e16e363

Since there's a "Release-acked" tag on it, I think it's in 4.12.


Thanks,
Razvan


Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults

2019-05-13 Thread Razvan Cojocaru

On 11/27/18 12:49 PM, Razvan Cojocaru wrote:

With a sufficiently complete insn emulator, single-stepping should
not be needed at all imo. Granted we're not quite there yet with
the emulator, but we've made quite a bit of progress. As before,
if there are particular instructions you know of that the emulator
doesn't handle yet, please keep pointing these out. Last I know
were some AVX move instructions, which have long been
implemented.

True, I haven't seen emulator issues in that respect with staging - the
emulator appears lately to be sufficiently complete. Thank you very much
for your help and support - we'll definitely point out unsupported
instructions if we spot some again.


We've come across a new instruction that the emulator can't handle in 
Xen 4.13-unstable today:


vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20

Perhaps there are plans for this to go into the emulator as well?


Thanks,
Razvan


Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults

2019-05-13 Thread Razvan Cojocaru

On 5/13/19 5:06 PM, Jan Beulich wrote:

On 13.05.19 at 15:58,  wrote:

On 11/27/18 12:49 PM, Razvan Cojocaru wrote:

With a sufficiently complete insn emulator, single-stepping should
not be needed at all imo. Granted we're not quite there yet with
the emulator, but we've made quite a bit of progress. As before,
if there are particular instructions you know of that the emulator
doesn't handle yet, please keep pointing these out. Last I know
were some AVX move instructions, which have long been
implemented.

True, I haven't seen emulator issues in that respect with staging - the
emulator appears lately to be sufficiently complete. Thank you very much
for your help and support - we'll definitely point out unsupported
instructions if we spot some again.


>> We've come across a new instruction that the emulator can't handle in
Xen 4.13-unstable today:

vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20

Perhaps there are plans for this to go into the emulator as well?


You're kidding? This is already in 4.12.0, and if it weren't I'm sure
you're aware there are about 40 more AVX512 patches pending
review.


Right, I did indeed forget about the pending review part, for some 
reason I was sure they made it in. I've double-checked and we really are 
using 4.13-unstable - but we've also made changes to the emulator, 
working on the send-vm-events-from-the-emulator patch, so we'll revert 
to a pristine staging and retry; there's a chance this happens because 
of our changes.


We'll find out what's going on exactly.


Thanks,
Razvan


Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs

2019-05-13 Thread Razvan Cojocaru
On 5/13/19 7:18 PM, Mathieu Tarral wrote:
> On Friday, May 10, 2019 at 5:21 PM, Andrew Cooper wrote:
> 
>> On 10/05/2019 16:17, Mathieu Tarral wrote:
>>
>>> On Thursday, May 9, 2019 at 6:42 PM, Andrew Cooper andrew.coop...@citrix.com wrote:
>>>
 Therefore, the conclusion to draw is that it is a logical bug somewhere.
>>> The bug is still here, so we can exclude a microcode issue.
>>
>> Good - that is one further angle excluded.  Always make sure you are
>> running with up-to-date microcode, but it looks like we back to
>> investigating a logical bug in libvmi or Xen.
> 
> I played with tool/tests/xen-access this afternoon.
> 
> The tool is working: I could intercept breakpoints, cpuid, write and exec
> mem accesses, etc.
> 
> However, using altp2m related intercepts leads to a guest crash sometimes:
> 
> Windows 7 x64, 4 VCPUs
> - altp2m_write: crash
> - altp2m_exec: crash
> - altp2m_write_no_gpt: frozen
> 
> Windows 7 x64, 1 VCPU
> - altp2m_write: crash
> - altp2m_exec: OK
> - altp2m_write_no_gpt: frozen
> 
> "frozen" means that xen-access receives VMI events, bug the guest is frozen 
> until I decide to stop xen-access.
> I'm wondering what kind of exec events it received because they are not the 
> same, so it's not looping
> over the same RIP over and over. (?)
I think you're simply tripping some OS timer because you're slowing the
guest down in the crash case, and simply keep the guest too busy
handling events in the "freeze" case. Remember that there's quite a
delay running each offending instruction: one vm_event saying you've got
a violation, a reply saying "put this VCPU in single-step mode _and_
switch to the unrestricted EPT view", another vm_event saying
"instruction executed", followed by anoher reply saying "switch back to
the restricted EPT _and_ take the VCPU out of single-step mode".

Restricting the whole of the guest's memory (and so doing this dance for
_every_ instruction causing a fault) is practically guaranteed to upset
the OS. A little EPT restricting goes a long way.
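In xen-access terms, the dance looks roughly like this (a sketch;
restricted_view is the altp2m view id the monitor created, and view 0 is
the unrestricted one):

    switch ( req.reason )
    {
    case VM_EVENT_REASON_MEM_ACCESS:
        /* Reply: enter single-step and switch to the unrestricted view. */
        rsp.flags |= VM_EVENT_FLAG_TOGGLE_SINGLESTEP |
                     VM_EVENT_FLAG_ALTERNATE_P2M;
        rsp.altp2m_idx = 0;
        break;

    case VM_EVENT_REASON_SINGLESTEP:
        /* Reply: leave single-step, back to the restricted view. */
        rsp.flags |= VM_EVENT_FLAG_TOGGLE_SINGLESTEP |
                     VM_EVENT_FLAG_ALTERNATE_P2M;
        rsp.altp2m_idx = restricted_view;
        break;
    }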

Of course, if this could be improved so that even stress-tests (which is
basically what xen-access is) leave the guest running smoothly, that'd
be fantastic.


Razvan


Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults

2019-05-14 Thread Razvan Cojocaru

On 5/13/19 5:15 PM, Razvan Cojocaru wrote:

On 5/13/19 5:06 PM, Jan Beulich wrote:

On 13.05.19 at 15:58,  wrote:

On 11/27/18 12:49 PM, Razvan Cojocaru wrote:

With a sufficiently complete insn emulator, single-stepping should
not be needed at all imo. Granted we're not quite there yet with
the emulator, but we've made quite a bit of progress. As before,
if there are particular instructions you know of that the emulator
doesn't handle yet, please keep pointing these out. Last I know
were some AVX move instructions, which have long been
implemented.

True, I haven't seen emulator issues in that respect with staging - the
emulator appears lately to be sufficiently complete. Thank you very much
for your help and support - we'll definitely point out unsupported
for your help and support - we'll definitely point out unsupported
instructions if we spot some again.


We've come across a new instruction that the emulator can't handle in
Xen 4.13-unstable today:

vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20

Perhaps there are plans for this to go into the emulator as well?


You're kidding? This is already in 4.12.0, and if it weren't I'm sure
you're aware there are about 40 more AVX512 patches pending
review.


Right, I did indeed forget about the pending review part, for some 
reason I was sure they made it in. I've double-checked and we really are 
using 4.13-unstable - but we've also made changes to the emulator, 
working on the send-vm-events-from-the-emulator patch, so we'll revert 
to a pristine staging and retry; there's a chance this happens because 
of our changes.


We'll find out what's going on exactly.


I promised I'd return with more details. After some debugging, it 
certainly looks like the emulator returns UNIMPLEMENTED (5):


Mem event emulation failed (5): d5v0 32bit @ 001b:6d96efff -> c5 f9 f5 05 c0 be ad 6d c5 e1 fe 1d a0 20 af 6d


Looking at the source code, the emulator does appear to support 
vpmaddwd, however only for EVEX:


http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/x86_emulate/x86_emulate.c;h=032995ea586aa7dd90a1953b6ded656436652049;hb=refs/heads/staging#l6696

whereas our fail case uses VEX.

This may be in the works in the aforementioned series, but is 
legitimately unsupported in 4.13 staging.
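For the record, decoding the failing bytes (standard VEX rules; the guest is 32-bit per the log line above):

    /*
     *   c5              two-byte VEX escape (an EVEX encoding starts with 62)
     *   f9              R=1, vvvv=1111 (xmm0), L=0 (128-bit), pp=01 (0x66), map 0F
     *   f5              PMADDWD opcode
     *   05 c0 be ad 6d  ModRM + disp32: memory operand at 0x6dadbec0
     *
     * i.e. the VEX.128.66.0F F5 form of vpmaddwd, which staging at the time
     * only implemented for EVEX.
     */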



Thanks,
Razvan


Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults

2019-05-14 Thread Razvan Cojocaru



On 5/14/19 5:16 PM, Jan Beulich wrote:

On 14.05.19 at 15:47,  wrote:

Mem event emulation failed (5): d5v0 32bit @ 001b:6d96efff -> c5 f9 f5 05 c0 be ad 6d c5 e1 fe 1d a0 20 af 6d

Looking at the source code, the emulator does appear to support
vpmaddwd, however only for EVEX:

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/x86_emulate/x86_emulate.c;h=032995ea586aa7dd90a1953b6ded656436652049;hb=refs/heads/staging#l6696

whereas our fail case uses VEX.

This may be in the works in the aforementioned series, but is
legitimately unsupported in 4.13 staging.


Hmm, interesting. The origin of the encoding is at MMX times,
which means it's more than just VPMADDWD that's missing, and
it's been an omission back in the MMX/SSE2 series then. That's
a genuine oversight, and in the light of this I'd like to apologize
for my unfriendly initial reaction. I'll see about getting this fixed.
(It would have helped if you had shared the encoding right away,
since the mnemonic and operands are now often insufficient.)


No problem at all. Indeed, sharing the encoding would have cleared 
things up faster.



Thanks,
Razvan


[Xen-devel] [PATCH] x86/altp2m: move altp2m_get_effective_entry() under CONFIG_HVM

2019-05-14 Thread Razvan Cojocaru
All its callers live inside #ifdef CONFIG_HVM sections.

Signed-off-by: Razvan Cojocaru 
---
 xen/arch/x86/mm/p2m.c | 72 +++
 xen/include/asm-x86/p2m.h |  2 ++
 2 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index cc6661e..57c5eed 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -478,42 +478,6 @@ void p2m_unlock_and_tlb_flush(struct p2m_domain *p2m)
 mm_write_unlock(&p2m->lock);
 }
 
-int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
-   p2m_type_t *t, p2m_access_t *a,
-   bool prepopulate)
-{
-*mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL);
-
-/* Check host p2m if no valid entry in alternate */
-if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) )
-{
-struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain);
-unsigned int page_order;
-int rc;
-
-*mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a,
- P2M_ALLOC | P2M_UNSHARE, &page_order, 0);
-
-rc = -ESRCH;
-if ( !mfn_valid(*mfn) || *t != p2m_ram_rw )
-return rc;
-
-/* If this is a superpage, copy that first */
-if ( prepopulate && page_order != PAGE_ORDER_4K )
-{
-unsigned long mask = ~((1UL << page_order) - 1);
-gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask);
-mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask);
-
-rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, *t, *a, 1);
-if ( rc )
-return rc;
-}
-}
-
-return 0;
-}
-
 mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
 p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
 unsigned int *page_order, bool_t locked)
@@ -2378,6 +2342,42 @@ int unmap_mmio_regions(struct domain *d,
 
 #ifdef CONFIG_HVM
 
+int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
+   p2m_type_t *t, p2m_access_t *a,
+   bool prepopulate)
+{
+*mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL);
+
+/* Check host p2m if no valid entry in alternate */
+if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) )
+{
+struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain);
+unsigned int page_order;
+int rc;
+
+*mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a,
+ P2M_ALLOC | P2M_UNSHARE, &page_order, 0);
+
+rc = -ESRCH;
+if ( !mfn_valid(*mfn) || *t != p2m_ram_rw )
+return rc;
+
+/* If this is a superpage, copy that first */
+if ( prepopulate && page_order != PAGE_ORDER_4K )
+{
+unsigned long mask = ~((1UL << page_order) - 1);
+gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask);
+mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask);
+
+rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, *t, *a, 1);
+if ( rc )
+return rc;
+}
+}
+
+return 0;
+}
+
 void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
 {
 if ( altp2m_active(v->domain) )
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 2d0bda1..7e71bf0 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -514,6 +514,7 @@ static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn)
 return mfn_x(mfn);
 }
 
+#ifdef CONFIG_HVM
 #define AP2MGET_prepopulate true
 #define AP2MGET_query false
 
@@ -525,6 +526,7 @@ static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn)
 int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
p2m_type_t *t, p2m_access_t *a,
bool prepopulate);
+#endif
 
 /* Deadlock-avoidance scheme when calling get_gfn on different gfn's */
 struct two_gfns {
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6] x86/altp2m: Aggregate get entry and populate into common funcs

2019-05-14 Thread Razvan Cojocaru

On 5/14/19 6:42 PM, Jan Beulich wrote:

On 23.04.19 at 16:30,  wrote:

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -478,6 +478,43 @@ void p2m_unlock_and_tlb_flush(struct p2m_domain *p2m)
  mm_write_unlock(&p2m->lock);
  }
  
+int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
+   p2m_type_t *t, p2m_access_t *a,
+   bool prepopulate)
+{
+*mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL);
+
+/* Check host p2m if no valid entry in alternate */
+if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) )
+{
+struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain);
+unsigned int page_order;
+int rc;
+
+*mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a,
+ P2M_ALLOC | P2M_UNSHARE, &page_order, 0);
+
+rc = -ESRCH;
+if ( !mfn_valid(*mfn) || *t != p2m_ram_rw )
+return rc;
+
+/* If this is a superpage, copy that first */
+if ( prepopulate && page_order != PAGE_ORDER_4K )
+{
+unsigned long mask = ~((1UL << page_order) - 1);
+gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask);
+mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask);
+
+rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, *t, *a, 1);
+if ( rc )
+return rc;
+}
+}
+
+return 0;
+}
+
+
  mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
  p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
  unsigned int *page_order, bool_t locked)


May I ask how the placement of this function was chosen? It looks
as if all its callers live inside #ifdef CONFIG_HVM sections, just this
function doesn't (reportedly causing build issues together with
later changes).


I believe it's just an oversight. I've sent out a patch that hopefully 
fixes this in a satisfactory manner.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen/vm_event: fix rc check for uninitialized ring

2019-05-24 Thread Razvan Cojocaru

On 5/24/19 4:15 PM, Juergen Gross wrote:

vm_event_claim_slot() returns -EOPNOTSUPP for an uninitialized ring
since commit 15e4dd5e866b43bbc ("common/vm_event: Initialize vm_event
lists on domain creation"), but the callers test for -ENOSYS.

Correct the callers.

Signed-off-by: Juergen Gross 
---
  xen/arch/x86/mm/p2m.c | 2 +-
  xen/common/monitor.c  | 2 +-
  xen/common/vm_event.c | 2 +-
  3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 57c5eeda91..8a9a1153a8 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1705,7 +1705,7 @@ void p2m_mem_paging_populate(struct domain *d, unsigned long gfn_l)
 
 /* We're paging. There should be a ring */
 int rc = vm_event_claim_slot(d, d->vm_event_paging);
-if ( rc == -ENOSYS )
+if ( rc == -EOPNOTSUPP )
  {
  gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring "
   "in place\n", d->domain_id, gfn_l);
diff --git a/xen/common/monitor.c b/xen/common/monitor.c
index cb5f37fdb2..d5c9ff1cbf 100644
--- a/xen/common/monitor.c
+++ b/xen/common/monitor.c
@@ -98,7 +98,7 @@ int monitor_traps(struct vcpu *v, bool sync, vm_event_request_t *req)
  {
  case 0:
  break;
-case -ENOSYS:
+case -EOPNOTSUPP:
  /*
   * If there was no ring to handle the event, then
   * simply continue executing normally.
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 6e68be47bc..7b4ebced16 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -513,7 +513,7 @@ bool vm_event_check_ring(struct vm_event_domain *ved)
   * this function will always return 0 for a guest.  For a non-guest, we check
   * for space and return -EBUSY if the ring is not available.
   *
- * Return codes: -ENOSYS: the ring is not yet configured
+ * Return codes: -EOPNOTSUPP: the ring is not yet configured
   *   -EBUSY: the ring is busy
   *   0: a spot has been reserved
   *



But vm_event_grab_slot() (which ends up being what vm_event_wait_slot() 
returns), still returns -ENOSYS:


463 static int vm_event_grab_slot(struct vm_event_domain *ved, int foreign)
464 {
465 unsigned int avail_req;
466
467 if ( !ved->ring_page )
468 return -ENOSYS;

Although we can't get to that part if vm_event_check_ring() returns 
false, we should probably return -EOPNOTSUPP from there as well. In 
fact, I wonder whether all of the -ENOSYS returns in vm_event.c 
shouldn't simply be replaced with -EOPNOTSUPP.


Anyway, the change does clearly improve the code even without settling 
the above questions, so:


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/vm_event: fix rc check for uninitialized ring

2019-05-24 Thread Razvan Cojocaru
On 5/24/19 6:23 PM, Juergen Gross wrote:
> vm_event_claim_slot() returns -EOPNOTSUPP for an uninitialized ring
> since commit 15e4dd5e866b43bbc ("common/vm_event: Initialize vm_event
> lists on domain creation"), but the callers test for -ENOSYS.
> 
> Correct the callers.
> 
> Signed-off-by: Juergen Gross 
> ---
> V2: add blank line (Jan Beulich)
> replace two further ENOSYS returns with EOPNOTSUPP (Razvan Cojocaru)

Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/5] xen/vm-event: Drop unused u_domctl parameter from vm_event_domctl()

2019-06-03 Thread Razvan Cojocaru

On 6/3/19 3:25 PM, Andrew Cooper wrote:

This parameter isn't used at all.  Futhermore, elide the copyback in
failing cases, as it is only successful paths which generate data which
needs sending back to the caller.

Finally, drop a redundant d == NULL check, as that logic is all common
at the begining of do_domctl().

No functional change.

Signed-off-by: Andrew Cooper 


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/5] xen/vm-event: Expand vm_event_* spinlock macros and rename the lock

2019-06-03 Thread Razvan Cojocaru

On 6/3/19 3:25 PM, Andrew Cooper wrote:

These serve no purpose but to add to the cognitive load of following
the code.  Remove the level of indirection.

Furthermore, the lock protects all data in vm_event_domain, making
ring_lock a poor choice of name.

For vm_event_get_response() and vm_event_grab_slot(), fold the exit
paths to have a single unlock, as the compiler can't make this
optimisation itself.

No functional change.

Signed-off-by: Andrew Cooper 


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/5] xen/vm-event: Misc fixups

2019-06-03 Thread Razvan Cojocaru

On 6/3/19 3:25 PM, Andrew Cooper wrote:

  * Drop redundant brackets and inline qualifiers.
  * Insert newlines and spaces where appropriate.
  * Drop redundant NDEBUG - gdprintk() is already conditional.  Fix the
logging level, as gdprintk() already prefixes the guest marker.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Razvan Cojocaru 
CC: Tamas K Lengyel 
CC: Petre Pircalabu 
---
  xen/common/vm_event.c | 21 -
  1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 72f42b4..e872680 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -102,6 +102,7 @@ static int vm_event_enable(
  static unsigned int vm_event_ring_available(struct vm_event_domain *ved)
  {
  int avail_req = RING_FREE_REQUESTS(&ved->front_ring);
+
  avail_req -= ved->target_producers;
  avail_req -= ved->foreign_producers;
  
@@ -168,7 +169,7 @@ static void vm_event_wake_queued(struct domain *d, struct vm_event_domain *ved)

   */
  void vm_event_wake(struct domain *d, struct vm_event_domain *ved)
  {
-if (!list_empty(&ved->wq.list))
+if ( !list_empty(&ved->wq.list) )
  vm_event_wake_queued(d, ved);
  else
  vm_event_wake_blocked(d, ved);
@@ -216,8 +217,8 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
  return 0;
  }
  
-static inline void vm_event_release_slot(struct domain *d,
- struct vm_event_domain *ved)
+static void vm_event_release_slot(struct domain *d,
+  struct vm_event_domain *ved)


But inline is still asking the compiler to try and generate code that 
doesn't end up CALLing an actual function, so is it really redundant 
here? I do realize that for most cases the compiler will have its way 
with this code anyway - especially since the function is static - but 
"static" is not guaranteed to also mean "inline", is it?


In any case,
Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/5] xen/vm-event: Remove unnecessary vm_event_domain indirection

2019-06-03 Thread Razvan Cojocaru

On 6/3/19 3:25 PM, Andrew Cooper wrote:

The use of (*ved)-> leads to poor code generation, as the compiler can't
assume the pointer hasn't changed, and results in hard-to-follow code.

For both vm_event_{en,dis}able(), rename the ved parameter to p_ved, and
work primarily with a local ved pointer.

This has a key advantage in vm_event_enable(), in that the partially
constructed vm_event_domain only becomes globally visible once it is
fully constructed.  As a consequence, the spinlock doesn't need holding.

Furthermore, rearrange the order of operations to be more sensible.
Check for repeated enables and a bad HVM_PARAM before allocating
memory, and gather the trivial setup into one place, dropping the
redundant zeroing.

No practical change that callers will notice.

Signed-off-by: Andrew Cooper


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/5] xen/vm-event: Fix interactions with the vcpu list

2019-06-03 Thread Razvan Cojocaru

On 6/3/19 3:25 PM, Andrew Cooper wrote:

vm_event_resume() should use domain_vcpu(), rather than opencoding it
without its Spectre v1 safety.

vm_event_wake_blocked() can't ever be invoked in a case where d->vcpu is
NULL, so drop the outer if() and reindent, fixing up style issues.

The comment, which is left alone, is false.  This algorithm still has
starvation issues when there is an asymmetric rate of generated events.

However, the existing logic is sufficiently complicated and fragile that
I don't think I've followed it fully, and because we're trying to
obsolete this interface, the safest course of action is to leave it
alone, rather than to end up making things subtly different.

Therefore, no practical change that callers would notice.

Signed-off-by: Andrew Cooper


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 4/4] x86/mem_sharing: compile mem_sharing subsystem only when kconfig is enabled

2019-06-04 Thread Razvan Cojocaru

On 6/3/19 7:38 PM, Tamas K Lengyel wrote:

On Mon, Jun 3, 2019 at 2:26 AM Jan Beulich  wrote:



On 16.05.19 at 23:37,  wrote:

Disable it by default as it is only an experimental subsystem.

Signed-off-by: Tamas K Lengyel 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: George Dunlap 

v4: add ASSERT_UNREACHABLE to inlined functions where applicable & other
fixups
---
  xen/arch/x86/Kconfig  |  6 +-
  xen/arch/x86/domain.c |  2 ++
  xen/arch/x86/domctl.c |  2 ++
  xen/arch/x86/mm/Makefile  |  2 +-
  xen/arch/x86/x86_64/compat/mm.c   |  2 ++
  xen/arch/x86/x86_64/mm.c  |  2 ++
  xen/common/Kconfig|  3 ---
  xen/common/domain.c   |  2 +-
  xen/common/grant_table.c  |  2 +-
  xen/common/memory.c   |  2 +-
  xen/common/vm_event.c |  6 +++---
  xen/include/asm-x86/mem_sharing.h | 28 
  xen/include/asm-x86/mm.h  |  3 +++
  xen/include/xen/sched.h   |  2 +-
  xen/include/xsm/dummy.h   |  2 +-
  xen/include/xsm/xsm.h |  4 ++--
  xen/xsm/dummy.c   |  2 +-
  xen/xsm/flask/hooks.c |  4 ++--
  18 files changed, 58 insertions(+), 18 deletions(-)


Daniel, it looks like you weren't Cc-ed here, but your ack is needed.


Indeed, I've also seem to have missed CC-ing Razvan (fixed now).


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] MAINTAINERS: hand over vm_event maintainership

2019-06-12 Thread Razvan Cojocaru
Remove myself as vm_event maintainer, add Alexandru and Petre as
Bitdefender vm_event maintainers.

Signed-off-by: Razvan Cojocaru 
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6fbdc2b..d60e3a5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -405,7 +405,8 @@ L:  xen-devel@lists.xenproject.org
 F: unmodified_drivers/linux-2.6/
 
 VM EVENT, MEM ACCESS and MONITOR
-M: Razvan Cojocaru 
+M: Alexandru Isaila 
+M: Petre Pircalabu 
 M: Tamas K Lengyel 
 S: Supported
 F: tools/tests/xen-access
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] MAINTAINERS: hand over vm_event maintainership

2019-06-12 Thread Razvan Cojocaru

On 6/12/19 7:02 PM, Razvan Cojocaru wrote:

Remove myself as vm_event maintainer, add Alexandru and Petre as
Bitdefender vm_event maintainers.

Signed-off-by: Razvan Cojocaru 
---
  MAINTAINERS | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6fbdc2b..d60e3a5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -405,7 +405,8 @@ L:  xen-devel@lists.xenproject.org
  F:unmodified_drivers/linux-2.6/
  
  VM EVENT, MEM ACCESS and MONITOR

-M: Razvan Cojocaru 
+M: Alexandru Isaila 
+M: Petre Pircalabu 
  M:Tamas K Lengyel 
  S:Supported
  F:tools/tests/xen-access



I'm not sure why get-maintainers.pl did not add Tamas' email address 
(added now).


I'll still be in Bitdefender, subscribed to xen-devel and following the 
project, but I'll be quite a bit more involved in other projects and 
that makes being a maintainer difficult.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V3] x86/altp2m: add altp2m_vcpu_disable_notify

2019-01-02 Thread Razvan Cojocaru
On 1/2/19 1:50 PM, Wei Liu wrote:
> On Tue, Dec 18, 2018 at 05:11:44PM +0200, Razvan Cojocaru wrote:
>> Allow altp2m users to disable #VE/VMFUNC alone. Currently it is
>> only possible to disable this functionality when we disable altp2m
>> completely; #VE/VMFUNC can only be enabled once per altp2m session.
>>
>> In addition to making things complete, disabling #VE is also a
>> workaround for CFW116 ("When Virtualization Exceptions are Enabled,
>> EPT Violations May Generate Erroneous Virtualization Exceptions")
>> on Xeon E-2100 CPUs.
>>
>> Signed-off-by: Razvan Cojocaru 
>>
>> ---
>> Changes since V2:
>>  - Fixed compilation by completing the removal of all references
>>to "pad".
>>
>> Changes since V1:
>>  - Updated the patch description to specify E-2100.
>>  - Made trying to disable #VE when it's already disabled a no-op.
>>  - Removed leftover uint32_t pad; from struct
>>xen_hvm_altp2m_vcpu_disable_notify.
>> ---
>>  tools/libxc/include/xenctrl.h   |  2 ++
>>  tools/libxc/xc_altp2m.c | 22 ++
>>  xen/arch/x86/hvm/hvm.c  | 28 
>>  xen/include/public/hvm/hvm_op.h | 11 ++-
>>  4 files changed, 62 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>> index 97ae965..31cdda7 100644
>> --- a/tools/libxc/include/xenctrl.h
>> +++ b/tools/libxc/include/xenctrl.h
>> @@ -1932,6 +1932,8 @@ int xc_altp2m_get_domain_state(xc_interface *handle, 
>> uint32_t dom, bool *state);
>>  int xc_altp2m_set_domain_state(xc_interface *handle, uint32_t dom, bool 
>> state);
>>  int xc_altp2m_set_vcpu_enable_notify(xc_interface *handle, uint32_t domid,
>>   uint32_t vcpuid, xen_pfn_t gfn);
>> +int xc_altp2m_set_vcpu_disable_notify(xc_interface *handle, uint32_t domid,
>> +  uint32_t vcpuid);
>>  int xc_altp2m_create_view(xc_interface *handle, uint32_t domid,
>>xenmem_access_t default_access, uint16_t 
>> *view_id);
>>  int xc_altp2m_destroy_view(xc_interface *handle, uint32_t domid,
>> diff --git a/tools/libxc/xc_altp2m.c b/tools/libxc/xc_altp2m.c
>> index 844b9f1..f8cd603 100644
>> --- a/tools/libxc/xc_altp2m.c
>> +++ b/tools/libxc/xc_altp2m.c
>> @@ -91,6 +91,28 @@ int xc_altp2m_set_vcpu_enable_notify(xc_interface 
>> *handle, uint32_t domid,
>>  return rc;
>>  }
>>  
>> +int xc_altp2m_set_vcpu_disable_notify(xc_interface *handle, uint32_t domid,
>> +  uint32_t vcpuid)
>> +{
>> +int rc;
>> +DECLARE_HYPERCALL_BUFFER(xen_hvm_altp2m_op_t, arg);
>> +
>> +arg = xc_hypercall_buffer_alloc(handle, arg, sizeof(*arg));
>> +if ( arg == NULL )
>> +return -1;
>> +
>> +arg->version = HVMOP_ALTP2M_INTERFACE_VERSION;
>> +arg->cmd = HVMOP_altp2m_vcpu_disable_notify;
>> +arg->domain = domid;
>> +arg->u.disable_notify.vcpu_id = vcpuid;
>> +
>> +rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
>> +  HYPERCALL_BUFFER_AS_ARG(arg));
> 
> Tabs here.

Right, that was copy/pasted from xc_altp2m_set_vcpu_enable_notify() - it
turns out that most function in that source file have the tab problem.
I'll fix them all while at it.

> With this fixed:
> 
> Acked-by: Wei Liu 

Thanks!


Happy new year,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V3] x86/altp2m: add altp2m_vcpu_disable_notify

2019-01-02 Thread Razvan Cojocaru
On 1/2/19 2:29 PM, Andrew Cooper wrote:
> On 02/01/2019 11:50, Wei Liu wrote:
>> On Tue, Dec 18, 2018 at 05:11:44PM +0200, Razvan Cojocaru wrote:
>>> Allow altp2m users to disable #VE/VMFUNC alone. Currently it is
>>> only possible to disable this functionality when we disable altp2m
>>> completely; #VE/VMFUNC can only be enabled once per altp2m session.
>>>
>>> In addition to making things complete, disabling #VE is also a
>>> workaround for CFW116 ("When Virtualization Exceptions are Enabled,
>>> EPT Violations May Generate Erroneous Virtualization Exceptions")
>>> on Xeon E-2100 CPUs.
>>>
>>> Signed-off-by: Razvan Cojocaru 
>>>
>>> ---
>>> Changes since V2:
>>>  - Fixed compilation by completing the removal of all references
>>>to "pad".
>>>
>>> Changes since V1:
>>>  - Updated the patch description to specify E-2100.
>>>  - Made trying to disable #VE when it's already disabled a no-op.
>>>  - Removed leftover uint32_t pad; from struct
>>>xen_hvm_altp2m_vcpu_disable_notify.
>>> ---
>>>  tools/libxc/include/xenctrl.h   |  2 ++
>>>  tools/libxc/xc_altp2m.c | 22 ++
>>>  xen/arch/x86/hvm/hvm.c  | 28 
>>>  xen/include/public/hvm/hvm_op.h | 11 ++-
>>>  4 files changed, 62 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>>> index 97ae965..31cdda7 100644
>>> --- a/tools/libxc/include/xenctrl.h
>>> +++ b/tools/libxc/include/xenctrl.h
>>> @@ -1932,6 +1932,8 @@ int xc_altp2m_get_domain_state(xc_interface *handle, 
>>> uint32_t dom, bool *state);
>>>  int xc_altp2m_set_domain_state(xc_interface *handle, uint32_t dom, bool 
>>> state);
>>>  int xc_altp2m_set_vcpu_enable_notify(xc_interface *handle, uint32_t domid,
>>>   uint32_t vcpuid, xen_pfn_t gfn);
>>> +int xc_altp2m_set_vcpu_disable_notify(xc_interface *handle, uint32_t domid,
>>> +  uint32_t vcpuid);
>>>  int xc_altp2m_create_view(xc_interface *handle, uint32_t domid,
>>>xenmem_access_t default_access, uint16_t 
>>> *view_id);
>>>  int xc_altp2m_destroy_view(xc_interface *handle, uint32_t domid,
>>> diff --git a/tools/libxc/xc_altp2m.c b/tools/libxc/xc_altp2m.c
>>> index 844b9f1..f8cd603 100644
>>> --- a/tools/libxc/xc_altp2m.c
>>> +++ b/tools/libxc/xc_altp2m.c
>>> @@ -91,6 +91,28 @@ int xc_altp2m_set_vcpu_enable_notify(xc_interface 
>>> *handle, uint32_t domid,
>>>  return rc;
>>>  }
>>>  
>>> +int xc_altp2m_set_vcpu_disable_notify(xc_interface *handle, uint32_t domid,
>>> +  uint32_t vcpuid)
>>> +{
>>> +int rc;
>>> +DECLARE_HYPERCALL_BUFFER(xen_hvm_altp2m_op_t, arg);
>>> +
>>> +arg = xc_hypercall_buffer_alloc(handle, arg, sizeof(*arg));
>>> +if ( arg == NULL )
>>> +return -1;
>>> +
>>> +arg->version = HVMOP_ALTP2M_INTERFACE_VERSION;
>>> +arg->cmd = HVMOP_altp2m_vcpu_disable_notify;
>>> +arg->domain = domid;
>>> +arg->u.disable_notify.vcpu_id = vcpuid;
>>> +
>>> +rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
>>> + HYPERCALL_BUFFER_AS_ARG(arg));
>> Tabs here.
>>
>> With this fixed:
>>
>> Acked-by: Wei Liu 
> 
> Fixed and committed.

Oh, thanks! Nevermind my previous reply then. :)


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] libxc/altp2m: clean up TABs

2019-01-02 Thread Razvan Cojocaru
Replace all the TABs with spaces.

Signed-off-by: Razvan Cojocaru 
---
 tools/libxc/xc_altp2m.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/xc_altp2m.c b/tools/libxc/xc_altp2m.c
index e61bacf..a86520c 100644
--- a/tools/libxc/xc_altp2m.c
+++ b/tools/libxc/xc_altp2m.c
@@ -38,7 +38,7 @@ int xc_altp2m_get_domain_state(xc_interface *handle, uint32_t dom, bool *state)
 arg->domain = dom;
 
 rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
- HYPERCALL_BUFFER_AS_ARG(arg));
+  HYPERCALL_BUFFER_AS_ARG(arg));
 
 if ( !rc )
 *state = arg->u.domain_state.state;
@@ -62,7 +62,7 @@ int xc_altp2m_set_domain_state(xc_interface *handle, uint32_t dom, bool state)
 arg->u.domain_state.state = state;
 
 rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
- HYPERCALL_BUFFER_AS_ARG(arg));
+  HYPERCALL_BUFFER_AS_ARG(arg));
 
 xc_hypercall_buffer_free(handle, arg);
 return rc;
@@ -85,7 +85,7 @@ int xc_altp2m_set_vcpu_enable_notify(xc_interface *handle, uint32_t domid,
 arg->u.enable_notify.gfn = gfn;
 
 rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
- HYPERCALL_BUFFER_AS_ARG(arg));
+  HYPERCALL_BUFFER_AS_ARG(arg));
 
 xc_hypercall_buffer_free(handle, arg);
 return rc;
@@ -130,7 +130,7 @@ int xc_altp2m_create_view(xc_interface *handle, uint32_t domid,
 arg->u.view.hvmmem_default_access = default_access;
 
 rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
- HYPERCALL_BUFFER_AS_ARG(arg));
+  HYPERCALL_BUFFER_AS_ARG(arg));
 
 if ( !rc )
 *view_id = arg->u.view.view;
@@ -155,7 +155,7 @@ int xc_altp2m_destroy_view(xc_interface *handle, uint32_t domid,
 arg->u.view.view = view_id;
 
 rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
- HYPERCALL_BUFFER_AS_ARG(arg));
+  HYPERCALL_BUFFER_AS_ARG(arg));
 
 xc_hypercall_buffer_free(handle, arg);
 return rc;
@@ -178,7 +178,7 @@ int xc_altp2m_switch_to_view(xc_interface *handle, uint32_t domid,
 arg->u.view.view = view_id;
 
 rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
- HYPERCALL_BUFFER_AS_ARG(arg));
+  HYPERCALL_BUFFER_AS_ARG(arg));
 
 xc_hypercall_buffer_free(handle, arg);
 return rc;
@@ -253,7 +253,7 @@ int xc_altp2m_set_mem_access(xc_interface *handle, uint32_t domid,
 arg->u.mem_access.gfn = gfn;
 
 rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
- HYPERCALL_BUFFER_AS_ARG(arg));
+  HYPERCALL_BUFFER_AS_ARG(arg));
 
 xc_hypercall_buffer_free(handle, arg);
 return rc;
@@ -278,7 +278,7 @@ int xc_altp2m_change_gfn(xc_interface *handle, uint32_t domid,
 arg->u.change_gfn.new_gfn = new_gfn;
 
 rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
- HYPERCALL_BUFFER_AS_ARG(arg));
+  HYPERCALL_BUFFER_AS_ARG(arg));
 
 xc_hypercall_buffer_free(handle, arg);
 return rc;
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 131688: regressions - FAIL

2019-01-03 Thread Razvan Cojocaru
On 1/3/19 8:50 PM, osstest service owner wrote:
> flight 131688 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/131688/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail 
> REGR. vs. 131670
> 
> Tests which did not succeed, but are not blocking:
>  test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 
> 131670
>  test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 
> 131670
>  test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 
> 131670
>  test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 
> 131670
>  test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 
> 131670
>  test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 
> 131670
>  test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 
> 131670
>  test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 
> 131670
>  test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 
> 131670
>  test-amd64-i386-xl-pvshim12 guest-start  fail   never 
> pass
>  test-amd64-i386-libvirt  13 migrate-support-checkfail   never 
> pass
>  test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never 
> pass
>  test-amd64-amd64-libvirt 13 migrate-support-checkfail   never 
> pass
>  test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never 
> pass
>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
> fail never pass
>  test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
> fail never pass
>  test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never 
> pass
>  test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never 
> pass
>  test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never 
> pass
>  test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never 
> pass
>  test-armhf-armhf-xl  13 migrate-support-checkfail   never 
> pass
>  test-armhf-armhf-xl  14 saverestore-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never 
> pass
>  test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never 
> pass
>  test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never 
> pass
>  test-armhf-armhf-libvirt 13 migrate-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never 
> pass
>  test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never 
> pass
>  test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never 
> pass
>  test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never 
> pass
>  test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never 
> pass
>  test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never 
> pass
>  test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never 
> pass
> 
> version targeted for testing:
>  xen  9b97818c3d5854caa95d8af20fe7ef6112ff1b06
> baseline version:
>  xen  7b6e05c50fc39466fcc685fb6d4216f99af58743
> 
> Last test of basis   131670  2019-01-01 08:51:55 Z    2 days
> Testing same since   131688  2019-01-02 14:52:32 Z    1 days    1 attempts
> 
> 
> People who touched revisions under test:
>   Razvan Cojocaru 
>   Wei Liu 
> 
> jobs:
>  build-amd64-xsm  pass
>  build-i386-xsm   pass
>  build-amd64-xtf  pass
>  build-amd64   

Re: [Xen-devel] Xen (both x86 and Arm) Community Call: Jan 9 - 16:00 - 17:00 UTC - Call for Agenda Items

2019-01-07 Thread Razvan Cojocaru

On 1/7/19 4:26 PM, Tamas K Lengyel wrote:

Fix VGA logdirty with altp2m (v11)   Razvan Cojocaru?


This has been merged already.


Indeed, sorry for only seeing this now - I've had a small technical 
SNAFU I had to take care of today and missed some emails.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 6/6] xc_version: add vm_event interface version

2019-01-08 Thread Razvan Cojocaru

On 1/8/19 6:27 PM, Jan Beulich wrote:

On 19.12.18 at 19:52,  wrote:

Signed-off-by: Petre Pircalabu 


An empty description is not helpful. The immediate question is: Why?
We don't do this for other interface versions. I'm unconvinced a
special purpose piece of information like this one belongs into the
rather generic version hypercall.


For an introspection application meant to be deployed on several Xen 
versions without recompiling, it is important to be able to decide at 
runtime what size and layout the vm_event struct has.


Currently this can somewhat be done by associating the running Xen 
version with the vm_event version, but that is not ideal for obvious reasons. 
Reading the vm_event version from an actual vm_event is also out of the 
question, because in order to be able to receive the first vm_event we 
have to set the ring buffer up, and that requires knowledge of the size 
of the vm_event. So a run-time mechanism for querying the vm_event 
version is needed.
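
To illustrate the ordering constraint, a hypothetical consumer-side 
sketch follows; neither helper below is a real libxc function, and the 
sizes are made up - they only stand in for whichever query mechanism 
ends up being chosen:

#include <stdio.h>
#include <stddef.h>

/* Stub: a real build would ask the hypervisor at startup. */
static int query_vm_event_version(void)
{
    return 3; /* illustrative value */
}

/* App-side table mapping ABI version -> vm_event request size. */
static size_t request_size_for(int version)
{
    return version >= 3 ? 384 : 256; /* illustrative sizes only */
}

int main(void)
{
    /* The ring must be sized and laid out *before* the first event
     * can be received, which is why the version cannot be read from
     * an event itself and a boot-time query is needed. */
    int ver = query_vm_event_version();
    printf("vm_event ABI v%d, request slot size %zu\n",
           ver, request_size_for(ver));
    return 0;
}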


We just thought that this was the most flexible place to add it.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 6/6] xc_version: add vm_event interface version

2019-01-09 Thread Razvan Cojocaru



On 1/8/19 6:47 PM, Jan Beulich wrote:

On 08.01.19 at 17:37,  wrote:

On 1/8/19 6:27 PM, Jan Beulich wrote:

On 19.12.18 at 19:52,  wrote:

Signed-off-by: Petre Pircalabu 


An empty description is not helpful. The immediate question is: Why?
We don't do this for other interface versions. I'm unconvinced a
special purpose piece of information like this one belongs into the
rather generic version hypercall.


For an introspection application meant to be deployed on several Xen
versions without recompiling, it is important to be able to decide at
runtime what size and layout the vm_event struct has.

Currently this can somewhat be done by associating the running Xen version
with the vm_event version, but that is not ideal for obvious reasons.
Reading the vm_event version from an actual vm_event is also out of the
question, because in order to be able to receive the first vm_event we
have to set the ring buffer up, and that requires knowledge of the size
of the vm_event. So a run-time mechanism for querying the vm_event
version is needed.

We just thought that this was the most flexible place to add it.


How about a new XEN_DOMCTL_VM_EVENT_GET_VERSION?


That would work as well, we just thought this was the least intrusive 
and most extensible way to do it (other queries could be added similarly 
in the future, without needing a new DOMCTL / libxc toolstack 
modifications).



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.12] x86/p2m: fix p2m_finish_type_change()

2019-01-09 Thread Razvan Cojocaru
finish_type_change() returns a negative int on error, but the
current code checks if ( !rc ). Also properly indent the out:
label while at it.

Signed-off-by: Razvan Cojocaru 
---
 xen/arch/x86/mm/p2m.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 5451f16..e08f6b1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1176,7 +1176,7 @@ int p2m_finish_type_change(struct domain *d,
 
 rc = finish_type_change(hostp2m, first_gfn, max_nr);
 
-if ( !rc )
+if ( rc < 0 )
 goto out;
 
 #ifdef CONFIG_HVM
@@ -1193,13 +1193,13 @@ int p2m_finish_type_change(struct domain *d,
 rc = finish_type_change(altp2m, first_gfn, max_nr);
 p2m_unlock(altp2m);
 
-if ( !rc )
+if ( rc < 0 )
 goto out;
 }
 }
 #endif
 
-out:
+ out:
 p2m_unlock(hostp2m);
 
 return rc;
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.12 V2] x86/p2m: fix p2m_finish_type_change()

2019-01-09 Thread Razvan Cojocaru
finish_type_change() returns a negative int on error, but the
current code checks if ( !rc ). We also need to treat
finish_type_change()'s return codes cumulatively in the
success case (don't overwrite a 1 returned while processing
the hostp2m if processing an altp2m returns 0).

The breakage was introduced by commit 0fb4b58c8b
("x86/altp2m: fix display frozen when switching to a new view
early").

Properly indent the out: label while at it.

Signed-off-by: Razvan Cojocaru 

---
Changes since V1:
 - Updated description.
 - Now treating finish_type_change()'s return value cumulatively
   for the success case.
---
 xen/arch/x86/mm/p2m.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 5451f16..91f412f 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1176,7 +1176,7 @@ int p2m_finish_type_change(struct domain *d,
 
 rc = finish_type_change(hostp2m, first_gfn, max_nr);
 
-if ( !rc )
+if ( rc < 0 )
 goto out;
 
 #ifdef CONFIG_HVM
@@ -1188,18 +1188,24 @@ int p2m_finish_type_change(struct domain *d,
 if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) )
 {
 struct p2m_domain *altp2m = d->arch.altp2m_p2m[i];
+int rc1;
 
 p2m_lock(altp2m);
-rc = finish_type_change(altp2m, first_gfn, max_nr);
+rc1 = finish_type_change(altp2m, first_gfn, max_nr);
 p2m_unlock(altp2m);
 
-if ( !rc )
+if ( rc1 < 0 )
+{
+rc = rc1;
 goto out;
+}
+
+rc = max(rc, rc1);
 }
 }
 #endif
 
-out:
+ out:
 p2m_unlock(hostp2m);
 
 return rc;
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.

2019-01-09 Thread Razvan Cojocaru

On 12/20/18 4:28 PM, Paul Durrant wrote:

-Original Message-
From: Petre Ovidiu PIRCALABU [mailto:ppircal...@bitdefender.com]
Sent: 20 December 2018 14:26
To: Paul Durrant ; xen-devel@lists.xenproject.org
Cc: Stefano Stabellini ; Wei Liu
; Razvan Cojocaru ; Konrad
Rzeszutek Wilk ; George Dunlap
; Andrew Cooper ; Ian
Jackson ; Tim (Xen.org) ; Julien
Grall ; Tamas K Lengyel ; Jan
Beulich ; Roger Pau Monne 
Subject: Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels
for sync requests.

On Thu, 2018-12-20 at 12:05 +, Paul Durrant wrote:

The memory for the asynchronous ring and the synchronous channels will
be allocated from domheap and mapped to the controlling domain using
the foreignmemory_map_resource interface. Unlike the current
implementation, the allocated pages are not part of the target DomU,
so they will not be reclaimed when the vm_event domain is disabled.


Why re-invent the wheel here? The ioreq infrastructure already does
pretty much everything you need AFAICT.

   Paul


I wanted to preserve as much as possible of the existing vm_event DOMCTL
interface and add only the necessary code to allocate and map the
vm_event_pages.


That means we have two subsystems duplicating a lot of functionality though. It 
would be much better to use ioreq server if possible than provide a 
compatibility interface via DOMCTL.


Just to clarify the compatibility issue: there's a third element between 
Xen and the introspection application, the Linux kernel, which needs to 
be fairly recent for the whole ioreq machinery to work. The qemu code 
also seems to fall back to the old way of working if that's the case.


This means that there's a choice to be made here: either we keep 
backwards compatibility with the old vm_event interface (in which case 
we can't drop the waitqueue code), or we switch to the new one and leave 
older setups in the dust (but there's less code duplication and we can 
get rid of the waitqueue code).


In any event, it's not very clear (to me, at least) how the envisioned 
ioreq replacement should work. I assume we're meant to use the whole 
infrastructure (as opposed to what we're now doing, which is to only use 
the map-hypervisor-memory part), i.e. both mapping and signaling. Could 
we discuss this in more detail? Are there any docs on this or ioreq 
minimal clients (like xen-access.c is for vm_event) we might use?



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-09 Thread Razvan Cojocaru

On 1/9/19 6:15 PM, Tamas K Lengyel wrote:

On Mon, Jan 7, 2019 at 2:01 AM Roger Pau Monné  wrote:


Adding the introspection guys.

On Fri, Jan 04, 2019 at 08:47:04AM -0700, Jan Beulich wrote:

On 04.01.19 at 16:35,  wrote:

On Fri, Jan 04, 2019 at 06:22:19AM -0700, Jan Beulich wrote:

On 04.01.19 at 09:57,  wrote:

On Fri, Dec 21, 2018 at 03:05:03PM -0800, Christopher Clark wrote:

On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné  wrote:


On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote:

On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné 

wrote:


On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote:

Then I wonder why you need such a check in any case if the code can
handle such cases, the more so since the check itself is racy.


OK, so the root of the question here is: does it matter what the p2m
type of the memory is at these points:

1) when the gfn is translated to mfn, at the time of ring registration


This is the important check, because that's where you should take a
reference to the page. In this case you should check that the page is
of ram_rw type.


2) when the hypervisor writes into guest memory:
 - where the tx_ptr index is initialized in the register op
 - where ringbuf data is written in sendv
 - where ring description data is written in notify


As long as you keep a reference to the pages that are part of the ring
you don't need to do any checks when writing/reading from them. If the
guest messes up it's p2m and does change the gfn -> mfn mappings for
pages that are part of the ring that's the guest problem, the
hypervisor still has a reference to those pages so they won't be
reused.


For use cases like introspection this may not be fully correct,
but it may also be that my understanding there isn't fully
correct. If introspection agents care about _any_ writes to
a page, hypervisor ones (which in most cases are merely
writes on behalf of the guest) might matter as well. I think
to decide whether page accesses need to be accompanied
by any checks (and if so, which ones) one needs to
- establish what p2m type transitions are possible for a
   given page,
- verify what restrictions may occur "behind the back" of
   the entity wanting to do the accesses,
- explore whether doing the extra checking at p2m type
   change time wouldn't be better than at the time of access.


Maybe this use-case is different, but how does introspection handle
accesses to the shared info page or the runstate info for example?

I would consider argo to be the same in this regard.


Not exactly: The shared info page is special in any event. For
runstate info (and alike - there's also struct vcpu_time_info)
I'd question correctness of the current handling. If that's
wrong already, I'd prefer if the issue wasn't spread.


There are also grants, which when used together with another guest on
the same host could allow to bypass introspection AFAICT? (unless
there's some policy applied that limit grant sharing to trusted
domains)

TBH I'm not sure how to handle hypervisor accesses with
introspection.  My knowledge of introspection is fairly limited, but
it pauses the guest and sends a notification to an in-guest agent. I'm
not sure this is applicable to hypervisor writes, since it's not
possible to pause hypervisor execution and wait for a response from a
guest agent.



Introspection applications only care about memory accesses performed
by the guest. Hypervisor accesses to monitored pages are not included
when monitoring - it is actually a feature when using the emulator in
Xen to continue guest execution because the hypervisor ignores EPT
memory permissions that trip the guest for introspection. So having
the hypervisor access memory or a grant-shared page being accessed in
another domain are not a problem for introspection.


Indeed, that's how it goes.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-09 Thread Razvan Cojocaru

On 1/9/19 6:34 PM, Roger Pau Monné wrote:

Maybe this use-case is different, but how does introspection handle
accesses to the shared info page or the runstate info for example?

I would consider argo to be the same in this regard.


Not exactly: The shared info page is special in any event. For
runstate info (and alike - there's also struct vcpu_time_info)
I'd question correctness of the current handling. If that's
wrong already, I'd prefer if the issue wasn't spread.


There are also grants, which when used together with another guest on
the same host could allow to bypass introspection AFAICT? (unless
there's some policy applied that limit grant sharing to trusted
domains)

TBH I'm not sure how to handle hypervisor accesses with
introspection.  My knowledge of introspection is fairly limited, but
it pauses the guest and sends a notification to an in-guest agent. I'm
not sure this is applicable to hypervisor writes, since it's not
possible to pause hypervisor execution and wait for a response from a
guest agent.



Introspection applications only care about memory accesses performed
by the guest. Hypervisor accesses to monitored pages are not included
when monitoring - it is actually a feature when using the emulator in
Xen to continue guest execution because the hypervisor ignores EPT
memory permissions that trip the guest for introspection. So having
the hypervisor access memory or a grant-shared page being accessed in
another domain are not a problem for introspection.


Couldn't two guests running on the same host then completely
bypass introspection? I guess you prevent this by limiting to which
guests pages can be shared?


Would these two guests be HVM guests? Introspection only works for HVM 
guests. I'm not sure I follow your scenario though. How would these 
guests collaborate to escape introspection via grants?



If that's the case, and introspection doesn't care about hypervisor
accesses to guest pages, then just getting a reference to the
underlying page when the ring is setup should be enough. There's no
need to check the gfn -> mfn relation every time there's an hypervisor
access to the ring.


I think so, but I might be missing something.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 13/25] argo: implement the register op

2019-01-09 Thread Razvan Cojocaru

On 1/9/19 6:50 PM, Tamas K Lengyel wrote:

On Wed, Jan 9, 2019 at 9:48 AM Razvan Cojocaru
 wrote:


On 1/9/19 6:34 PM, Roger Pau Monné wrote:

Maybe this use-case is different, but how does introspection handle
accesses to the shared info page or the runstate info for example?

I would consider argo to be the same in this regard.


Not exactly: The shared info page is special in any event. For
runstate info (and alike - there's also struct vcpu_time_info)
I'd question correctness of the current handling. If that's
wrong already, I'd prefer if the issue wasn't spread.


There are also grants, which when used together with another guest on
the same host could allow to bypass introspection AFAICT? (unless
there's some policy applied that limit grant sharing to trusted
domains)

TBH I'm not sure how to handle hypervisor accesses with
introspection.  My knowledge of introspection is fairly limited, but
it pauses the guest and sends a notification to an in-guest agent. I'm
not sure this is applicable to hypervisor writes, since it's not
possible to pause hypervisor execution and wait for a response from a
guest agent.



Introspection applications only care about memory accesses performed
by the guest. Hypervisor accesses to monitored pages are not included
when monitoring - it is actually a feature when using the emulator in
Xen to continue guest execution because the hypervisor ignores EPT
memory permissions that trip the guest for introspection. So having
the hypervisor access memory or a grant-shared page being accessed in
another domain are not a problem for introspection.


Couldn't two guests running on the same host then completely
bypass introspection? I guess you prevent this by limiting to which
guests pages can be shared?


Would these two guests be HVM guests? Introspection only works for HVM
guests. I'm not sure I follow your scenario though. How would these
guests collaborate to escape introspection via grants?


If there are two domains acting maliciously in concert to bypass
monitoring of memory writes they could achieve that with grants, yes.
Say a write-monitored page is grant-shared to another domain, which
then does the write on behalf of the first. I wouldn't say that's
"completely bypassing introspection" though, there are many types of
events that can be monitored, write-accesses are only one. I'm not
aware of any mechanism that can be used to limit which pages can be
shared but you can use XSM to restrict which domains can share pages
to begin with. That's normally enough.


Right, I agree. We're not currently dealing with that case and assume 
that XSM (or a similar mechanism) will be used in scenarios where this 
level of access is possible.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.

2019-01-10 Thread Razvan Cojocaru

On 1/10/19 11:58 AM, Paul Durrant wrote:

-Original Message-




Why re-invent the wheel here? The ioreq infrastructure already does
pretty much everything you need AFAICT.

Paul


I wanted to preserve as much as possible of the existing vm_event DOMCTL
interface and add only the necessary code to allocate and map the
vm_event_pages.


That means we have two subsystems duplicating a lot of functionality

though. It would be much better to use ioreq server if possible than
provide a compatibility interface via DOMCTL.

Just to clarify the compatibility issue: there's a third element between
Xen and the introspection application, the Linux kernel, which needs to
be fairly recent for the whole ioreq machinery to work. The qemu code
also seems to fall back to the old way of working if that's the case.



That's correct. For IOREQ server there is a fall-back mechanism when privcmd 
doesn't support resource mapping.


This means that there's a choice to be made here: either we keep
backwards compatibility with the old vm_event interface (in which case
we can't drop the waitqueue code), or we switch to the new one and leave
older setups in the dust (but there's less code duplication and we can
get rid of the waitqueue code).



I don't know what your compatibility model is. QEMU needs to maintain 
compatibility across various different versions of Xen and Linux so there are 
many shims and much compat code. You may not need this.


Our current model is: deploy a special guest (that we call an SVA - short 
for security virtual appliance), with its own kernel and applications, 
that for all intents and purposes will act dom0-like.


So in that scenario, we control the guest kernel so backwards 
compatibility for the case where the kernel does not support the proper 
ioctl is not a priority. That said, it might very well be an issue for 
someone, and we'd like to be well-behaved citizens and not inconvenience 
other vm_event consumers. Tamas, is this something you'd be concerned about?


What we do care about is being able to fallback in the case where the 
host hypervisor does not know anything about the new ioreq 
infrastructure. IOW, nobody can stop a client from running a Xen 
4.7-based XenServer on top of which our introspection guest will not be 
able to use the new ioreq code even if it's using the latest kernel. But 
that can be done at application level and would not require 
hypervisor-level backwards compatibility support (whereas in the first 
case - an old kernel - it would).


On top of all of this there's Andrew's concern of being able to get rid 
of the current vm_event waitqueue code that's making migration brittle.


So, if I understand the situation correctly, we need to negotiate the 
following:


1. Should we try to switch to the ioreq infrastructure for vm_event or 
use our custom one? If I'm remembering things correctly, Paul and Jan 
are for it, Andrew is somewhat against it, Tamas has not expressed a 
preference.


2. However we approach the new code, should we or should we not also 
provide a backwards compatibility layer in the hypervisor? We don't need 
it, but somebody might and it's probably not a good idea to design based 
entirely on the needs of one use-case. Tamas may have different needs 
here, and maybe other members of the xen-devel community as well. Andrew 
prefers that we don't since that removes the waitqueue code.


To reiterate how this got started: we want to move the ring buffer 
memory from the guest to the hypervisor (we've had cases of OSes 
reclaiming that page after the first introspection application exit), 
and we want to make that memory bigger (so that more events will fit 
into it, carrying more information (bigger events)). That's essentially 
all we're after.
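
As a rough illustration of the sizing argument (all numbers below are 
made up for the example - the real vm_event request size is 
version-dependent and the page/header figures are placeholders):

#include <stdio.h>

int main(void)
{
    unsigned int page = 4096;
    unsigned int req  = 384;  /* assumed per-event request size */

    /* A single shared page must also hold the ring bookkeeping, and
     * the standard ring macros further round the usable slot count
     * down to a power of two, so one page yields only a handful of
     * in-flight events ... */
    printf("one page:   ~%u slots before rounding\n", (page - 64) / req);
    /* ... while a multi-page, hypervisor-allocated buffer scales it
     * and also survives the guest OS reclaiming its own pages. */
    printf("four pages: ~%u slots before rounding\n", (4 * page - 64) / req);
    return 0;
}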



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events

2019-01-11 Thread Razvan Cojocaru

On 12/25/18 4:29 AM, Tian, Kevin wrote:

From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com]
Sent: Friday, December 14, 2018 7:50 PM

Block interrupts (in vmx_intr_assist()) for the duration of
processing a sync vm_event (similarly to the strategy
currently used for single-stepping). Otherwise, attempting
to emulate an instruction when requested by a vm_event
reply may legitimately need to call e.g.
hvm_inject_page_fault(), which then overwrites the active
interrupt in the VMCS.

The sync vm_event handling path on x86/VMX is (roughly):
monitor_traps() -> process vm_event -> vmx_intr_assist()
(possibly writing VM_ENTRY_INTR_INFO) ->
hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()
(possibly overwriting the VM_ENTRY_INTR_INFO value).

This patch may also be helpful for the future removal
of may_defer in hvm_set_cr{0,3,4} and hvm_set_msr().

Signed-off-by: Razvan Cojocaru 



Reviewed-by: Kevin Tian 


Ping?


Thanks,
Razvan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC v2 2/2] x86/emulate: Send vm_event from emulate

2019-01-11 Thread Razvan Cojocaru

On 1/11/19 6:38 PM, Tamas K Lengyel wrote:

On Fri, Jan 11, 2019 at 8:37 AM Alexandru Stefan ISAILA
 wrote:


This patch aims to have mem access vm events sent from the emulator.
This is useful in the case of page-walks that have to emulate
instructions in access denied pages.



I'm a little confused about the scenario you mention here. You mark
pages where the pagetables are non-readable/writable in EPT and you
expect the emulated instruction would also violate access permissions
of the guest pagetable itself?


Hello Tamas,

The scenario is this: the pagetables are read-only. At some point, a 
walk tries to write the accessed bit, or the dirty bit somewhere in that 
read-only memory, causing an EPT fault, so we end up in 
p2m_mem_access_check().


Understandably, we don't care about sending this event out to the 
introspection application (we could if we wanted to, which is why this 
behaviour is configurable, but I think it's safe to say that for most 
introspection use-cases this is something we don't care about, and hence 
a perfect opportunity for optimization).


Now, emulating the current instruction helps, and it works. But, what if 
that instruction would have tried to write to another protected page? 
Emulating it, as things stand now, means that we will lose _that_ event, 
and that's potentially a very important EPT event.


We've tried to attack this problem by only writing the A/D bits and 
almost came to a satisfactory solution but there's still some debate on 
whether it's architecturally correct or not - that approach needs more 
studying.


The alternative we've come up with is to instead, at least for the time 
being, attempt to send out vm_events from the emulator code only in this 
case: where we want to emulate the page walk without consulting the EPT, 
but want to consult it when actually emulating the current instruction.
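
Roughly, the check we have in mind in the emulator's mapping path would 
look like this (a sketch only - the helper name and the exact p2m lookup 
used here are assumptions for illustration, not the RFC code):

    /*
     * Hypothetical: called when the emulator maps a guest linear address
     * for the data access of the instruction under emulation (the
     * page-walk A/D writes do not come through this path).
     */
    static bool hvmemul_maybe_send_vm_event(gfn_t gfn, uint32_t pfec)
    {
        xenmem_access_t access;

        /* Assumed interface; the real lookup may differ. */
        if ( p2m_get_mem_access(current->domain, gfn, &access) )
            return false;

        /* A write that the EPT restrictions would have denied? */
        if ( (pfec & PFEC_write_access) &&
             (access == XENMEM_access_r || access == XENMEM_access_rx) )
        {
            /* Build and send a mem_access request, pause the vCPU... */
            return true; /* caller fails the access with X86EMUL_RETRY */
        }

        return false;
    }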


I hope that made sense.


Thanks,
Razvan


Re: [Xen-devel] [PATCH RFC v2 2/2] x86/emulate: Send vm_event from emulate

2019-01-11 Thread Razvan Cojocaru
On 1/11/19 10:51 PM, Tamas K Lengyel wrote:
> On Fri, Jan 11, 2019 at 9:51 AM Razvan Cojocaru
>  wrote:
>>
>> On 1/11/19 6:38 PM, Tamas K Lengyel wrote:
>>> On Fri, Jan 11, 2019 at 8:37 AM Alexandru Stefan ISAILA
>>>  wrote:
>>>>
>>>> This patch aims to have mem access vm events sent from the emulator.
>>>> This is useful in the case of page-walks that have to emulate
>>>> instructions in access denied pages.
>>>>
>>>
>>> I'm a little confused about the scenario you mention here. You mark
>>> pages where the pagetables are non-readable/writable in EPT and you
>>> expect the emulated instruction would also violate access permissions
>>> of the guest pagetable itself?
>>
>> Hello Tamas,
>>
>> The scenario is this: the pagetables are read-only. At some point, a
>> walk tries to write the accessed bit, or the dirty bit somewhere in that
>> read-only memory, causing an EPT fault, so we end up in
>> p2m_mem_access_check().
>>
>> Understandably, we don't care about sending this event out to the
>> introspection application (we could if we wanted to, which is why this
>> behaviour is configurable, but I think it's safe to say that for most
>> introspection use-cases this is something we don't care about, and hence
>> a perfect opportunity for optimization).
>>
>> Now, emulating the current instruction helps, and it works. But, what if
>> that instruction would have tried to write to another protected page?
>> Emulating it, as things stand now, means that we will lose _that_ event,
>> and that's potentially a very important EPT event.
>>
>> We've tried to attack this problem by only writing the A/D bits and
>> almost came to a satisfactory solution but there's still some debate on
>> whether it's architecturally correct or not - that approach needs more study.
>>
>> The alternative we've come up with is to instead, at least for the time
>> being, attempt to send out vm_events from the emulator code only in this
>> case: where we want to emulate the page walk without consulting the EPT,
>> but want to consult it when actually emulating the current instruction.
>>
>> I hope that made sense.
> 
> I'm still confused :) In the pagetable walking case, didn't the
> instruction you are emulating just trip by writing to a page you want
> to allow it writing to (the A/D bits)? Or are you saying there is an
> unrelated trap happening with an execute-violation but you don't know
> what other write-protected page it would have tripped if it actually
> executed, and by emulating you effectively miss that write event? The
> latter makes sense, it's the pagetable walking case I have a hard time
> putting together.

OK, assume we have instruction I that accesses page X, which access in
turn causes the A/D bits to be set in pagetable page Y (because it
triggers a page walk). We want to allow the write to Y (setting whatever
A/D bits), but if X is write-protected and I tries to write into it, we
want a vm_event for that write.

With the current code, if we emulate I to write to Y, we lose a
potential vm_event for the write to X. This doesn't happen often, but it
does happen.
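
In (purely illustrative) pseudo-C, with invented helper names, the 
dispatch we are after is:

    /* Sketch of the intended policy for an EPT write violation. */
    static void handle_protected_write(paddr_t gpa, bool is_walk_ad_write)
    {
        if ( is_walk_ad_write )
        {
            /* Write to Y: an A/D-bit update - allow it, send no event. */
            emulate_current_insn();
            return;
        }

        /* Write to X: a protected data page - the event must still go out. */
        send_mem_access_vm_event(gpa);
    }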


Thanks,
Razvan


Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events

2019-01-14 Thread Razvan Cojocaru

On 1/12/19 12:04 AM, Boris Ostrovsky wrote:

On 12/14/18 6:49 AM, Razvan Cojocaru wrote:

Block interrupts (in vmx_intr_assist()) for the duration of
processing a sync vm_event (similarly to the strategy
currently used for single-stepping). Otherwise, attempting
to emulate an instruction when requested by a vm_event
reply may legitimately need to call e.g.
hvm_inject_page_fault(), which then overwrites the active
interrupt in the VMCS.

The sync vm_event handling path on x86/VMX is (roughly):
monitor_traps() -> process vm_event -> vmx_intr_assist()
(possibly writing VM_ENTRY_INTR_INFO) ->
hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()
(possibly overwriting the VM_ENTRY_INTR_INFO value).

This patch may also be helpful for the future removal
of may_defer in hvm_set_cr{0,3,4} and hvm_set_msr().

Signed-off-by: Razvan Cojocaru 



Reviewed-by: Boris Ostrovsky 


Thanks! So now we have three reviewed-bys, if I'm not mistaken all we 
need is Tamas' (for the vm_event part) and Julien / Stefano's (for ARM) 
acks (or otherwise).


Could you please take a look?


Thanks,
Razvan


Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events

2019-01-14 Thread Razvan Cojocaru

On 1/14/19 11:53 AM, Jan Beulich wrote:

On 14.01.19 at 10:34,  wrote:

On 1/12/19 12:04 AM, Boris Ostrovsky wrote:

On 12/14/18 6:49 AM, Razvan Cojocaru wrote:

Block interrupts (in vmx_intr_assist()) for the duration of
processing a sync vm_event (similarly to the strategy
currently used for single-stepping). Otherwise, attempting
to emulate an instruction when requested by a vm_event
reply may legitimately need to call e.g.
hvm_inject_page_fault(), which then overwrites the active
interrupt in the VMCS.

The sync vm_event handling path on x86/VMX is (roughly):
monitor_traps() -> process vm_event -> vmx_intr_assist()
(possibly writing VM_ENTRY_INTR_INFO) ->
hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()
(possibly overwriting the VM_ENTRY_INTR_INFO value).

This patch may also be helpful for the future removal
of may_defer in hvm_set_cr{0,3,4} and hvm_set_msr().

Signed-off-by: Razvan Cojocaru 



Reviewed-by: Boris Ostrovsky 


Thanks! So now we have three reviewed-bys, if I'm not mistaken all we
need is Tamas' (for the vm_event part) and Julien / Stefano's (for ARM)
acks (or otherwise).


And you'd need to talk Jürgen into allowing this in, now that we're
past the freeze point.


(Adding Jürgen to the conversation.)

Right, that would be ideal if possible - the code has absolutely no 
impact on anything unless vm_event is active on the domain, which is 
never the case for the use-cases considered for a Xen release.
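
For reference, the change boils down to an early bail-out of roughly 
this shape in vmx_intr_assist() (paraphrased, not quoted verbatim from 
the patch):

    void vmx_intr_assist(void)
    {
        struct vcpu *v = current;

        /*
         * Block event injection while a sync vm_event is processed, so
         * that emulation requested by the vm_event reply (which may call
         * hvm_inject_page_fault()) cannot clobber VM_ENTRY_INTR_INFO.
         * Without a vm_event subscriber, v->arch.vm_event is NULL and
         * this check is a no-op - hence "no impact" above.
         */
        if ( unlikely(v->arch.vm_event) && v->arch.vm_event->sync_event )
            return;

        /* ... the rest of the injection logic is unchanged ... */
    }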


So the case I'm making for the patch to go in 4.12 is that:

1. It's perfectly harmless (it affects nothing, except for introspection).

2. It's trivial and thus very easy to see that it's correct.

3. It fixes a major headache for us, and thus it is a great improvement 
from an introspection standpoint (fixes OS crashes / hangs which we'd 
otherwise need to work around in rather painful ways).


4. V3 of the patch has been sent out on Dec 14th - it's just that 
reviewers have had other priorities and it did not gather all acks in time.


However, if it's not possible or desirable to allow this in the next 
best thing is to at least have all the acks necessary for it to go in 
first thing once the freeze is over.


From Julien's reply I understand that the last ack necessary is Tamas'.


Thanks,
Razvan


Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events

2019-01-14 Thread Razvan Cojocaru

On 1/14/19 4:42 PM, Juergen Gross wrote:

On 14/01/2019 11:56, Razvan Cojocaru wrote:

On 1/14/19 11:53 AM, Jan Beulich wrote:

On 14.01.19 at 10:34,  wrote:

On 1/12/19 12:04 AM, Boris Ostrovsky wrote:

On 12/14/18 6:49 AM, Razvan Cojocaru wrote:

Block interrupts (in vmx_intr_assist()) for the duration of
processing a sync vm_event (similarly to the strategy
currently used for single-stepping). Otherwise, attempting
to emulate an instruction when requested by a vm_event
reply may legitimately need to call e.g.
hvm_inject_page_fault(), which then overwrites the active
interrupt in the VMCS.

The sync vm_event handling path on x86/VMX is (roughly):
monitor_traps() -> process vm_event -> vmx_intr_assist()
(possibly writing VM_ENTRY_INTR_INFO) ->
hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()
(possibly overwriting the VM_ENTRY_INTR_INFO value).

This patch may also be helpful for the future removal
of may_defer in hvm_set_cr{0,3,4} and hvm_set_msr().

Signed-off-by: Razvan Cojocaru 



Reviewed-by: Boris Ostrovsky 


Thanks! So now we have three reviewed-bys, if I'm not mistaken all we
need is Tamas' (for the vm_event part) and Julien / Stefano's (for ARM)
acks (or otherwise).


And you'd need to talk Jürgen into allowing this in, now that we're
past the freeze point.


(Adding Jürgen to the conversation.)

Right, that would be ideal if possible - the code has absolutely no
impact on anything unless vm_event is active on the domain, which is
never the case for the use-cases considered for a Xen release.

So the case I'm making for the patch to go in 4.12 is that:

1. It's perfectly harmless (it affects nothing, except for introspection).

2. It's trivial and thus very easy to see that it's correct.

3. It fixes a major headache for us, and thus it is a great improvement
from an introspection standpoint (fixes OS crashes / hangs which we'd
otherwise need to work around in rather painful ways).

4. V3 of the patch has been sent out on Dec 14th - it's just that
reviewers have had other priorities and it did not gather all acks in time.

However, if it's not possible or desirable to allow this in the next
best thing is to at least have all the acks necessary for it to go in
first thing once the freeze is over.

From Julien's reply I understand that the last ack necessary is Tamas'.


With that ack just arrived:

Release-acked-by: Juergen Gross 


Thanks! Very much appreciated.


Razvan


Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events

2019-01-15 Thread Razvan Cojocaru
On 1/14/19 4:42 PM, Juergen Gross wrote:
> On 14/01/2019 11:56, Razvan Cojocaru wrote:
>> On 1/14/19 11:53 AM, Jan Beulich wrote:
>>>>>> On 14.01.19 at 10:34,  wrote:
>>>> On 1/12/19 12:04 AM, Boris Ostrovsky wrote:
>>>>> On 12/14/18 6:49 AM, Razvan Cojocaru wrote:
>>>>>> Block interrupts (in vmx_intr_assist()) for the duration of
>>>>>> processing a sync vm_event (similarly to the strategy
>>>>>> currently used for single-stepping). Otherwise, attempting
>>>>>> to emulate an instruction when requested by a vm_event
>>>>>> reply may legitimately need to call e.g.
>>>>>> hvm_inject_page_fault(), which then overwrites the active
>>>>>> interrupt in the VMCS.
>>>>>>
>>>>>> The sync vm_event handling path on x86/VMX is (roughly):
>>>>>> monitor_traps() -> process vm_event -> vmx_intr_assist()
>>>>>> (possibly writing VM_ENTRY_INTR_INFO) ->
>>>>>> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()
>>>>>> (possibly overwriting the VM_ENTRY_INTR_INFO value).
>>>>>>
>>>>>> This patch may also be helpful for the future removal
>>>>>> of may_defer in hvm_set_cr{0,3,4} and hvm_set_msr().
>>>>>>
>>>>>> Signed-off-by: Razvan Cojocaru 
>>>>>
>>>>>
>>>>> Reviewed-by: Boris Ostrovsky 
>>>>
>>>> Thanks! So now we have three reviewed-bys, if I'm not mistaken all we
>>>> need is Tamas' (for the vm_event part) and Julien / Stefano's (for ARM)
>>>> acks (or otherwise).
>>>
>>> And you'd need to talk Jürgen into allowing this in, now that we're
>>> past the freeze point.
>>
>> (Adding Jürgen to the conversation.)
>>
>> Right, that would be ideal if possible - the code has absolutely no
>> impact on anything unless vm_event is active on the domain, which is
>> never the case for the use-cases considered for a Xen release.
>>
>> So the case I'm making for the patch to go in 4.12 is that:
>>
>> 1. It's perfectly harmless (it affects nothing, except for introspection).
>>
>> 2. It's trivial and thus very easy to see that it's correct.
>>
>> 3. It fixes a major headache for us, and thus it is a great improvement
>> from an introspection standpoint (fixes OS crashes / hangs which we'd
>> otherwise need to work around in rather painful ways).
>>
>> 4. V3 of the patch has been sent out on Dec 14th - it's just that
>> reviewers have had other priorities and it did not gather all acks in time.
>>
>> However, if it's not possible or desirable to allow this in the next
>> best thing is to at least have all the acks necessary for it to go in
>> first thing once the freeze is over.
>>
>> From Julien's reply I understand that the last ack necessary is Tamas'.
> 
> With that ack just arrived:
> 
> Release-acked-by: Juergen Gross 
AFAICT this is fine to apply to staging now, am I incorrect?


Thanks,
Razvan


Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events

2019-01-16 Thread Razvan Cojocaru

On 1/16/19 10:58 AM, Jan Beulich wrote:

On 16.01.19 at 08:10,  wrote:

On 1/14/19 4:42 PM, Juergen Gross wrote:

On 14/01/2019 11:56, Razvan Cojocaru wrote:

On 1/14/19 11:53 AM, Jan Beulich wrote:

On 14.01.19 at 10:34,  wrote:

On 1/12/19 12:04 AM, Boris Ostrovsky wrote:

On 12/14/18 6:49 AM, Razvan Cojocaru wrote:

Block interrupts (in vmx_intr_assist()) for the duration of
processing a sync vm_event (similarly to the strategy
currently used for single-stepping). Otherwise, attempting
to emulate an instruction when requested by a vm_event
reply may legitimately need to call e.g.
hvm_inject_page_fault(), which then overwrites the active
interrupt in the VMCS.

The sync vm_event handling path on x86/VMX is (roughly):
monitor_traps() -> process vm_event -> vmx_intr_assist()
(possibly writing VM_ENTRY_INTR_INFO) ->
hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()
(possibly overwriting the VM_ENTRY_INTR_INFO value).

This patch may also be helpful for the future removal
of may_defer in hvm_set_cr{0,3,4} and hvm_set_msr().

Signed-off-by: Razvan Cojocaru 



Reviewed-by: Boris Ostrovsky 


Thanks! So now we have three reviewed-bys, if I'm not mistaken all we
need is Tamas' (for the vm_event part) and Julien / Stefano's (for ARM)
acks (or otherwise).


And you'd need to talk Jürgen into allowing this in, now that we're
past the freeze point.


(Adding Jürgen to the conversation.)

Right, that would be ideal if possible - the code has absolutely no
impact on anything unless vm_event is active on the domain, which is
never the case for the use-cases considered for a Xen release.

So the case I'm making for the patch to go in 4.12 is that:

1. It's perfectly harmless (it affects nothing, except for introspection).

2. It's trivial and thus very easy to see that it's correct.

3. It fixes a major headache for us, and thus it is a great improvement
from an introspection standpoint (fixes OS crashes / hangs which we'd
otherwise need to work around in rather painful ways).

4. V3 of the patch has been sent out on Dec 14th - it's just that
reviewers have had other priorities and it did not gather all acks in time.

However, if it's not possible or desirable to allow this in the next
best thing is to at least have all the acks necessary for it to go in
first thing once the freeze is over.

From Julien's reply I understand that the last ack necessary is Tamas'.


With that ack just arrived:

Release-acked-by: Juergen Gross 

AFAICT this is fine to apply to staging now, am I incorrect?


Yes, but may I ask that you be a little more patient?


Of course, apologies. I was just unsure if further action was required 
on my part.



Thank you,
Razvan


Re: [Xen-devel] [PATCH for-4.12] x86/p2m: Drop erroneous #VE-enabled check in ept_set_entry()

2019-01-24 Thread Razvan Cojocaru
On 1/24/19 8:28 PM, Andrew Cooper wrote:
> Code clearing the "Suppress VE" bit in an EPT entry isn't nececsserily running
> in current context.  In ALTP2M_external mode, it definitely is not, and in PV
> context, vcpu_altp2m(current) acts upon the HVM union.
> 
> Even if we could sensibly resolve the target vCPU, it may legitimately not be
> fully set up at this point, so rejecting the EPT modification would be buggy.
> 
> There is a path in hvm_hap_nested_page_fault() which explicitly emulates #VE
> in the cpu_has_vmx_virt_exceptions case, so the -EOPNOTSUPP part of this
> condition is also wrong.
> 
> Drop the !sve check entirely.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Razvan Cojocaru 
> CC: Tamas K Lengyel 
> CC: Jun Nakajima 
> CC: Kevin Tian 
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Juergen Gross 
> 
> Discovered while trying to fix the gaping security hole with ballooning out
> the #VE info page.  The risk for 4.12 is very minimal - altp2m is off by
> default, not security supported, and the ability to clearing sve is limited to
> introspection code paths.

Reviewed-by: Razvan Cojocaru 


Thanks,
Razvan


Re: [Xen-devel] [PATCH for-4.12] x86/svm: Fix handling of ICEBP intercepts

2019-02-01 Thread Razvan Cojocaru

On 2/1/19 4:49 PM, Andrew Cooper wrote:

c/s 9338a37d "x86/svm: implement debug events" added support for introspecting
ICEBP debug exceptions, but didn't account for the fact that
svm_get_insn_len() (previously __get_instruction_length) can fail and may
already raise #GP for the guest.

If svm_get_insn_len() fails, return back to guest context rather than
continuing and mistaking a trap-style VMExit for a fault-style one.

Spotted by Coverity.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Boris Ostrovsky 
CC: Suravee Suthikulpanit 
CC: Brian Woods 
CC: Juergen Gross 
CC: Razvan Cojocaru 
CC: Tamas K Lengyel 

This wants backporting to Xen 4.11
---
  xen/arch/x86/hvm/svm/svm.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 2584b90..e21091c 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2758,6 +2758,9 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
  {
  trap_type = X86_EVENTTYPE_PRI_SW_EXCEPTION;
  inst_len = svm_get_insn_len(v, INSTR_ICEBP);
+
+if ( !inst_len )
+break;
  }
  
  rc = hvm_monitor_debug(regs->rip,




Reviewed-by: Razvan Cojocaru 


Thanks,
Razvan


Re: [Xen-devel] [PATCH RFC v2 1/2] x86/emulate: Move hvmemul_linear_to_phys

2019-02-04 Thread Razvan Cojocaru

On 2/4/19 1:55 PM, Paul Durrant wrote:

-Original Message-
From: Alexandru Stefan ISAILA [mailto:aisa...@bitdefender.com]
Sent: 04 February 2019 11:37
To: xen-devel@lists.xenproject.org
Cc: Paul Durrant ; jbeul...@suse.com; Andrew
Cooper ; Wei Liu ; Roger
Pau Monne ; rcojoc...@bitdefender.com;
ta...@tklengyel.com; George Dunlap 
Subject: Re: [PATCH RFC v2 1/2] x86/emulate: Move hvmemul_linear_to_phys

Ping, any thoughts on this are appreciated.

Regards,
Alex

On 11.01.2019 17:37, Alexandru Stefan ISAILA wrote:

This is done so hvmemul_linear_to_phys() can be called from
hvmemul_map_linear_addr()



I have no objection to pure code movement for this purpose. I certainly prefer 
it to forward declaration of statics.

Thanks for the reply!

We're also after thoughts on the whole series (replying to this specific 
patch instead of the cover letter was an accident).


We expect the second patch to be more controversial and in some need of 
detailed scrutiny.



Thanks,
Razvan


Re: [Xen-devel] [PATCH RFC v2 1/2] x86/emulate: Move hvmemul_linear_to_phys

2019-02-04 Thread Razvan Cojocaru

On 2/4/19 5:14 PM, Roger Pau Monné wrote:

On Mon, Feb 04, 2019 at 02:00:40PM +0200, Razvan Cojocaru wrote:

On 2/4/19 1:55 PM, Paul Durrant wrote:

-Original Message-
From: Alexandru Stefan ISAILA [mailto:aisa...@bitdefender.com]
Sent: 04 February 2019 11:37
To: xen-devel@lists.xenproject.org
Cc: Paul Durrant ; jbeul...@suse.com; Andrew
Cooper ; Wei Liu ; Roger
Pau Monne ; rcojoc...@bitdefender.com;
ta...@tklengyel.com; George Dunlap 
Subject: Re: [PATCH RFC v2 1/2] x86/emulate: Move hvmemul_linear_to_phys

Ping, any thoughts on this are appreciated.

Regards,
Alex

On 11.01.2019 17:37, Alexandru Stefan ISAILA wrote:

This is done so hvmemul_linear_to_phys() can be called from
hvmemul_map_linear_addr()



I have no objection to pure code movement for this purpose. I certainly prefer 
it to forward declaration of statics.

Thanks for the reply!

We're also after thoughts on the whole series (replying to this specific
patch instead of the cover letter was an accident).

We expect the second patch to be more controversial and in some need of
detailed scrutiny.


I've taken a look at patch 2 and provided some review comments, sorry
for the delay. Keep in mind we are in the middle of the release
process and most developers and reviewers are busy trying to fix bugs
or reviewing bugfixes.


Very much appreciated!


Thanks,
Razvan


Re: [Xen-devel] [PATCH v6 08/10] remove late (on-demand) construction of IOMMU page tables

2019-08-16 Thread Razvan Cojocaru
On 8/16/19 8:19 PM, Paul Durrant wrote:
> Now that there is a per-domain IOMMU enable flag, which should be enabled if
> any device is going to be passed through, stop deferring page table
> construction until the assignment is done. Also don't tear down the tables
> again when the last device is de-assigned; defer that task until domain
> destruction.
> 
> This allows the has_iommu_pt() helper and iommu_status enumeration to be
> removed. Calls to has_iommu_pt() are simply replaced by calls to
> is_iommu_enabled(). Remaining open-coded tests of iommu_hap_pt_share can also
> be replaced by calls to iommu_use_hap_pt().
> The arch_iommu_populate_page_table() and iommu_construct() functions become
> redundant, as does the 'strict mode' dom0 page_list mapping code in
> iommu_hwdom_init(), and iommu_teardown() can be made static as its only
> remaining caller, iommu_domain_destroy(), is within the same source
> module.
> 
> All in all, about 220 lines of code are removed.
> 
> NOTE: This patch will cause a small amount of extra resource to be used
>   to accommodate IOMMU page tables that may never be used, since the
>   per-domain IOMMU enable flag is currently set to the value
>   of the global iommu_enable flag. A subsequent patch will add an
>   option to the toolstack to allow it to be turned off if there is
>   no intention to assign passthrough hardware to the domain.
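
For context, the mechanical part of the conversion amounts to 
replacements of this shape (illustrative only, not hunks from the 
patch):

    -    if ( has_iommu_pt(d) )
    +    if ( is_iommu_enabled(d) )

    -    if ( iommu_hap_pt_share )
    +    if ( iommu_use_hap_pt(d) )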

This has slipped under my radar, sorry.

Acked-by: Razvan Cojocaru 


Thanks,
Razvan


[Xen-devel] [PATCH for-4.12] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-08 Thread Razvan Cojocaru
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to a in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

While the in-guest scenario continues to pose problems, this
patch fixes the "external" case.

Signed-off-by: Razvan Cojocaru 
---
 xen/arch/x86/hvm/hvm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21944e9..33038dc 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4533,6 +4533,9 @@ static int do_altp2m_op(
 break;
 }
 
+if ( d != current->domain )
+domain_pause(d);
+
 ostate = d->arch.altp2m_active;
 d->arch.altp2m_active = !!a.u.domain_state.state;
 
@@ -4551,6 +4554,10 @@ static int do_altp2m_op(
 if ( ostate )
 p2m_flush_altp2m(d);
 }
+
+if ( d != current->domain )
+domain_unpause(d);
+
 break;
 }
 
-- 
2.7.4



Re: [Xen-devel] [PATCH for-4.12] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-08 Thread Razvan Cojocaru

On 2/8/19 12:51 PM, Jan Beulich wrote:

On 08.02.19 at 10:56,  wrote:

HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to a in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

While the in-guest scenario continues to pose problems, this
patch fixes the "external" case.


IOW you're papering over the problem rather than fixing it. Why
does altp2m_active get set to true before actually having set up
everything? Shouldn't it get cleared early, but set late?
Well, yes, that would have been my second attempt: set the "altp2m 
enabled" bool after the init, and before the uninit and no longer 
domain_pause() explicitly; however I thought that was a brittle 
solution, relying on comments / programmer attention to the code 
sequence rather than taking a proper lock.


I'll test that scenario then and return with the results / possibly 
another patch.



Thanks,
Razvan


Re: [Xen-devel] [PATCH for-4.12] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-08 Thread Razvan Cojocaru

On 2/8/19 1:13 PM, Razvan Cojocaru wrote:

On 2/8/19 12:51 PM, Jan Beulich wrote:

On 08.02.19 at 10:56,  wrote:

HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to an in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

While the in-guest scenario continues to pose problems, this
patch fixes the "external" case.


IOW you're papering over the problem rather than fixing it. Why
does altp2m_active get set to true before actually having set up
everything? Shouldn't it get cleared early, but set late?
Well, yes, that would have been my second attempt: set the "altp2m 
enabled" bool after the init, and before the uninit and no longer 
domain_pause() explicitly; however I thought that was a brittle 
solution, relying on comments / programmer attention to the code 
sequence rather than taking a proper lock.


I'll test that scenario then and return with the results / possibly 
another patch.


Actually, your suggestion does not work, because of the way the code has 
been designed: altp2m_vcpu_initialise() calls altp2m_vcpu_update_p2m(), 
which does the work of interest to us here, like this:


  2153 static void vmx_vcpu_update_eptp(struct vcpu *v)
  2154 {
  2155 struct domain *d = v->domain;
  2156 struct p2m_domain *p2m = NULL;
  2157 struct ept_data *ept;
  2158
  2159 if ( altp2m_active(d) )
  2160 p2m = p2m_get_altp2m(v);
  2161 if ( !p2m )
  2162 p2m = p2m_get_hostp2m(d);
  2163
  2164 ept = &p2m->ept;
  2165 ept->mfn = pagetable_get_pfn(p2m_get_pagetable(p2m));
  2166
  2167 vmx_vmcs_enter(v);
  2168
  2169 __vmwrite(EPT_POINTER, ept->eptp);
  2170
  2171 if ( v->arch.hvm.vmx.secondary_exec_control &
  2172  SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
  2173 __vmwrite(EPTP_INDEX, vcpu_altp2m(v).p2midx);
  2174
  2175 vmx_vmcs_exit(v);
  2176 }

So please note that on line 2159 it checks if altp2m is active, and only 
then does it do the right thing. So setting the d->arch.altp2m_active 
bool _after_ calling altp2m_vcpu_initialise() will fail to work 
correctly - turning this into a chicken-and-egg problem, or perhaps more 
interestingly, another discussion about whether in-guest-only altp2m 
agents make any sense fundamentally.


I think that, sadly, the best we can do at this time is the patch I've sent.


Thanks,
Razvan


Re: [Xen-devel] [PATCH for-4.12] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-08 Thread Razvan Cojocaru

On 2/8/19 2:58 PM, Jan Beulich wrote:

Well, to be honest I expected dependencies like this to be there,
and hence I didn't expect it would be a straightforward change.
Just like we do e.g. for the IOMMU enabling, I guess the boolean
wants to become a tristate then (off -> enabling -> enabled),
which interested sites then can use to distinguish what they
want/need to do.

Another relatively obvious solution would be to add a boolean
parameter to altp2m_vcpu_update_p2m() such that
altp2m_vcpu_initialise() can guide it properly. But this of course
depends to a certain degree on how wide spread the problem is.


Fair enough, I'll attempt V2 with the second suggestion (adding a bool 
to altp2m_vcpu_update_p2m()).



Razvan


[Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-08 Thread Razvan Cojocaru
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to an in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

This patch reorganizes the code so that d->arch.altp2m_active
is set to true only after all the init work has been done, and
to false before the uninit work begins. This required adding
a new bool parameter altp2m_vcpu_update_p2m(), which relied
on d->arch.altp2m_active being set before it's called.

While at it, I've changed a couple of bool_t's to bool's.

Signed-off-by: Razvan Cojocaru 
---
 xen/arch/x86/hvm/hvm.c        | 17 ++---
 xen/arch/x86/hvm/vmx/vmx.c    |  4 ++--
 xen/arch/x86/mm/altp2m.c      |  4 ++--
 xen/arch/x86/mm/p2m.c         |  4 ++--
 xen/include/asm-x86/altp2m.h  |  2 +-
 xen/include/asm-x86/domain.h  |  2 +-
 xen/include/asm-x86/hvm/hvm.h |  6 +++---
 7 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21944e9..4d1d13d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4525,7 +4525,7 @@ static int do_altp2m_op(
 case HVMOP_altp2m_set_domain_state:
 {
 struct vcpu *v;
-bool_t ostate;
+bool ostate, nstate;
 
 if ( nestedhvm_enabled(d) )
 {
@@ -4534,12 +4534,16 @@ static int do_altp2m_op(
 }
 
 ostate = d->arch.altp2m_active;
-d->arch.altp2m_active = !!a.u.domain_state.state;
+nstate = !!a.u.domain_state.state;
 
 /* If the alternate p2m state has changed, handle appropriately */
-if ( d->arch.altp2m_active != ostate &&
+if ( nstate != ostate &&
  (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
 {
+/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
+if ( ostate )
+d->arch.altp2m_active = false;
+
 for_each_vcpu( d, v )
 {
 if ( !ostate )
@@ -4550,7 +4554,14 @@ static int do_altp2m_op(
 
 if ( ostate )
 p2m_flush_altp2m(d);
+else
+/*
+ * Wait until altp2m_vcpu_initialise() is done before marking
+ * altp2m as being enabled for the domain.
+ */
+d->arch.altp2m_active = true;
 }
+
 break;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 64af8bf..e542568 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
 return !!cpu_has_monitor_trap_flag;
 }
 
-static void vmx_vcpu_update_eptp(struct vcpu *v)
+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
 {
 struct domain *d = v->domain;
 struct p2m_domain *p2m = NULL;
 struct ept_data *ept;
 
-if ( altp2m_active(d) )
+if ( altp2m_enabled )
 p2m = p2m_get_altp2m(v);
 if ( !p2m )
 p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c
index 930bdc2..c51303b 100644
--- a/xen/arch/x86/mm/altp2m.c
+++ b/xen/arch/x86/mm/altp2m.c
@@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v)
 vcpu_altp2m(v).p2midx = 0;
 atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
 
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, true);
 
 if ( v != current )
 vcpu_unpause(v);
@@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v)
 
 altp2m_vcpu_reset(v);
 
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, false);
 altp2m_vcpu_update_vmfunc_ve(v);
 
 if ( v != current )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57..f9e96fc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, 
unsigned int idx)
 atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
 vcpu_altp2m(v).p2midx = idx;
 atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, altp2m_active(d));
 }
 rc = 1;
 }
@@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, 
unsigned int idx)
 atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
 vcpu_altp2m(v).p2midx = idx;
 atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_u

Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-08 Thread Razvan Cojocaru

On 2/8/19 4:00 PM, Razvan Cojocaru wrote:

HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to an in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

This patch reorganizes the code so that d->arch.altp2m_active
is set to true only after all the init work has been done, and
to false before the uninit work begins. This required adding
a new bool parameter to altp2m_vcpu_update_p2m(), which relied
on d->arch.altp2m_active being set before it's called.

While at it, I've changed a couple of bool_t's to bool's.

Signed-off-by: Razvan Cojocaru 
---
  xen/arch/x86/hvm/hvm.c| 17 ++---
  xen/arch/x86/hvm/vmx/vmx.c|  4 ++--
  xen/arch/x86/mm/altp2m.c  |  4 ++--
  xen/arch/x86/mm/p2m.c |  4 ++--
  xen/include/asm-x86/altp2m.h  |  2 +-
  xen/include/asm-x86/domain.h  |  2 +-
  xen/include/asm-x86/hvm/hvm.h |  6 +++---
  7 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21944e9..4d1d13d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4525,7 +4525,7 @@ static int do_altp2m_op(
  case HVMOP_altp2m_set_domain_state:
  {
  struct vcpu *v;
-bool_t ostate;
+bool ostate, nstate;
  
  if ( nestedhvm_enabled(d) )

  {
@@ -4534,12 +4534,16 @@ static int do_altp2m_op(
  }
  
  ostate = d->arch.altp2m_active;

-d->arch.altp2m_active = !!a.u.domain_state.state;
+nstate = !!a.u.domain_state.state;
  
  /* If the alternate p2m state has changed, handle appropriately */

-if ( d->arch.altp2m_active != ostate &&
+if ( nstate != ostate &&
   (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
  {
+/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
+if ( ostate )
+d->arch.altp2m_active = false;
+
  for_each_vcpu( d, v )
  {
  if ( !ostate )
@@ -4550,7 +4554,14 @@ static int do_altp2m_op(
  
  if ( ostate )

  p2m_flush_altp2m(d);
+else
+/*
+ * Wait until altp2m_vcpu_initialise() is done before marking
+ * altp2m as being enabled for the domain.
+ */
+d->arch.altp2m_active = true;
  }
+
  break;
  }
  
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c

index 64af8bf..e542568 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
  return !!cpu_has_monitor_trap_flag;
  }
  
-static void vmx_vcpu_update_eptp(struct vcpu *v)

+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
  {
  struct domain *d = v->domain;
  struct p2m_domain *p2m = NULL;
  struct ept_data *ept;
  
-if ( altp2m_active(d) )

+if ( altp2m_enabled )
  p2m = p2m_get_altp2m(v);
  if ( !p2m )
  p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c
index 930bdc2..c51303b 100644
--- a/xen/arch/x86/mm/altp2m.c
+++ b/xen/arch/x86/mm/altp2m.c
@@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v)
  vcpu_altp2m(v).p2midx = 0;
  atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
  
-altp2m_vcpu_update_p2m(v);

+altp2m_vcpu_update_p2m(v, true);
  
  if ( v != current )

  vcpu_unpause(v);
@@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v)
  
  altp2m_vcpu_reset(v);
  
-altp2m_vcpu_update_p2m(v);

+altp2m_vcpu_update_p2m(v, false);
  altp2m_vcpu_update_vmfunc_ve(v);
  
  if ( v != current )

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57..f9e96fc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, 
unsigned int idx)
  atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
  vcpu_altp2m(v).p2midx = idx;
  atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, altp2m_active(d));
  }
  rc = 1;
  }
@@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, 
unsigned int idx)
  atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
  vcpu_altp2m(v).p2midx = idx;

Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-08 Thread Razvan Cojocaru
On 2/8/19 5:50 PM, Jan Beulich wrote:
 On 08.02.19 at 15:00,  wrote:
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -4525,7 +4525,7 @@ static int do_altp2m_op(
>>  case HVMOP_altp2m_set_domain_state:
>>  {
>>  struct vcpu *v;
>> -bool_t ostate;
>> +bool ostate, nstate;
>>  
>>  if ( nestedhvm_enabled(d) )
>>  {
>> @@ -4534,12 +4534,16 @@ static int do_altp2m_op(
>>  }
>>  
>>  ostate = d->arch.altp2m_active;
>> -d->arch.altp2m_active = !!a.u.domain_state.state;
>> +nstate = !!a.u.domain_state.state;
> 
> No need for !! here.

I'll remove it.

>>  /* If the alternate p2m state has changed, handle appropriately */
>> -if ( d->arch.altp2m_active != ostate &&
>> +if ( nstate != ostate &&
>>   (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
>>  {
>> +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
>> +if ( ostate )
>> +d->arch.altp2m_active = false;
> 
> Why the if()? In the opposite case you'd simply write false into
> what already holds false.

The value written into d->arch.altp2m_active is not the point here. The
point is that if ( ostate ), then we are disabling altp2m (because the
if above this one makes sure ostate != nstate).

So in the disable case, first I wanted to set d->arch.altp2m_active to
false (which immediately causes altp2m_active(d) to return false as
well), and then actually perform the work.
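
(For clarity: altp2m_active() is a plain read of that field - roughly 
"#define altp2m_active(d) ((d)->arch.altp2m_active)" - so a store to 
d->arch.altp2m_active becomes visible through it immediately.)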

>> @@ -4550,7 +4554,14 @@ static int do_altp2m_op(
>>  
>>  if ( ostate )
>>  p2m_flush_altp2m(d);
>> +else
>> +/*
>> + * Wait until altp2m_vcpu_initialise() is done before 
>> marking
>> + * altp2m as being enabled for the domain.
>> + */
>> +d->arch.altp2m_active = true;
> 
> Similarly here you could omit the "else" and simply store "nstate" afaict.

As above, in the enable case I wanted to first setup altp2m on all VCPUs
with altp2m_vcpu_initialise(), and only after that set
d->arch.altp2m_active = true.

In summary, if ( ostate ) we want to set d->arch.altp2m_active before
the for (we're disabling altp2m), and if ( !ostate ) (which is the else
above) we want to set d->arch.altp2m_active after said for.

We can indeed store nstate. I just thought things look clearer with
"true" and "false", but if you prefer there's no problem assigning nstate.

>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
>>  return !!cpu_has_monitor_trap_flag;
>>  }
>>  
>> -static void vmx_vcpu_update_eptp(struct vcpu *v)
>> +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
>>  {
>>  struct domain *d = v->domain;
>>  struct p2m_domain *p2m = NULL;
>>  struct ept_data *ept;
>>  
>> -if ( altp2m_active(d) )
>> +if ( altp2m_enabled )
>>  p2m = p2m_get_altp2m(v);
>>  if ( !p2m )
>>  p2m = p2m_get_hostp2m(d);
> 
> Is this an appropriate transformation? That is, can there not be
> any domains for which altp2m_active() returns false despite
> altp2m_enabled being true? (Looking at p2m_get_altp2m() I can't
> really judge whether index would always be INVALID_ALTP2M in
> such a case.)

Yes, it should be completely safe (please see below for details).

>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, 
>> unsigned int idx)
>>  atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
>>  vcpu_altp2m(v).p2midx = idx;
>>  atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
>> -altp2m_vcpu_update_p2m(v);
>> +altp2m_vcpu_update_p2m(v, altp2m_active(d));
>>  }
>>  rc = 1;
>>  }
>> @@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, 
>> unsigned int idx)
>>  atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
>>  vcpu_altp2m(v).p2midx = idx;
>>  atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
>> -altp2m_vcpu_update_p2m(v);
>> +altp2m_vcpu_update_p2m(v, altp2m_active(d));
>>  }
> 
> In both cases, isn't altp2m_active() going to return true anyway
> when we get there (related to the question above)?

Yes, it will return true. In order for it to return false,
altp2m_vcpu_destroy() would have had to run on that VCPU, which (among
other things) calls altp2m_vcpu_reset(), which does v->p2midx =
INVALID_ALTP2M.
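
(altp2m_vcpu_reset() is tiny; quoting from memory, so treat this as 
approximate:

    void altp2m_vcpu_reset(struct vcpu *v)
    {
        struct altp2mvcpu *av = &vcpu_altp2m(v);

        av->p2midx = INVALID_ALTP2M;
        av->veinfo_gfn = INVALID_GFN;
    }
)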

There's an if() above (not shown in your reply) that says "if ( idx !=
vcpu_altp2m(v).p2midx )", so, indeed, by the time we end up here we can
reasonably assume that altp2m_active(d) will return true.

I've just put in "altp2m_active(d)" to make sure there's absolutely no
functional change here, but AFA

Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-08 Thread Razvan Cojocaru
On 2/8/19 6:44 PM, Jan Beulich wrote:
 On 08.02.19 at 17:30,  wrote:
>> On 2/8/19 5:50 PM, Jan Beulich wrote:
>> On 08.02.19 at 15:00,  wrote:
  /* If the alternate p2m state has changed, handle appropriately */
 -if ( d->arch.altp2m_active != ostate &&
 +if ( nstate != ostate &&
   (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
  {
 +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). 
 */
 +if ( ostate )
 +d->arch.altp2m_active = false;
>>>
>>> Why the if()? In the opposite case you'd simply write false into
>>> what already holds false.
>>
>> The value written into d->arch.altp2m_active is not the point here. The
>> point is that if ( ostate ), then we are disabling altp2m (because the
>> if above this one makes sure ostate != nstate).
>>
>> So in the disable case, first I wanted to set d->arch.altp2m_active to
>> false (which immediately causes altp2m_active(d) to return false as
>> well), and then actually perform the work.
> 
> Sure - I'm not asking to remove everything above, just the if()
> (and keep what is now the body). That is because
> - in the enable case the value is false and writing false into it is
>   benign,
> - in the disable case you want to actively change from true to
>   false.
> 
 @@ -4550,7 +4554,14 @@ static int do_altp2m_op(
  
  if ( ostate )
  p2m_flush_altp2m(d);
 +else
 +/*
 + * Wait until altp2m_vcpu_initialise() is done before 
 marking
 + * altp2m as being enabled for the domain.
 + */
 +d->arch.altp2m_active = true;
>>>
>>> Similarly here you could omit the "else" and simply store "nstate" afaict.
>>
>> As above, in the enable case I wanted to first setup altp2m on all VCPUs
>> with altp2m_vcpu_initialise(), and only after that set
>> d->arch.altp2m_active = true.
>>
>> In summary, if ( ostate ) we want to set d->arch.altp2m_active before
>> the for (we're disabling altp2m), and if ( !ostate ) (which is the else
>> above) we want to set d->arch.altp2m_active after said for.
>>
>> We can indeed store nstate. I just thought things look clearer with
>> "true" and "false", but if you prefer there's no problem assigning nstate.
> 
> Well, as always with mm and altp2m code, I'm not the maintainer, so
> I don't have the final say. It's just that I think unnecessary conditionals
> would better be avoided, even if they're not on a performance critical
> path.
> 
 --- a/xen/arch/x86/hvm/vmx/vmx.c
 +++ b/xen/arch/x86/hvm/vmx/vmx.c
 @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
  return !!cpu_has_monitor_trap_flag;
  }
  
 -static void vmx_vcpu_update_eptp(struct vcpu *v)
 +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
  {
  struct domain *d = v->domain;
  struct p2m_domain *p2m = NULL;
  struct ept_data *ept;
  
 -if ( altp2m_active(d) )
 +if ( altp2m_enabled )
  p2m = p2m_get_altp2m(v);
  if ( !p2m )
  p2m = p2m_get_hostp2m(d);
>>>
>>> Is this an appropriate transformation? That is, can there not be
>>> any domains for which altp2m_active() returns false despite
>>> altp2m_enabled being true? (Looking at p2m_get_altp2m() I can't
>>> really judge whether index would always be INVALID_ALTP2M in
>>> such a case.)
>>
>> Yes, it should be completely safe (please see below for details).
>>
 --- a/xen/arch/x86/mm/p2m.c
 +++ b/xen/arch/x86/mm/p2m.c
 @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, 
 unsigned int idx)
  atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
  vcpu_altp2m(v).p2midx = idx;
  atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
 -altp2m_vcpu_update_p2m(v);
 +altp2m_vcpu_update_p2m(v, altp2m_active(d));
  }
  rc = 1;
  }
 @@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, 
 unsigned int idx)
  atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
  vcpu_altp2m(v).p2midx = idx;
  atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
 -altp2m_vcpu_update_p2m(v);
 +altp2m_vcpu_update_p2m(v, altp2m_active(d));
  }
>>>
>>> In both cases, isn't altp2m_active() going to return true anyway
>>> when we get there (related to the question above)?
>>
>> Yes, it will return true. In order for it to return false,
>> altp2m_vcpu_destroy() would have had to run on that VCPU, which (among
>> other things) calls altp2m_vcpu_reset(), which does v->p2midx =
>> INVALID_ALTP2M.
> 
> Hmm, according to the change further up in this patch, altp2m_active()
> 

Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-11 Thread Razvan Cojocaru




Right, I now understand what you meant by removing "if ( ostate )" -
you'd like d->arch.altp2m_active to only be set _after_ the for, for the
enable, as well as for the disable case.


No, certainly not. Exactly because of ...


However, I wanted to keep both if()s to avoid another problem with this
code:
[...]
So for the disable case, I wanted altp2m_active(v->domain) to return
false immediately so that this code won't be triggered. Otherwise,
assuming the following scenario:


... this. Apparently "and keep what is now the body" was still not clear
enough.

Taking your original code, I mean

 if ( nstate != ostate &&
  (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
 {
 /* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
 d->arch.altp2m_active = false;

 for_each_vcpu( d, v )
 [...]

 if ( ostate )
 p2m_flush_altp2m(d);
 /*
  * Wait until altp2m_vcpu_initialise() is done before marking
  * altp2m as being enabled for the domain.
  */
 d->arch.altp2m_active = nstate;
 }


Ah! Thanks for the clarification.


Now, you're right that p2m_get_altp2m() may hand over a pointer to an
altp2m between the moment where d->arch.altp2m_active is false and the
point where v->p2midx actually becomes INVALID_ALTP2M with my code; but
I think it can be argued that I should rather fix p2m_get_altp2m(), to
return NULL if altp2m_active() == false and keep the if()s (which I've
hopefully been more eloquent about now).


Yes, that looks to be an option, provided it doesn't violate assumptions
elsewhere.


It doesn't, AFAICT. Additionally, it can be argued that if code relies 
on p2m_get_altp2m() returning not-NULL when !altp2m_active(), said code 
is broken.
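
Concretely, the accessor would end up looking roughly like this (the 
current definition plus the altp2m_active() check; approximate, modulo 
the exact patch):

    static inline struct p2m_domain *p2m_get_altp2m(struct vcpu *v)
    {
        unsigned int index = vcpu_altp2m(v).p2midx;

        if ( !altp2m_active(v->domain) || index == INVALID_ALTP2M )
            return NULL;

        ASSERT(index < MAX_ALTP2M);

        return v->domain->arch.altp2m_p2m[index];
    }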



Thanks,
Razvan


[Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-11 Thread Razvan Cojocaru
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to an in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

This patch reorganizes the code so that d->arch.altp2m_active
is set to true only after all the init work has been done, and
to false before the uninit work begins. This required adding
a new bool parameter to altp2m_vcpu_update_p2m(), which relied
on d->arch.altp2m_active being set before it's called.

p2m_get_altp2m() now returns NULL if !altp2m_active(), to
prevent it from returning a "valid" altp2m pointer between
the moment where d->arch.altp2m_active = false and the
point when v->p2midx actually becomes INVALID_ALTP2M.

While at it, I've changed a couple of bool_t's to bool's.

Signed-off-by: Razvan Cojocaru 

---
Changes since V2:
 - Removed leftover asm-x86/altp2m.h changes.
 - nstate = !!a.u.domain_state.state; becomes
   nstate = a.u.domain_state.state;
 - Removed the if() and else in do_altp2m_op() as
   recommended by Jan.
 - Using true explicitly instead of altp2m_active(d) for
   altp2m_vcpu_update_p2m() in p2m_switch_vcpu_altp2m_by_id()
   and p2m_switch_domain_altp2m_by_id().
 - Updated patch description.
 - Modified p2m_get_altp2m() to return NULL if !altp2m_active().
---
 xen/arch/x86/hvm/hvm.c        | 16 +---
 xen/arch/x86/hvm/vmx/vmx.c    |  4 ++--
 xen/arch/x86/mm/altp2m.c      |  4 ++--
 xen/arch/x86/mm/p2m.c         |  4 ++--
 xen/include/asm-x86/domain.h  |  2 +-
 xen/include/asm-x86/hvm/hvm.h |  6 +++---
 xen/include/asm-x86/p2m.h     |  8 +++-
 7 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21944e9..50d896d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4525,7 +4525,7 @@ static int do_altp2m_op(
 case HVMOP_altp2m_set_domain_state:
 {
 struct vcpu *v;
-bool_t ostate;
+bool ostate, nstate;
 
 if ( nestedhvm_enabled(d) )
 {
@@ -4534,12 +4534,15 @@ static int do_altp2m_op(
 }
 
 ostate = d->arch.altp2m_active;
-d->arch.altp2m_active = !!a.u.domain_state.state;
+nstate = a.u.domain_state.state;
 
 /* If the alternate p2m state has changed, handle appropriately */
-if ( d->arch.altp2m_active != ostate &&
+if ( nstate != ostate &&
  (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
 {
+/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
+d->arch.altp2m_active = false;
+
 for_each_vcpu( d, v )
 {
 if ( !ostate )
@@ -4550,7 +4553,14 @@ static int do_altp2m_op(
 
 if ( ostate )
 p2m_flush_altp2m(d);
+
+/*
+ * Wait until altp2m_vcpu_initialise() is done before marking
+ * altp2m as being enabled for the domain.
+ */
+d->arch.altp2m_active = nstate;
 }
+
 break;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 64af8bf..e542568 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
 return !!cpu_has_monitor_trap_flag;
 }
 
-static void vmx_vcpu_update_eptp(struct vcpu *v)
+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
 {
 struct domain *d = v->domain;
 struct p2m_domain *p2m = NULL;
 struct ept_data *ept;
 
-if ( altp2m_active(d) )
+if ( altp2m_enabled )
 p2m = p2m_get_altp2m(v);
 if ( !p2m )
 p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c
index 930bdc2..c51303b 100644
--- a/xen/arch/x86/mm/altp2m.c
+++ b/xen/arch/x86/mm/altp2m.c
@@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v)
 vcpu_altp2m(v).p2midx = 0;
 atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
 
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, true);
 
 if ( v != current )
 vcpu_unpause(v);
@@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v)
 
 altp2m_vcpu_reset(v);
 
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, false);
 altp2m_vcpu_update_vmfunc_ve(v);
 
 if ( v != current )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57..6f991f8 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, 
unsigned int idx

Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-11 Thread Razvan Cojocaru

On 2/11/19 12:10 PM, Jan Beulich wrote:

On 11.02.19 at 10:13,  wrote:

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
  return !!cpu_has_monitor_trap_flag;
  }
  
-static void vmx_vcpu_update_eptp(struct vcpu *v)

+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
  {
  struct domain *d = v->domain;
  struct p2m_domain *p2m = NULL;
  struct ept_data *ept;
  
-if ( altp2m_active(d) )

+if ( altp2m_enabled )
  p2m = p2m_get_altp2m(v);
  if ( !p2m )
  p2m = p2m_get_hostp2m(d);


With the change you now make to p2m_get_altp2m(), this looks to be
a benign change. Which to me would suggest to either leave the code
alone, or to drop the if() (but - again - not its body) altogether. At
which point the code could be further streamlined, as then the NULL
initializer can go away and the assignment (or then perhaps initializer)
could become "p2m = p2m_get_altp2m(v) ?: p2m_get_hostp2m(d)".
(Generally I'd recommend to leave out the change here, and do the
transformation in a follow-on patch.)


Thanks for noticing, actually this appears to invalidate the whole 
purpose of the patch (I should have tested this more before submitting 
V3, sorry).


The whole point of the new boolean is to have p2m assigned an altp2m 
regardless of altp2m_active() (hence the change) - which now no longer 
happens. I got carried away with this change.


The fact that this is so easy to get wrong is the reason why I've 
preferred the domain_pause() solution. There appears to be no clean way 
to fix this otherwise, and if this is so easy to misunderstand it'll 
break just as easily with further changes.


I suppose I could just pass the bool along to p2m_get_altp2m() (and 
indeed remove the if())...



Thanks,
Razvan


Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-11 Thread Razvan Cojocaru

On 2/11/19 12:57 PM, Razvan Cojocaru wrote:
> On 2/11/19 12:10 PM, Jan Beulich wrote:
>> On 11.02.19 at 10:13,  wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
>>>      return !!cpu_has_monitor_trap_flag;
>>>  }
>>>  
>>> -static void vmx_vcpu_update_eptp(struct vcpu *v)
>>> +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
>>>  {
>>>      struct domain *d = v->domain;
>>>      struct p2m_domain *p2m = NULL;
>>>      struct ept_data *ept;
>>>  
>>> -    if ( altp2m_active(d) )
>>> +    if ( altp2m_enabled )
>>>          p2m = p2m_get_altp2m(v);
>>>      if ( !p2m )
>>>          p2m = p2m_get_hostp2m(d);
>>
>> With the change you now make to p2m_get_altp2m(), this looks to be
>> a benign change. Which to me would suggest to either leave the code
>> alone, or to drop the if() (but - again - not its body) altogether. At
>> which point the code could be further streamlined, as then the NULL
>> initializer can go away and the assignment (or then perhaps initializer)
>> could become "p2m = p2m_get_altp2m(v) ?: p2m_get_hostp2m(d)".
>> (Generally I'd recommend to leave out the change here, and do the
>> transformation in a follow-on patch.)
>
> Thanks for noticing, actually this appears to invalidate the whole
> purpose of the patch (I should have tested this more before submitting
> V3, sorry).
>
> The whole point of the new boolean is to have p2m assigned an altp2m
> regardless of altp2m_active() (hence the change) - which now no longer
> happens. I got carried away with this change.
>
> The fact that this is so easy to get wrong is the reason why I've
> preferred the domain_pause() solution. There appears to be no clean way
> to fix this otherwise, and if this is so easy to misunderstand it'll
> break just as easily with further changes.
>
> I suppose I could just pass the bool along to p2m_get_altp2m() (and
> indeed remove the if())...


I think the best that can be done here is to check if altp2m_active() 
early in p2m_switch_vcpu_altp2m_by_id() and 
p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since 
these are only called by HVMOP_altp2m_* (and thus serialized by the 
domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state.


This of course means reverting p2m_get_altp2m() to its original 
non-intuitive state of returning a valid altp2m pointer even when 
altp2m_active() is false.


I see no other way out of this (aside from the domain_pause() fix).


Thanks,
Razvan


Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-11 Thread Razvan Cojocaru
On 2/11/19 6:59 PM, Jan Beulich wrote:
>>> Thanks for noticing, actually this appears to invalidate the whole 
>>> purpose of the patch (I should have tested this more before submitting 
>>> V3, sorry).
>>>
>>> The whole point of the new boolean is to have p2m assigned an altp2m 
>>> regardless of altp2m_active() (hence the change) - which now no longer 
>>> happens. I got carried away with this change.
>>>
>>> The fact that this is so easy to get wrong is the reason why I've 
>>> preferred the domain_pause() solution. There appears to be no clean way 
>>> to fix this otherwise, and if this is so easy to misunderstand it'll 
>>> break just as easily with further changes.
>>>
>>> I suppose I could just pass the bool along to p2m_get_altp2m() (and 
>>> indeed remove the if())...
>>
>> I think the best that can be done here is to check if altp2m_active() 
>> early in p2m_switch_vcpu_altp2m_by_id() and 
>> p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since 
>> these are only called by HVMOP_altp2m_* (and thus serialized by the 
>> domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state.
> 
> I'm confused: Where do you see the domain lock used there?
> Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for
> any HVMOP_altp2m_* at all. One of the actual callers is guarded
> by altp2m_active(), but the other isn't.

do_altp2m_op() does d = rcu_lock_domain_by_any_id(a.domain);, and
unlocks it before it exits.

But you're right, p2m_switch_vcpu_altp2m_by_id() is not called for any
HVMOP_altp2m_*, I've misread that. Hence, I believe both callers
ofp2m_switch_vcpu_altp2m_by_id() may race with HVMOP_altp2m_*.

Would you like me to add the altp2m_active() check in both
p2m_switch_domain_altp2m_by_id() and p2m_switch_vcpu_altp2m_by_id(), and
make it harder to race (it still won't be impossible, since the bool may
become false between the check and the actual function logic for
p2m_switch_vcpu_altp2m_by_id(), as you've noticed)?

>> I see no other way out of this (aside from the domain_pause() fix).
> 
> If only that one would have been a complete fix, rather than just a
> partial one.

Agreed, but that one at least clearly fixes the external case, whereas
this doesn't seem to cover all corner cases for any situation.


Thanks,
Razvan


Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-12 Thread Razvan Cojocaru

On 2/12/19 9:31 AM, Jan Beulich wrote:
> On 11.02.19 at 18:21,  wrote:
>> On 2/11/19 6:59 PM, Jan Beulich wrote:
>>>>> Thanks for noticing, actually this appears to invalidate the whole
>>>>> purpose of the patch (I should have tested this more before submitting
>>>>> V3, sorry).
>>>>>
>>>>> The whole point of the new boolean is to have p2m assigned an altp2m
>>>>> regardless of altp2m_active() (hence the change) - which now no longer
>>>>> happens. I got carried away with this change.
>>>>>
>>>>> The fact that this is so easy to get wrong is the reason why I've
>>>>> preferred the domain_pause() solution. There appears to be no clean way
>>>>> to fix this otherwise, and if this is so easy to misunderstand it'll
>>>>> break just as easily with further changes.
>>>>>
>>>>> I suppose I could just pass the bool along to p2m_get_altp2m() (and
>>>>> indeed remove the if())...
>>>>
>>>> I think the best that can be done here is to check if altp2m_active()
>>>> early in p2m_switch_vcpu_altp2m_by_id() and
>>>> p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since
>>>> these are only called by HVMOP_altp2m_* (and thus serialized by the
>>>> domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state.
>>>
>>> I'm confused: Where do you see the domain lock used there?
>>> Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for
>>> any HVMOP_altp2m_* at all. One of the actual callers is guarded
>>> by altp2m_active(), but the other isn't.
>>
>> do_altp2m_op() does d = rcu_lock_domain_by_any_id(a.domain);, and
>> unlocks it before it exits.
>
> But that's not the "domain lock". This is just making sure the domain
> won't disappear behind your back.
>
>> But you're right, p2m_switch_vcpu_altp2m_by_id() is not called for any
>> HVMOP_altp2m_*, I've misread that. Hence, I believe both callers
>> of p2m_switch_vcpu_altp2m_by_id() may race with HVMOP_altp2m_*.
>>
>> Would you like me to add the altp2m_active() check in both
>> p2m_switch_domain_altp2m_by_id() and p2m_switch_vcpu_altp2m_by_id(), and
>> make it harder to race (it still won't be impossible, since the bool may
>> become false between the check and the actual function logic for
>> p2m_switch_vcpu_altp2m_by_id(), as you've noticed)?
>
> Well, altp2m being Tech Preview, any partial fix _could_ do in
> principle. But personally I dislike such half hearted approaches,
> and hence I'd recommend to address the issue in full, i.e.
> eliminate race potential altogether. In the end you're going to
> be bitten more than me by not getting this into proper shape.


I'll attempt a V4 doing my best to check altp2m_active() then, and see 
if that's at least satisfactory for sane use-cases.



Thanks,
Razvan


Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-12 Thread Razvan Cojocaru

On 2/11/19 6:59 PM, Jan Beulich wrote:
> Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for
> any HVMOP_altp2m_* at all. One of the actual callers is guarded
> by altp2m_active(), but the other isn't.


Actually I see that both places are guarded by altp2m_active().

In p2m.c we have:

2312 void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
2313 {
2314 if ( altp2m_active(v->domain) )
2315 p2m_switch_vcpu_altp2m_by_id(v, idx);
2316 }

and in vmx.c:

2225 static int vmx_vcpu_emulate_vmfunc(const struct cpu_user_regs *regs)
2226 {
2227 int rc = X86EMUL_EXCEPTION;
2228 struct vcpu *curr = current;
2229
2230 if ( !cpu_has_vmx_vmfunc && altp2m_active(curr->domain) &&
2231  regs->eax == 0 &&
2232  p2m_switch_vcpu_altp2m_by_id(curr, regs->ecx) )
2233 rc = X86EMUL_OKAY;
2234
2235 return rc;
2236 }

here there's an "&& altp2m_active(curr->domain)" in the if().

So I suppose in our scenario all that's needed it a similar check here:

4636 case HVMOP_altp2m_switch_p2m:
4637 rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view);
4638 break;

for the other function, p2m_switch_domain_altp2m_by_id().

Unless I'm missing something.


Thanks,
Razvan


[Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-12 Thread Razvan Cojocaru
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to an in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

This patch reorganizes the code so that d->arch.altp2m_active
is set to true only after all the init work has been done, and
to false before the uninit work begins. This required adding
a new bool parameter to altp2m_vcpu_update_p2m(), which relied
on d->arch.altp2m_active being set before it's called.

While at it, I've changed a couple of bool_t's to bool's.

Signed-off-by: Razvan Cojocaru 

---
Changes since V3:
 - Reverted p2m_get_altp2m() to its original state.
 - Added an altp2m_active() check to HVMOP_altp2m_switch_p2m.
 - Updated patch description.
---
 xen/arch/x86/hvm/hvm.c| 20 
 xen/arch/x86/hvm/vmx/vmx.c|  4 ++--
 xen/arch/x86/mm/altp2m.c  |  4 ++--
 xen/arch/x86/mm/p2m.c |  4 ++--
 xen/include/asm-x86/domain.h  |  2 +-
 xen/include/asm-x86/hvm/hvm.h |  6 +++---
 6 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21944e9..21ec059 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4525,7 +4525,7 @@ static int do_altp2m_op(
 case HVMOP_altp2m_set_domain_state:
 {
 struct vcpu *v;
-bool_t ostate;
+bool ostate, nstate;
 
 if ( nestedhvm_enabled(d) )
 {
@@ -4534,12 +4534,15 @@ static int do_altp2m_op(
 }
 
 ostate = d->arch.altp2m_active;
-d->arch.altp2m_active = !!a.u.domain_state.state;
+nstate = a.u.domain_state.state;
 
 /* If the alternate p2m state has changed, handle appropriately */
-if ( d->arch.altp2m_active != ostate &&
+if ( nstate != ostate &&
  (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
 {
+/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
+d->arch.altp2m_active = false;
+
 for_each_vcpu( d, v )
 {
 if ( !ostate )
@@ -4550,7 +4553,14 @@ static int do_altp2m_op(
 
 if ( ostate )
 p2m_flush_altp2m(d);
+
+/*
+ * Wait until altp2m_vcpu_initialise() is done before marking
+ * altp2m as being enabled for the domain.
+ */
+d->arch.altp2m_active = nstate;
 }
+
 break;
 }
 
@@ -4624,7 +4634,9 @@ static int do_altp2m_op(
 break;
 
 case HVMOP_altp2m_switch_p2m:
-rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view);
+rc = -EOPNOTSUPP;
+if ( altp2m_active(d) )
+rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view);
 break;
 
 case HVMOP_altp2m_set_suppress_ve:
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 64af8bf..e542568 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
 return !!cpu_has_monitor_trap_flag;
 }
 
-static void vmx_vcpu_update_eptp(struct vcpu *v)
+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
 {
 struct domain *d = v->domain;
 struct p2m_domain *p2m = NULL;
 struct ept_data *ept;
 
-if ( altp2m_active(d) )
+if ( altp2m_enabled )
 p2m = p2m_get_altp2m(v);
 if ( !p2m )
 p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c
index 930bdc2..c51303b 100644
--- a/xen/arch/x86/mm/altp2m.c
+++ b/xen/arch/x86/mm/altp2m.c
@@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v)
 vcpu_altp2m(v).p2midx = 0;
 atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
 
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, true);
 
 if ( v != current )
 vcpu_unpause(v);
@@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v)
 
 altp2m_vcpu_reset(v);
 
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, false);
 altp2m_vcpu_update_vmfunc_ve(v);
 
 if ( v != current )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57..6f991f8 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx)
 atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
 vcpu_altp2m(v).p2midx = idx;
 atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, true);

Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-12 Thread Razvan Cojocaru

On 2/12/19 2:57 PM, Jan Beulich wrote:
> On 12.02.19 at 12:42,  wrote:
>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>> on purpose (as it was originally supposed to cater to an in-guest
>> agent, and a domain pausing itself is not a good idea).
>>
>> This can lead to domain crashes in the vmx_vmexit_handler() code
>> that checks if the guest has the ability to switch EPTP without an
>> exit. That code can __vmread() the host p2m's EPT_POINTER
>> (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
>> chance to run altp2m_vcpu_initialise(), but after
>> d->arch.altp2m_active is set).
>>
>> This patch reorganizes the code so that d->arch.altp2m_active
>> is set to true only after all the init work has been done, and
>> to false before the uninit work begins. This required adding
>> a new bool parameter to altp2m_vcpu_update_p2m(), which relied
>> on d->arch.altp2m_active being set before it's called.
>>
>> While at it, I've changed a couple of bool_t's to bool's.
>>
>> Signed-off-by: Razvan Cojocaru 
>
> FTR - this looks okay to me now. But I don't feel qualified to
> give an R-b, so just for the parts where it's applicable
> Acked-by: Jan Beulich 


Thanks for the ack and the help!


Razvan


Re: [Xen-devel] [RFC PATCH 6/6] xc_version: add vm_event interface version

2019-02-12 Thread Razvan Cojocaru
On 2/12/19 8:13 PM, Tamas K Lengyel wrote:
> On Tue, Jan 8, 2019 at 9:39 AM Razvan Cojocaru
>  wrote:
>>
>> On 1/8/19 6:27 PM, Jan Beulich wrote:
>>>>>> On 19.12.18 at 19:52,  wrote:
>>>> Signed-off-by: Petre Pircalabu 
>>>
>>> An empty description is not helpful. The immediate question is: Why?
>>> We don't do this for other interface versions. I'm unconvinced a
>>> special purpose piece of information like this one belongs into the
>>> rather generic version hypercall.
>>
>> For an introspection application meant to be deployed on several Xen
>> versions without recompiling, it is important to be able to decide at
>> runtime what size and layout the vm_event struct has.
>>
>> Currently this can somewhat be done by associating the current version
>> with the vm_event version, but that is not ideal for obvious reasons.
> 
> We do exactly that in LibVMI and it works fine - care to elaborate
> what problem you have with doing that? There is a 1:1 match between
> any stable Xen version and a vm_event interface version. I don't think
> we will ever have a situation where we bump the vm_event interface
> version but not the Xen release version.

XenServer. Any given version of XenServer may or may not fit the matrix
you're talking about (because some patches are backported, some are not,
etc.). In which case it all becomes very complicated.


Thanks,
Razvan


Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-14 Thread Razvan Cojocaru
On 2/14/19 8:06 PM, George Dunlap wrote:
> On 2/12/19 11:42 AM, Razvan Cojocaru wrote:
>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>> on purpose (as it was originally supposed to cater to a in-guest
>> agent, and a domain pausing itself is not a good idea).
>>
>> This can lead to domain crashes in the vmx_vmexit_handler() code
>> that checks if the guest has the ability to switch EPTP without an
>> exit. That code can __vmread() the host p2m's EPT_POINTER
>> (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
>> chance to run altp2m_vcpu_initialise(), but after
>> d->arch.altp2m_active is set).
> 
> Sorry, where exactly does the crash happen?

3655 /*
3656  * If the guest has the ability to switch EPTP without an exit,
3657  * figure out whether it has done so and update the altp2m data.
3658  */
3659 if ( altp2m_active(v->domain) &&
3660 (v->arch.hvm.vmx.secondary_exec_control &
3661 SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
3662 {
3663 unsigned long idx;
3664
3665 if ( v->arch.hvm.vmx.secondary_exec_control &
3666 SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
3667 __vmread(EPTP_INDEX, &idx);
3668 else
3669 {
3670 unsigned long eptp;
3671
3672 __vmread(EPT_POINTER, &eptp);
3673
3674 if ( (idx = p2m_find_altp2m_by_eptp(v->domain, eptp)) ==
3675  INVALID_ALTP2M )
3676 {
3677 gdprintk(XENLOG_ERR, "EPTP not found in alternate p2m list\n");
3678 domain_crash(v->domain);

Right here (at line 3678 in vmx.c).


Thanks,
Razvan


Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-15 Thread Razvan Cojocaru

> On Feb 12, 2019, at 11:42 AM, Razvan Cojocaru  wrote:
>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>> on purpose (as it was originally supposed to cater to an in-guest
>> agent, and a domain pausing itself is not a good idea).
>
> Sorry to come in here on v4 and suggest changing everything, but I don’t
> really like the solution you have here.  Not setting altp2m to ‘active’
> until after the vcpus are set up makes sense; but passing this true/false
> value in seems ugly, and still seems a bit racy (i.e., what if
> altp2m_active() is disabled between the check in HVMOP_altp2m_switch_p2m
> and the time we actually call altp2m_vcpu_update_p2m()?)
>
> I certainly don’t think domain_pause() should be our go-to solution for
> race avoidance, but in this case it really seems like switching the global
> p2m for every vcpu at once makes the most sense; and trying to safely
> change this on an unpaused domain is not only overly complicated, but
> probably not what we wanted anyway.
>
> p2m_altp2m_destroy_by_id() and p2m_switch_domain_altp2m_by_id() already
> use domain_pause_except_self(); so it seems like not using it for
> altp2m_set_domain_state was probably more of an oversight than an
> intentional decision.  Using that here seems like a more robust solution.
>
> The one issue is that domain_pause_except_self() currently is actually a
> deadlock risk if two different vcpus start it at the same time.  I think
> the attached patch (compile-tested only) should fix this issue; after this
> patch you should be able to use domain_pause_except_self() in
> altp2m_set_domain_state instead.
>
> Does that sound reasonable?


Not only does that sound reasonable, I personally prefer this approach. 
I'll apply this patch and give it a spin.



Thanks,
Razvan


Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-15 Thread Razvan Cojocaru

On 2/15/19 3:37 PM, George Dunlap wrote:
> On Feb 15, 2019, at 1:24 PM, Jan Beulich  wrote:
>> On 15.02.19 at 13:52,  wrote:
>>> On Feb 12, 2019, at 11:42 AM, Razvan Cojocaru  wrote:
>>>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>>>> on purpose (as it was originally supposed to cater to an in-guest
>>>> agent, and a domain pausing itself is not a good idea).
>>>
>>> Sorry to come in here on v4 and suggest changing everything, but I don’t
>>> really like the solution you have here.  Not setting altp2m to ‘active’
>>> until after the vcpus are set up makes sense; but passing this true/false
>>> value in seems ugly, and still seems a bit racy (i.e., what if
>>> altp2m_active() is disabled between the check in HVMOP_altp2m_switch_p2m
>>> and the time we actually call altp2m_vcpu_update_p2m()?)
>>>
>>> I certainly don’t think domain_pause() should be our go-to solution for
>>> race avoidance, but in this case it really seems like switching the
>>> global p2m for every vcpu at once makes the most sense; and trying to
>>> safely change this on an unpaused domain is not only overly complicated,
>>> but probably not what we wanted anyway.
>>>
>>> p2m_altp2m_destroy_by_id() and p2m_switch_domain_altp2m_by_id() already
>>> use domain_pause_except_self(); so it seems like not using it for
>>> altp2m_set_domain_state was probably more of an oversight than an
>>> intentional decision.  Using that here seems like a more robust solution.
>>
>> Ah, I didn't even recall there was such a function. As this now
>> also allows covering a domain requesting the operation for itself,
>> I don't mind the pausing approach anymore.
>
> Yeah, I forgot too until I was grepping for “domain_pause” to figure out
> what everyone else was doing. :-)
>
>>> The one issue is that domain_pause_except_self() currently is actually a
>>> deadlock risk if two different vcpus start it at the same time.  I think
>>> the attached patch (compile-tested only) should fix this issue; after
>>> this patch you should be able to use domain_pause_except_self() in
>>> altp2m_set_domain_state instead.
>>
>> There's one thing I don't really like here, which is a result of the
>> (necessary) re-use of the hypercall deadlock mutex: This
>> certainly poses the risk of getting called from a context where
>> the lock was already acquired. Therefore I'd like to suggest to
>> use this lock in a recursive way (here and elsewhere).
>>
>> And two cosmetic remarks - there's no need to re-specify
>> __must_check on the function definition, as the function
>> declaration ought to be in scope anyway. And there's a stray
>> blank inside the likely() you add.
>
> I don’t see that I added a ‘likely’; there’s one in context, but I don’t
> see any stray blanks there.
>
> The other two points make sense — Razvan, would you be willing to make
> those changes (and test the result, as I haven’t done more than
> compile-test it)?


Of course, happy to. Just to make sure I understand where we stand: I'll 
try to leave the mutex alone for now and only switch to a recursive one 
if anything blows up.
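
If it ever does blow up, the recursive variant Jan suggests would presumably
look something like this (sketch only, using the existing recursive spinlock
helpers rather than the plain ones):

int domain_pause_except_self(struct domain *d)
{
    struct vcpu *v, *curr = current;

    if ( curr->domain == d )
    {
        /* Tolerate being entered with the lock already held by curr. */
        if ( !spin_trylock_recursive(&d->hypercall_deadlock_mutex) )
            return -ERESTART;
        for_each_vcpu ( d, v )
            if ( likely(v != curr) )
                vcpu_pause(v);
        spin_unlock_recursive(&d->hypercall_deadlock_mutex);
    }
    else
        domain_pause(d);

    return 0;
}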



Thanks,
Razvan


[Xen-devel] [PATCH for-4.12 V5 1/2] altp2m: Prevent deadlocks when a domain performs altp2m operations on itself

2019-02-15 Thread Razvan Cojocaru
domain_pause_except_self() was introduced to allow a domain to pause
itself while doing altp2m operations.  However, as written, it has a
risk of deadlock if two vcpus enter the loop at the same time.

Luckily, there's already a solution for this: attempt to take the domain's
hypercall_deadlock_mutex, and restart the entire hypercall if you
fail.

Make domain_pause_except_self() attempt to grab this mutex when
pausing itself, returning -ERESTART if it fails.  Have the callers
check for errors and pass the value up.  In both cases, the top-level
do_hvm_op() should DTRT when -ERESTART is returned.

The (necessary) reuse of the hypercall deadlock mutex poses the risk
of getting called from a context where the lock was already acquired
(e.g. someone may (say) call domctl_lock(), then afterwards call
domain_pause_except_self()). However, in the interest of not
overcomplicating things, no changes are made here to the mutex.
Attempted nesting of this lock isn't a security issue, because all
that will happen is that the vcpu will livelock taking continuations.

Signed-off-by: George Dunlap 
Tested-by: Razvan Cojocaru 
---
Release exception justification:
- Fixes a bug in the current code
- Lays the foundation of another fix
- Only affects altp2m, which isn't a supported feature

NB two possible further improvements to put on the list for 4.13:
 - Replace send-interrupt-and-wait-for-each-one loop with
   hvmop_flush_tlb_all()'s more efficient
   send-all-the-interrupts-then-wait loop.
 - Modify hvmop_flush_tlb_all() to use domain_pause_except_self() to
   avoid code duplication
---
CC: George Dunlap 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Wei Liu 
CC: "Roger Pau Monné" 
CC: George Dunlap 
CC: Ian Jackson 
CC: Julien Grall 
CC: Konrad Rzeszutek Wilk 
CC: Stefano Stabellini 
CC: Tim Deegan 
---
 xen/arch/x86/mm/p2m.c   | 10 --
 xen/common/domain.c |  8 +++-
 xen/include/xen/sched.h |  2 +-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57..7232dbf 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2530,8 +2530,11 @@ int p2m_destroy_altp2m_by_id(struct domain *d, unsigned int idx)
 if ( !idx || idx >= MAX_ALTP2M )
 return rc;
 
-domain_pause_except_self(d);
+rc = domain_pause_except_self(d);
+if ( rc )
+return rc;
 
+rc = -EBUSY;
 altp2m_list_lock(d);
 
 if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) )
@@ -2561,8 +2564,11 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx)
 if ( idx >= MAX_ALTP2M )
 return rc;
 
-domain_pause_except_self(d);
+rc = domain_pause_except_self(d);
+if ( rc )
+return rc;
 
+rc = -EINVAL;
 altp2m_list_lock(d);
 
 if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7470cd9..32bca8d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1134,18 +1134,24 @@ int domain_unpause_by_systemcontroller(struct domain *d)
 return 0;
 }
 
-void domain_pause_except_self(struct domain *d)
+int domain_pause_except_self(struct domain *d)
 {
 struct vcpu *v, *curr = current;
 
 if ( curr->domain == d )
 {
+/* Avoid racing with other vcpus which may want to be pausing us */
+if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
+return -ERESTART;
 for_each_vcpu( d, v )
 if ( likely(v != curr) )
 vcpu_pause(v);
+spin_unlock(&d->hypercall_deadlock_mutex);
 }
 else
 domain_pause(d);
+
+return 0;
 }
 
 void domain_unpause_except_self(struct domain *d)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d633e1d..edee52d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -839,7 +839,7 @@ static inline int domain_pause_by_systemcontroller_nosync(struct domain *d)
 }
 
 /* domain_pause() but safe against trying to pause current. */
-void domain_pause_except_self(struct domain *d);
+int __must_check domain_pause_except_self(struct domain *d);
 void domain_unpause_except_self(struct domain *d);
 
 /*
-- 
2.7.4



[Xen-devel] [PATCH for-4.12 V5 2/2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-15 Thread Razvan Cojocaru
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to a in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

Signed-off-by: Razvan Cojocaru 

---
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Wei Liu 
CC: "Roger Pau Monné" 
---
 xen/arch/x86/hvm/hvm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 410623d..79c7d81 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4538,6 +4538,10 @@ static int do_altp2m_op(
 break;
 }
 
+rc = domain_pause_except_self(d);
+if ( rc )
+break;
+
 ostate = d->arch.altp2m_active;
 d->arch.altp2m_active = !!a.u.domain_state.state;
 
@@ -4556,6 +4560,8 @@ static int do_altp2m_op(
 if ( ostate )
 p2m_flush_altp2m(d);
 }
+
+domain_unpause_except_self(d);
 break;
 }
 
-- 
2.7.4
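
For context, the -ERESTART above surfaces as a hypercall continuation in
do_hvm_op(), roughly along these lines:

    if ( rc == -ERESTART )
        rc = hypercall_create_continuation(__HYPERVISOR_hvm_op, "lh",
                                           op, arg);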



Re: [Xen-devel] [PATCH for-4.12 V5 2/2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-18 Thread Razvan Cojocaru

On 2/18/19 1:19 PM, Jan Beulich wrote:
> On 15.02.19 at 16:41,  wrote:
>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>> on purpose (as it was originally supposed to cater to an in-guest
>> agent, and a domain pausing itself is not a good idea).
>>
>> This can lead to domain crashes in the vmx_vmexit_handler() code
>> that checks if the guest has the ability to switch EPTP without an
>> exit. That code can __vmread() the host p2m's EPT_POINTER
>> (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
>> chance to run altp2m_vcpu_initialise(), but after
>> d->arch.altp2m_active is set).
>>
>> Signed-off-by: Razvan Cojocaru 
>
> Acked-by: Jan Beulich 
>
> Also I assume you mean to have the same justification here as
> you have in patch 1, as to 4.12 inclusion. And for this reason
> I'm also Cc-ing Jürgen again.

Indeed, I do. Thank you very much.


Thanks,
Razvan


Re: [Xen-devel] [PATCH RFC 0/6] Slotted channels for sync vm_events

2019-02-19 Thread Razvan Cojocaru

On 2/12/19 7:01 PM, Tamas K Lengyel wrote:
> On Thu, Feb 7, 2019 at 9:06 AM Petre Ovidiu PIRCALABU
>  wrote:
>> On Thu, 2019-02-07 at 11:46 +, George Dunlap wrote:
>>> On 2/6/19 2:26 PM, Petre Ovidiu PIRCALABU wrote:
>>>> On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote:
>>>>> This patchset is a rework of the "multi-page ring buffer" for
>>>>> vm_events patch based on Andrew Cooper's comments.
>>>>> For synchronous vm_events the ring waitqueue logic was unnecessary
>>>>> as the vcpu sending the request was blocked until a response was
>>>>> received.
>>>>> To simplify the request/response mechanism, an array of slotted
>>>>> channels was created, one per vcpu. Each vcpu puts the request in
>>>>> the corresponding slot and blocks until the response is received.
>>>>>
>>>>> I'm sending this patch as an RFC because, while I'm still working
>>>>> on a way to measure the overall performance improvement, your
>>>>> feedback would be a great assistance.
>>>>
>>>> Is anyone still using asynchronous vm_event requests? (the vcpu is
>>>> not blocked and no response is expected).
>>>> If not, I suggest that the feature should be removed as it
>>>> (significantly) increases the complexity of the current
>>>> implementation.
>>>
>>> Could you describe in a bit more detail what the situation is?
>>> What's the current state of affairs with vm_events, what you're
>>> trying to change, why async vm_events is more difficult?
>>
>> The main reason for the vm_events modification was to improve the
>> overall performance in high throughput introspection scenarios. For
>> domus with a higher vcpu count, a vcpu could sleep for a certain period
>> of time while waiting for a ring slot to become available
>> (__vm_event_claim_slot).
>> The first patchset only increased the ring size, and the second
>> iteration, based on Andrew Cooper's comments, proposed a separate path
>> to handle synchronous events (a slotted buffer for each vcpu) in order
>> to have the events handled independently of one another. To handle
>> asynchronous events, a dynamically allocated vm_event ring is used.
>> While the implementation is not exactly an exercise in simplicity, it
>> preserves all the needed functionality and offers fallback if the Linux
>> domain running the monitor application doesn't support
>> IOCTL_PRIVCMD_MMAP_RESOURCE.
>> However, the problem got a little bit more complicated when I tried
>> implementing the vm_events using an IOREQ server (based on Paul
>> Durrant's comments). For synchronous vm_events, it simplified things a
>> little, eliminating both the need for a special structure to hold the
>> processing state and the evtchns for each vcpu.
>> The asynchronous events were a little more tricky to handle. The
>> buffered ioreqs were a good candidate, but the only thing usable is the
>> corresponding evtchn in conjunction with an existing ring. In order to
>> use them, a mock buffered ioreq should be created and transmitted, with
>> the only meaningful field being the ioreq type.
>>
>>> I certainly think it would be better if you could write the new
>>> vm_event interface without having to spend a lot of effort supporting
>>> modes that you think nobody uses.
>>>
>>> On the other hand, getting into the habit of breaking stuff, even for
>>> people we don't know about, will be a hindrance to community growth;
>>> a commitment to keeping it working will be a benefit to growth.
>>>
>>> But of course, we haven't declared the vm_event interface 'supported'
>>> (it's not even mentioned in the SUPPORT.md document yet).
>>>
>>> Just for the sake of discussion, would it be possible / reasonable,
>>> for instance, to create a new interface, vm_events2, instead?  Then
>>> you could write it to share the ioreq interface without having legacy
>>> baggage you're not using; we could deprecate and eventually remove
>>> vm_events1, and if anyone shouts, we can put it back.
>>>
>>> Thoughts?
>>>
>>>  -George
>>
>> Yes, it's possible and it will GREATLY simplify the implementation. I
>> just have to make sure the interfaces are mutually exclusive.
>
> I'm for removing features from the vm_event interface that are no
> longer in use, especially if they block more advantageous changes like
> this one. We don't know what the use-case was for async events nor
> have seen anyone even mention them since I've been working with Xen.
> Creating a new interface, as mentioned above, would make sense if
> there was a disagreement with retiring this feature. I don't think
> that's the case. I certainly would prefer not having to maintain two
> separate interfaces going forward without a clear justification and
> documented use-case explaining why we keep the old interface around.

AFAICT, the async model is broken conceptually as well, so it makes no
sense. It would make sense, IMHO, if it were lossy (i.e. we just write
in the ring buffer, and if somebody manages to "catch" an event while
it flies by then so be it, if not it will be overwritten). If it had
been truly lossy, I'd see a use-case for it, for gathering statistics
maybe.

However, as the code is now, the VCPUs are only left unpaused if
there's still space in the ring buffer. If there's no more space in the
ring buffer, the VCPU trying to put an event in still gets paused.
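
For illustration, the per-vcpu slot described above could look roughly like
this (names and layout are placeholders, not the proposed ABI):

struct vm_event_slot {
    uint32_t state;          /* e.g. IDLE -> SUBMITTED -> FINISHED */
    uint32_t pad;
    vm_event_request_t req;  /* written by the vcpu, which then blocks */
    vm_event_response_t rsp; /* written by the monitor; unblocks the vcpu */
};

/* One slot per vcpu, so a synchronous request never waits for ring space
 * held up by another vcpu. */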

Re: [Xen-devel] [PATCH 1/4] xen/common: Break domain_unmap_resources() out of domain_kill()

2019-02-19 Thread Razvan Cojocaru
On 2/20/19 12:18 AM, Andrew Cooper wrote:
> A subsequent change is going to need an x86-specific unmapping step, so take
> the opportunity to split the current vcpu unmapping out into a dedicated path.
> 
> No practical change.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Razvan Cojocaru 
> CC: Tamas K Lengyel 
> CC: Juergen Gross 
> ---
>  xen/common/domain.c  | 16 +---
>  xen/include/xen/domain.h |  4 
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 32bca8d..e66f7ea 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -700,10 +700,21 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, struct domain **d)
>  return 0;
>  }
>  
> +static void domain_unmap_resources(struct domain *d)
> +{
> +struct vcpu *v;
> +
> +for_each_vcpu ( d, v )
> +{
> +unmap_vcpu_info(v);
> +
> +arch_vcpu_unmap_resources(v);
> +}
> +}
> +
>  int domain_kill(struct domain *d)
>  {
>  int rc = 0;
> -struct vcpu *v;
>  
>  if ( d == current->domain )
>  return -EINVAL;
> @@ -732,13 +743,12 @@ int domain_kill(struct domain *d)
>  d->tmem_client = NULL;
>  /* fallthrough */
>  case DOMDYING_dying:
> +domain_unmap_resources(d);
>  rc = domain_relinquish_resources(d);
>  if ( rc != 0 )
>  break;
>  if ( cpupool_move_domain(d, cpupool0) )
>  return -ERESTART;
> -for_each_vcpu ( d, v )
> -unmap_vcpu_info(v);

So before this change there was a leak of some sort here? I see that
before this patch unmap_vcpu_info() was called only if rc == 0, but it
is now called unconditionally _before_ domain_relinquish_resources().

This does appear to change the behaviour of the code in the error case.

If that's intended:
Reviewed-by: Razvan Cojocaru 


Thanks,
Razvan


Re: [Xen-devel] [PATCH 2/4] x86/altp2m: Rework #VE enable/disable paths

2019-02-19 Thread Razvan Cojocaru
On 2/20/19 12:18 AM, Andrew Cooper wrote:
> Split altp2m_vcpu_{enable,disable}_ve() out of the
> HVMOP_altp2m_vcpu_{enable,disable}_notify marshalling logic.  A future change
> is going to need to call altp2m_vcpu_disable_ve() from the domain_kill() path.
> 
> While at it, clean up the logic in altp2m_vcpu_{initialise,destroy}().
> altp2m_vcpu_reset() has no external callers, so fold it into its two
> callsites.  This in turn allows for altp2m_vcpu_destroy() to reuse
> altp2m_vcpu_disable_ve() rather than opencoding it.
> 
> No practical change.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Razvan Cojocaru 


Thanks,
Razvan


Re: [Xen-devel] [PATCH 4/4] x86/vmx: Properly flush the TLB when an altp2m is modified

2019-02-19 Thread Razvan Cojocaru
On 2/20/19 12:18 AM, Andrew Cooper wrote:
> Modifications to an altp2m mark the p2m as needing flushing, but this was
> never wired up in the return-to-guest path.  As a result, stale TLB entries
> can remain after resuming the guest.
> 
> In practice, this manifests as a missing EPT_VIOLATION or #VE exception when
> the guest subsequently accesses a page which has had its permissions reduced.
> 
> vmx_vmenter_helper() now has 11 p2ms to potentially invalidate, but issuing 11
> INVEPT instructions isn't clever.  Instead, count how many contexts need
> invalidating, and use INVEPT_ALL_CONTEXT if two or more are in need of
> flushing.
> 
> This doesn't have an XSA because altp2m is not yet a security-supported
> feature.
> 
> Signed-off-by: Andrew Cooper 
Reviewed-by: Razvan Cojocaru 
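
The flush policy described above amounts to roughly the following (a sketch,
not the patch itself; it assumes the two-argument __invept() helper and that
the real code walks the host p2m plus the altp2m table):

    unsigned int inv = 0;    /* how many EPT contexts need invalidating */
    struct ept_data *single = NULL;

    /* ... for each p2m found marked for flushing:
     *         inv++; single = &p2m->ept;                  ... */

    if ( inv == 1 )
        __invept(INVEPT_SINGLE_CONTEXT, single->eptp);
    else if ( inv > 1 )
        __invept(INVEPT_ALL_CONTEXT, 0);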


Thanks,
Razvan


Re: [Xen-devel] [PATCH 3/4] x86/vmx: Fix security issue when a guest balloons out the #VE info page

2019-02-20 Thread Razvan Cojocaru

On 2/20/19 12:18 AM, Andrew Cooper wrote:
> The logic in altp2m_vcpu_{en,dis}able_ve() and vmx_vcpu_update_vmfunc_ve() is
> dangerous.  After #VE has been set up, the guest can balloon out and free the
> nominated GFN, after which the processor may write to it.  Also, the unlocked
> GFN query means the MFN is stale by the time it is used.  Alternatively, a
> guest can race two disable calls to cause one VMCS to still reference the
> nominated GFN after the tracking information was dropped.
>
> Rework the logic from scratch to make it safe.
>
> Hold an extra page reference on the underlying frame, to account for the
> VMCS's reference.  This means that if the GFN gets ballooned out, it isn't
> freed back to Xen until #VE is disabled, and the VMCS no longer refers to the
> page.
>
> A consequence of this is that arch_vcpu_unmap_resources() now needs to call
> altp2m_vcpu_disable_ve() to drop the reference during domain_kill(), to allow
> all of the memory to be freed.
>
> For domains using altp2m, we expect a single enable call and no disable for
> the remaining lifetime of the domain.  However, to avoid problems with
> concurrent calls, use cmpxchg() to locklessly maintain safety.
>
> This doesn't have an XSA because altp2m is not yet a security-supported
> feature.
>
> Signed-off-by: Andrew Cooper


Tested-by: Razvan Cojocaru 
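
The lockless claim described above boils down to something like the sketch
below (the field name is illustrative, and the p2m type checks done by the
real patch are elided):

    struct page_info *pg = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);

    if ( !pg )
        return -ESRCH;

    /* Try to claim the #VE slot; on success the reference is owned
     * by the VMCS until disable time. */
    if ( cmpxchg(&v->arch.hvm.vmx.ve_info_page, NULL, pg) != NULL )
    {
        put_page(pg);   /* Lost the race: an earlier enable holds the ref. */
        return -EBUSY;
    }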


Thanks,
Razvan


Re: [Xen-devel] [PATCH v2 3/4] x86/vmx: Fix security issue when a guest balloons out the #VE info page

2019-02-21 Thread Razvan Cojocaru
On 2/21/19 10:18 PM, Andrew Cooper wrote:
> The logic in altp2m_vcpu_{en,dis}able_ve() and vmx_vcpu_update_vmfunc_ve() is
> dangerous.  After #VE has been set up, the guest can balloon out and free the
> nominated GFN, after which the processor may write to it.  Also, the unlocked
> GFN query means the MFN is stale by the time it is used.  Alternatively, a
> guest can race two disable calls to cause one VMCS to still reference the
> nominated GFN after the tracking information was dropped.
> 
> Rework the logic from scratch to make it safe.
> 
> Hold an extra page reference on the underlying frame, to account for the
> VMCS's reference.  This means that if the GFN gets ballooned out, it isn't
> freed back to Xen until #VE is disabled, and the VMCS no longer refers to the
> page.
> 
> A consequence of this is that altp2m_vcpu_disable_ve() needs to be called
> during the domain_kill() path, to drop the reference for domains which shut
> down with #VE still enabled.
> 
> For domains using altp2m, we expect a single enable call and no disable for
> the remaining lifetime of the domain.  However, to avoid problems with
> concurrent calls, use cmpxchg() to locklessly maintain safety.
> 
> This doesn't have an XSA because altp2m is not yet a security-supported
> feature.
> 
> Signed-off-by: Andrew Cooper 
> Release-acked-by: Juergen Gross 

Tested-by: Razvan Cojocaru 


Thanks,
Razvan


  1   2   3   4   5   6   >