Re: [Xen-devel] [PATCH] x86/HVM: fix hvmemul_rep_outs_set_context()
On 11/23/2017 05:09 PM, Jan Beulich wrote:
> There were two issues with this function: Its use of
> hvmemul_do_pio_buffer() was wrong (the function deals only with
> individual port accesses, not repeated ones, i.e. passing it
> "*reps * bytes_per_rep" does not have the intended effect). And it
> could have processed a larger set of operations in one go than was
> probably intended (limited just by the size that xmalloc() can hand
> back).
>
> By converting to proper use of hvmemul_do_pio_buffer(), no intermediate
> buffer is needed at all. As a result a preemption check is being added.
>
> Also drop unused parameters from the function.
>
> Signed-off-by: Jan Beulich

Thank you for the patch! FWIW,

Reviewed-by: Razvan Cojocaru

Thanks,
Razvan
Re: [Xen-devel] [PATCH v2] x86/hvm: Add MSR old value
> On Fri, Oct 13, 2017 at 03:50:57PM +0300, Alexandru Isaila wrote:
>> This patch adds the old value param and the onchangeonly option
>> to the VM_EVENT_REASON_MOV_TO_MSR event.
>>
>> The param was added to the vm_event_mov_to_msr struct and to the
>> hvm_monitor_msr function. Finally I've changed the bool_t param
>> to a bool for the hvm_msr_write_intercept function.
>>
>> Signed-off-by: Alexandru Isaila
>> Acked-by: Tamas K Lengyel
>>
>> ---
>> Changes since V1:
>>  - Removed stray blanks inside the inner parentheses
>>  - Added space after the if statement
>>  - Added * 8 to the set/clear/test_bit statements
>>  - Removed the blank line after monitored_msr
>> ---
>>  tools/libxc/include/xenctrl.h | 2 +-
>>  tools/libxc/xc_monitor.c      | 3 ++-
>
> Acked-by: Wei Liu

Ping - AFAICT this patch has all the required acks?

Thanks,
Razvan
Re: [Xen-devel] [PATCH v3 22/25] x86/HVM: do actual CMPXCHG in hvmemul_cmpxchg()
On 12/07/2017 04:16 PM, Jan Beulich wrote:
> ..., at least as far as currently possible, i.e. when a mapping can be
> obtained.
>
> Signed-off-by: Jan Beulich

Thank you for the patch!
Re: [Xen-devel] [PATCH v8] x86/altp2m: support for setting restrictions for an array of pages
On 12/08/2017 02:18 PM, Jan Beulich wrote:
>>> On 24.10.17 at 12:19, wrote:
>> HVMOP_altp2m_set_mem_access_multi has been added as a HVMOP (as opposed
>> to a DOMCTL) for consistency with its HVMOP_altp2m_set_mem_access
>> counterpart (and hence with the original altp2m design, where domains
>> are allowed - with the proper altp2m access rights - to alter these
>> settings), in the absence of an official position on the issue from
>> the original altp2m designers.
>
> I continue to disagree with this reasoning. I'm afraid I'm not really
> willing to allow widening the badness, unless altp2m was formally
> documented security-unsupported.

Going the DOMCTL route here would have been the (much easier) solution,
and in fact, as stated before, there has been an attempt to do so -
however, IIRC Andrew insisted that we should take care to use consistent
access privileges across altp2m operations. This was followed by a
lengthy xen-devel discussion and several unsuccessful attempts to obtain
an official position from the original contributors, at which point
(after several months), as also discussed at the Xen Developer Summit in
Budapest, we decided to press on in the direction that seemed most
compatible with the original altp2m design. (Please correct me if I'm
misremembering or misunderstanding something.)

So at this point it looks like we're stuck again: we're happy to go in
whatever direction the maintainers decide is best, but we do need to
settle on one. FWIW, Tamas (CC added) has added code to restrict where
altp2m calls can come from (although that's not XSM code). Please let us
know how to proceed.

>> +int xc_altp2m_set_mem_access_multi(xc_interface *xch, uint32_t domid,
>> +                                   uint16_t view_id, uint8_t *access,
>> +                                   uint64_t *pages, uint32_t nr)
>> +{
>> +    int rc;
>> +
>> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_altp2m_op_t, arg);
>> +    DECLARE_HYPERCALL_BOUNCE(access, nr, XC_HYPERCALL_BUFFER_BOUNCE_IN);
>> +    DECLARE_HYPERCALL_BOUNCE(pages, nr * sizeof(uint64_t),
>> +                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
>
> This is confusing: Why "* sizeof()" in the latter expression, but not
> in the former. It would anyway be better to use sizeof(*pages) (and
> then also sizeof(*access)) to clearly make the connection.

I've opted for less clutter here, since of course sizeof(uint8_t) == 1,
so nr == nr * sizeof(uint8_t). But we're happy to make the change, it
will indeed make the code clearer.

>> @@ -4568,6 +4571,37 @@ static int do_altp2m_op(
>>                                            a.u.set_mem_access.view);
>>          break;
>>
>> +    case HVMOP_altp2m_set_mem_access_multi:
>> +        if ( a.u.set_mem_access_multi.pad ||
>> +             a.u.set_mem_access_multi.opaque >= a.u.set_mem_access_multi.nr )
>> +        {
>> +            rc = -EINVAL;
>> +            break;
>> +        }
>
> Just like in a number of other recent cases: This operation should not
> fail when .nr is zero. Yet as per above it will.

We'll test that .nr != 0 before the >= comparison (sketched below).

Thanks,
Razvan
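[Editorial note: for reference, a minimal sketch of the two follow-up
changes agreed in the exchange above - illustrative only, not the next
revision of the patch itself:]

```c
/* 1. Bounce sizes written so they visibly match the array element types: */
DECLARE_HYPERCALL_BOUNCE(access, nr * sizeof(*access),
                         XC_HYPERCALL_BUFFER_BOUNCE_IN);
DECLARE_HYPERCALL_BOUNCE(pages, nr * sizeof(*pages),
                         XC_HYPERCALL_BUFFER_BOUNCE_IN);

/* 2. Only compare opaque against nr when nr != 0, so that a zero-length
 * request becomes a successful no-op instead of failing with -EINVAL: */
case HVMOP_altp2m_set_mem_access_multi:
    if ( a.u.set_mem_access_multi.pad ||
         (a.u.set_mem_access_multi.nr &&
          a.u.set_mem_access_multi.opaque >= a.u.set_mem_access_multi.nr) )
    {
        rc = -EINVAL;
        break;
    }
```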
Re: [Xen-devel] [PATCH v8] x86/altp2m: support for setting restrictions for an array of pages
On 12/11/2017 03:36 PM, Jan Beulich wrote:
>>>> On 11.12.17 at 13:50, wrote:
>> On 12/11/2017 12:12 PM, Jan Beulich wrote:
>>>>>> On 11.12.17 at 12:06, wrote:
>>>> My suggestion was that we don't break usecases. The Intel usecase
>>>> specifically is for an in-guest entity to have full control of all
>>>> altp2m functionality, and this is fine (security wise) when
>>>> permitted to do so by the toolstack.
>>>
>>> IOW you mean that such guests would be considered "trusted", i.e.
>>> whatever bad they can do is by definition not a security concern.
>>
>> I'm not sure what you mean by "trusted". If implemented correctly,
>> altp2m and mem_access shouldn't give the guest any more permissions
>> than it has already. The main risk would be if there were bugs in the
>> functionality that allowed security issues.
>
> Hmm, maybe I'm mis-reading the code, but
> mem_access.c:set_mem_access() looks to be using the requested
> access rights verbatim, i.e. without applying tool stack imposed
> restrictions (hypervisor ones look to be honored by deriving
> base permissions from the p2m type first).

Quite likely I'm not grasping the full meaning of your objection,
however the added code is merely another interface to already existing
core code - so while admittedly there's room for improvement in the EPT
code below it, this patch really only extends the scope of altp2m's
existing version of set_mem_access() (which currently works on a single
page). In that, it at least doesn't seem to make things worse (it's
really just an optimization - whatever badness this code can cause with
a single call can already be achieved exactly with a sequence of
xc_altp2m_set_mem_access() calls).

Thanks,
Razvan
Re: [Xen-devel] [PATCH v8] x86/altp2m: support for setting restrictions for an array of pages
On 12/11/2017 05:54 PM, George Dunlap wrote:
> On 12/11/2017 03:05 PM, Jan Beulich wrote:
>>>>> On 11.12.17 at 15:51, wrote:
>>> On 12/11/2017 02:46 PM, Razvan Cojocaru wrote:
>>>> On 12/11/2017 03:36 PM, Jan Beulich wrote:
>>>>>>>> On 11.12.17 at 13:50, wrote:
>>>>>> On 12/11/2017 12:12 PM, Jan Beulich wrote:
>>>>>>>>>> On 11.12.17 at 12:06, wrote:
>>>>>>>> My suggestion was that we don't break usecases. The Intel usecase
>>>>>>>> specifically is for an in-guest entity to have full control of
>>>>>>>> all altp2m functionality, and this is fine (security wise) when
>>>>>>>> permitted to do so by the toolstack.
>>>>>>>
>>>>>>> IOW you mean that such guests would be considered "trusted", i.e.
>>>>>>> whatever bad they can do is by definition not a security concern.
>>>>>>
>>>>>> I'm not sure what you mean by "trusted". If implemented correctly,
>>>>>> altp2m and mem_access shouldn't give the guest any more
>>>>>> permissions than it has already. The main risk would be if there
>>>>>> were bugs in the functionality that allowed security issues.
>>>>>
>>>>> Hmm, maybe I'm mis-reading the code, but
>>>>> mem_access.c:set_mem_access() looks to be using the requested
>>>>> access rights verbatim, i.e. without applying tool stack imposed
>>>>> restrictions (hypervisor ones look to be honored by deriving
>>>>> base permissions from the p2m type first).
>>>>
>>>> Quite likely I'm not grasping the full meaning of your objection,
>>>> however the added code is merely another interface to already
>>>> existing core code - so while admittedly there's room for
>>>> improvement in the EPT code below it, this patch really only extends
>>>> the scope of altp2m's existing version of set_mem_access() (which
>>>> currently works on a single page). In that, it at least doesn't seem
>>>> to make things worse (it's really just an optimization - whatever
>>>> badness this code can cause with a single call can already be
>>>> achieved exactly with a sequence of xc_altp2m_set_mem_access()
>>>> calls).
>>>
>>> I think Jan was saying that he would ideally like to remove *all*
>>> guest access to altp2m functionality, even what's currently there.
>>> The more extra features we make available to guests, the harder it
>>> will be in the future to argue to remove it all.
>>
>> With one slight correction: all _uncontrolled_ access is what I'd like
>> to see removed. Right now this could arguably indeed mean all
>> access, as it is all uncontrolled (afaict).
>
> Well at the moment all guest altp2m functionality is disabled unless
> the toolstack has set the appropriate HVM param. Is that not
> sufficient?

Furthermore, the parameters allow setting dom0-only access (courtesy of
Tamas' aforementioned patches).

Thanks,
Razvan
Re: [Xen-devel] [PATCH v8] x86/altp2m: support for setting restrictions for an array of pages
On 12/11/2017 06:00 PM, Tamas K Lengyel wrote:
>>> I think Jan was saying that he would ideally like to remove *all*
>>> guest access to altp2m functionality, even what's currently there.
>>> The more extra features we make available to guests, the harder it
>>> will be in the future to argue to remove it all.
>>
>> With one slight correction: all _uncontrolled_ access is what I'd like
>> to see removed. Right now this could arguably indeed mean all
>> access, as it is all uncontrolled (afaict).
>
> But it is controlled. Unless you specifically allow the guest access
> to the interface (i.e. altp2m=1 in the xl config), the guest can't do
> anything with it. And if you do that, it is likely because you have an
> in-guest agent that works in tandem with an out-of-guest agent,
> coordinating what to protect and how. You use the in-guest agent for
> performance-sensitive monitoring, while you can use the out-of-guest
> agent to protect the in-guest agent. Of course, this is not a
> requirement, but it is what I *think* the setup the interface was
> designed for looks like, as there is a specific ability to decide which
> page-permission violation goes to the guest (with #VE) and which goes
> to the toolstack. Plus, even if the interface is enabled, it is only
> available to the guest kernel. If it's a malicious guest kernel, the
> worst it should be able to do is crash itself (for example by
> allocating a ton of altp2m tables and running out of shadow memory).
> But you don't need the altp2m interface for a guest kernel to crash
> itself.

That's precisely how we envision possible use cases as well.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v9] x86/emulate: Send vm_event from emulate
On 9/11/19 1:39 PM, Alexandru Stefan ISAILA wrote:
> On 11.09.2019 12:57, Jan Beulich wrote:
>> On 09.09.2019 17:35, Alexandru Stefan ISAILA wrote:
>>> +/*
>>> + * Send memory access vm_events based on pfec. Returns true if the
>>> + * event was sent and false for p2m_get_mem_access() error, no
>>> + * violation and event send error. Assumes the caller will check
>>> + * arch.vm_event->send_event.
>>> + *
>>> + * NOTE: p2m_get_mem_access() can fail if the entry was not found in
>>> + * the EPT (in which case access to it is unrestricted, so no
>>> + * violations can occur). In this case it is fine to continue the
>>> + * emulation.
>>> + */
>>> +bool hvm_monitor_check_ept(unsigned long gla, gfn_t gfn, uint32_t pfec,
>>> +                           uint16_t kind)
>>
>> Why did you choose to have "ept" in the name and also mention it
>> in the commit? Is there anything in here which isn't generic p2m?
>
> The name was suggested by Razvan Cojocaru. I have no preference in the
> name. Maybe Tamas can suggest a good one.

I've suggested "ept" in the name because "regular" emulation ignores it,
and this function takes it into account, hence the "check_ept" (which I
thought would be read together). It's fine to change it to something
else.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v9] x86/emulate: Send vm_event from emulate
On 9/11/19 2:41 PM, Jan Beulich wrote:
> On 11.09.2019 13:21, Razvan Cojocaru wrote:
>> On 9/11/19 1:39 PM, Alexandru Stefan ISAILA wrote:
>>> On 11.09.2019 12:57, Jan Beulich wrote:
>>>> On 09.09.2019 17:35, Alexandru Stefan ISAILA wrote:
>>>>> +/*
>>>>> + * Send memory access vm_events based on pfec. Returns true if the
>>>>> + * event was sent and false for p2m_get_mem_access() error, no
>>>>> + * violation and event send error. Assumes the caller will check
>>>>> + * arch.vm_event->send_event.
>>>>> + *
>>>>> + * NOTE: p2m_get_mem_access() can fail if the entry was not found
>>>>> + * in the EPT (in which case access to it is unrestricted, so no
>>>>> + * violations can occur). In this case it is fine to continue the
>>>>> + * emulation.
>>>>> + */
>>>>> +bool hvm_monitor_check_ept(unsigned long gla, gfn_t gfn, uint32_t pfec,
>>>>> +                           uint16_t kind)
>>>>
>>>> Why did you choose to have "ept" in the name and also mention it
>>>> in the commit? Is there anything in here which isn't generic p2m?
>>>
>>> The name was suggested by Razvan Cojocaru. I have no preference in
>>> the name. Maybe Tamas can suggest a good one.
>>
>> I've suggested "ept" in the name because "regular" emulation ignores
>> it, and this function takes it into account, hence the "check_ept"
>> (which I thought would be read together). It's fine to change it to
>> something else.
>
> Then "check_p2m" perhaps?

Sounds good to me.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v10] x86/emulate: Send vm_event from emulate
On 9/17/19 5:11 PM, Alexandru Stefan ISAILA wrote:
>>>>> +bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
>>>>> +                           uint16_t kind)
>>>>> +{
>>>>> +    xenmem_access_t access;
>>>>> +    vm_event_request_t req = {};
>>>>> +    paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
>>>>> +
>>>>> +    ASSERT(current->arch.vm_event->send_event);
>>>>> +
>>>>> +    current->arch.vm_event->send_event = false;
>>>>> +
>>>>> +    if ( p2m_get_mem_access(current->domain, gfn, &access,
>>>>> +                            altp2m_vcpu_idx(current)) != 0 )
>>>>> +        return false;
>>>>
>>>> ... next to the call here (but the maintainers of the file would
>>>> have to judge in the end). That said, I continue to not understand
>>>> why a not found entry means unrestricted access. Isn't it
>>>> ->default_access which controls what such a "virtual" entry would
>>>> permit?
>>>
>>> I'm sorry for this misleading comment. The code states that if the
>>> entry was not found the access will be default_access and return 0.
>>> So in this case default_access will be checked.
>>>
>>>     /* If request to get default access. */
>>>     if ( gfn_eq(gfn, INVALID_GFN) )
>>>     {
>>>         *access = memaccess[p2m->default_access];
>>>         return 0;
>>>     }
>>>
>>> If this clears things up I can remove the "NOTE" part of the comment.
>>
>> I'm afraid it doesn't clear things up: I'm still lost as to why
>> "entry not found" implies "full access". And I'm further lost as
>> to what the code fragment above (dealing with INVALID_GFN, but
>> not really the "entry not found" case, which would be INVALID_MFN
>> coming back from a translation) is supposed to tell me.
>
> It is safe enough to consider an invalid mfn from the hostp2m to be a
> violation. There is still a small problem with the altp2m view not
> having the page propagated from the hostp2m. In this case we have to
> use altp2m_get_effective_entry().

In the absence of clear guidance from the Intel SDM on what the hardware
default is for a page not present in the p2m, we should probably follow
Jan's advice and check violations against default_access for such pages.

There are indeed - as discussed privately - two cases for an altp2m
though (see the sketch below):

1. Page not found in the altp2m, but set in the hostp2m - in which case
I suggest that we take the hostp2m value into account (with or without
propagation to the altp2m; I see no harm in propagating the entry, but
others may see something I'm missing).

2. Page not found in both altp2m and hostp2m - in which case we should
probably check against default_access.

Thanks,
Razvan
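[Editorial note: to make the two cases concrete, here is a rough sketch
of the proposed lookup order, modeled on what the existing
altp2m_get_effective_entry() does (see the patch later in this section);
the helper itself is made up for illustration:]

```c
/* Illustrative only - mirrors altp2m_get_effective_entry()'s fallback
 * order; effective_access() does not exist in the Xen tree. */
static p2m_access_t effective_access(struct p2m_domain *ap2m, gfn_t gfn)
{
    p2m_type_t t;
    p2m_access_t a;
    mfn_t mfn = ap2m->get_entry(ap2m, gfn, &t, &a, 0, NULL, NULL);

    /* Case 1: not in the altp2m - fall back to the host p2m entry. */
    if ( !mfn_valid(mfn) && !p2m_is_hostp2m(ap2m) )
    {
        struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain);
        unsigned int order;

        mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), &t, &a,
                                    P2M_ALLOC | P2M_UNSHARE, &order, 0);
    }

    /* Case 2: not present anywhere - check against default_access. */
    return mfn_valid(mfn) ? a : ap2m->default_access;
}
```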
Re: [Xen-devel] [PATCH v10] x86/emulate: Send vm_event from emulate
On 9/17/19 6:09 PM, Tamas K Lengyel wrote:
> On Tue, Sep 17, 2019 at 8:24 AM Razvan Cojocaru wrote:
>> On 9/17/19 5:11 PM, Alexandru Stefan ISAILA wrote:
>>>>>>> +bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
>>>>>>> +                           uint16_t kind)
>>>>>>> +{
>>>>>>> +    xenmem_access_t access;
>>>>>>> +    vm_event_request_t req = {};
>>>>>>> +    paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
>>>>>>> +
>>>>>>> +    ASSERT(current->arch.vm_event->send_event);
>>>>>>> +
>>>>>>> +    current->arch.vm_event->send_event = false;
>>>>>>> +
>>>>>>> +    if ( p2m_get_mem_access(current->domain, gfn, &access,
>>>>>>> +                            altp2m_vcpu_idx(current)) != 0 )
>>>>>>> +        return false;
>>>>>>
>>>>>> ... next to the call here (but the maintainers of the file would
>>>>>> have to judge in the end). That said, I continue to not understand
>>>>>> why a not found entry means unrestricted access. Isn't it
>>>>>> ->default_access which controls what such a "virtual" entry would
>>>>>> permit?
>>>>>
>>>>> I'm sorry for this misleading comment. The code states that if the
>>>>> entry was not found the access will be default_access and return 0.
>>>>> So in this case default_access will be checked.
>>>>>
>>>>>     /* If request to get default access. */
>>>>>     if ( gfn_eq(gfn, INVALID_GFN) )
>>>>>     {
>>>>>         *access = memaccess[p2m->default_access];
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>> If this clears things up I can remove the "NOTE" part of the
>>>>> comment.
>>>>
>>>> I'm afraid it doesn't clear things up: I'm still lost as to why
>>>> "entry not found" implies "full access". And I'm further lost as
>>>> to what the code fragment above (dealing with INVALID_GFN, but
>>>> not really the "entry not found" case, which would be INVALID_MFN
>>>> coming back from a translation) is supposed to tell me.
>>>
>>> It is safe enough to consider an invalid mfn from the hostp2m to be a
>>> violation. There is still a small problem with the altp2m view not
>>> having the page propagated from the hostp2m. In this case we have to
>>> use altp2m_get_effective_entry().
>>
>> In the absence of clear guidance from the Intel SDM on what the
>> hardware default is for a page not present in the p2m, we should
>> probably follow Jan's advice and check violations against
>> default_access for such pages.
>
> The SDM is very clear that pages that are not present in the EPT are a
> violation:
>
> 28.2.2
> An EPT paging-structure entry is present if any of bits 2:0 is 1;
> otherwise, the entry is not present. The processor ignores bits 62:3
> and uses the entry neither to reference another EPT paging-structure
> entry nor to produce a physical address. A reference using a
> guest-physical address whose translation encounters an EPT
> paging-structure that is not present causes an EPT violation (see
> Section 28.2.3.2).
>
> 28.2.3.2 EPT Violations
> An EPT violation may occur during an access using a guest-physical
> address whose translation does not cause an EPT misconfiguration. An
> EPT violation occurs in any of the following situations:
> • Translation of the guest-physical address encounters an EPT
>   paging-structure entry that is not present (see Section 28.2.2).

Mystery solved. Thanks!
Re: [Xen-devel] [PATCH] SVM: correct CPUID event processing
On 9/19/19 11:07 PM, Boris Ostrovsky wrote:
> On 9/19/19 6:37 AM, Jan Beulich wrote:
>> hvm_monitor_cpuid() expects the input registers, not two of the
>> outputs.
>>
>> However, once having made the necessary adjustment, the SVM and VMX
>> functions are so similar that they should be folded (thus avoiding
>> further similar asymmetries to get introduced). Use the best of both
>> worlds by e.g. using "curr" consistently. This then being the only
>> caller of hvm_check_cpuid_faulting(), fold in that function as well.
>>
>> Signed-off-by: Jan Beulich
>
> Reviewed-by: Boris Ostrovsky

Acked-by: Razvan Cojocaru

Thanks,
Razvan
Re: [Xen-devel] [PATCH v13] x86/emulate: Send vm_event from emulate
On 9/23/19 3:05 PM, Alexandru Stefan ISAILA wrote:
> A/D bit writes (on page walks) can be considered benign by an
> introspection agent, so receiving vm_events for them is a
> pessimization. We try here to optimize by filtering these events out.
>
> Currently, we are fully emulating the instruction at RIP when the
> hardware sees an EPT fault with npfec.kind != npfec_kind_with_gla.
> This is, however, incorrect, because the instruction at RIP might
> legitimately cause an EPT fault of its own while accessing a
> _different_ page from the original one, where A/D were set.
>
> The solution is to perform the whole emulation while ignoring EPT
> restrictions for the walk part, and taking them into account for the
> "actual" emulation of the instruction at RIP. When we send out a
> vm_event, we don't want the emulation to complete, since in that case
> we won't be able to veto whatever it is doing. That would mean that we
> can't actually prevent any malicious activity; instead we'd only be
> able to report on it.
>
> When we see a "send-vm_event" case while emulating, we need to first
> send the event out and then suspend the emulation (return
> X86EMUL_RETRY). After the emulation stops, we'll call
> hvm_vm_event_do_resume() again after the introspection agent treats
> the event and resumes the guest. There, the instruction at RIP will be
> fully emulated (with the EPT ignored) if the introspection application
> allows it, and the guest will continue to run past the instruction.
>
> A common example is when the hardware exits because of an EPT fault
> caused by a page walk, and p2m_mem_access_check() decides it is going
> to send a vm_event. If the vm_event is sent and handled such that the
> instruction at RIP is run, that instruction might also hit a protected
> page and provoke a vm_event.
>
> Now, if npfec.kind == npfec_kind_in_gpt and
> d->arch.monitor.inguest_pagefault_disabled is true, then we are in the
> page-walk case and we can apply this emulation optimization: emulate
> the page walk while ignoring the EPT, but don't ignore the EPT for the
> emulation of the actual instruction.
>
> In the first case we would have 2 EPT events; in the second case we
> would have 1 EPT event, if the instruction at RIP triggers an EPT
> event.
>
> We use hvmemul_map_linear_addr() to intercept write access and
> __hvm_copy() to intercept exec, read and write access.
>
> A new return type was added, HVMTRANS_need_retry, in order to have all
> the places that consume HVMTRANS* return X86EMUL_RETRY.
>
> hvm_emulate_send_vm_event() can return false if there was no violation,
> if there was an error from monitor_traps(), or from
> p2m_get_mem_access(). -ESRCH from p2m_get_mem_access() is treated as
> restricted access.
>
> NOTE: hvm_emulate_send_vm_event() assumes the caller will
> enable/disable arch.vm_event->send_event.
>
> Signed-off-by: Alexandru Isaila

FWIW,
Acked-by: Razvan Cojocaru
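[Editorial note: for readers following the thread, the shape of the
emulator-side check described in the commit message is roughly the
following - a simplified sketch, not a verbatim hunk from the v13 patch;
the exact call sites live in hvmemul_map_linear_addr() and __hvm_copy():]

```c
/* Abridged: hvm_emulate_send_vm_event() is from the patch under review;
 * the surrounding condition is a simplification of the real call sites. */
if ( curr->arch.vm_event && curr->arch.vm_event->send_event &&
     hvm_emulate_send_vm_event(addr, gfn, pfec) )
    return X86EMUL_RETRY;   /* event sent: suspend emulation; it is
                             * re-run from hvm_vm_event_do_resume() */
```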
Re: [Xen-devel] [PATCH] x86/vm_event: correctly gather gs_shadow value
On 5/1/19 6:01 PM, Tamas K Lengyel wrote:
> On Wed, May 1, 2019 at 8:53 AM Tamas K Lengyel wrote:
>> On Wed, May 1, 2019 at 8:20 AM Razvan Cojocaru wrote:
>>> On 5/1/19 4:58 PM, Tamas K Lengyel wrote:
>>>>>> It might be worth introducing a "sync state from hw" hook which
>>>>>> collects all the data we intend to pass to the introspection
>>>>>> agent.
>>>>>
>>>>> You mean adding another hvm hook?
>>>>
>>>> Actually, instead of another hook I think what would make sense is
>>>> to just update vmx_save_vmcs_ctxt to automatically refresh the
>>>> cached register values when it's called with "v == current".
>>>> Thoughts?
>>>
>>> That's probably the better way to go about it, since otherwise the
>>> xc_hvm_getcontext_partial() hypercall will suffer from the same
>>> problem. (There are two ways of getting guest state: one is via the
>>> vm_event cached values, the other is via the aforementioned
>>> hypercall.)
>>
>> True, although issuing the hypercall in the vm_event callback is
>> actually fine - that's how I found the issue to begin with, since the
>> vCPU will be scheduled out with the cached registers refreshed and
>> thus be different from what the vm_event itself had. But other
>> callers of the hypercall can run into the problem if the guest/vCPU
>> is not paused.
>
> Actually, doing the "v == current" check wouldn't really do anything
> for the hypercall, since it's not the domain issuing the hypercall on
> itself. The only way to ensure that the hypercall is returning correct
> values would be to pause the vCPU.

I've discussed this with Andrew in the meantime and he's kindly pointed
out that for the hypercall we pause from a remote context, which forces
a de-schedule. So the hypercall _should_ be fine. But we write data to
the vm_event ring from the current context, where state might actually
be in hardware. He'll probably chime in with additional suggestions /
comments.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU
On 5/2/19 2:52 AM, Tamas K Lengyel wrote:
> Currently the gs_shadow value is only cached when the vCPU is being
> scheduled out by Xen. Reporting this (usually) stale value through
> vm_event is incorrect, since it doesn't represent the actual state of
> the vCPU at the time the event was recorded. This prevents vm_event
> subscribers from correctly finding kernel structures in the guest when
> it is trapped while in ring3.
>
> Refresh shadow_gs value when the context being saved is for the
> current vCPU.
>
> Signed-off-by: Tamas K Lengyel
> Cc: Razvan Cojocaru
> Cc: Jun Nakajima
> Cc: Kevin Tian
> Cc: Jan Beulich
> Cc: Andrew Cooper
> Cc: Wei Liu
> Cc: Roger Pau Monne
> ---
> v2: move fix to hvm so vm_event doesn't have to know specifics
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 283eb7b34d..5154ecc2a8 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
>
>  static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
>  {
> +    if ( v == current )
> +        vmx_save_guest_msrs(v);

vmx_save_guest_msrs() is simple enough that the if is probably not
necessary here (we can just call the function directly, regardless of
what v holds). But that's not technically an issue, so if nobody else
minds:

Acked-by: Razvan Cojocaru

Thanks,
Razvan
Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure
On 5/2/19 2:57 AM, Tamas K Lengyel wrote:
> Receiving this register is useful for introspecting 32-bit Windows
> when the event being trapped happened while in ring3.
>
> Signed-off-by: Tamas K Lengyel
> Cc: Razvan Cojocaru
> Cc: Tamas K Lengyel
> Cc: Jan Beulich
> Cc: Andrew Cooper
> Cc: Wei Liu
> Cc: Roger Pau Monne
> ---
>  xen/arch/x86/vm_event.c       | 5 +++++
>  xen/include/public/vm_event.h | 3 ++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
> index 51c3493b1d..873788e32f 100644
> --- a/xen/arch/x86/vm_event.c
> +++ b/xen/arch/x86/vm_event.c
> @@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum x86_segment segment,
>          reg->es_sel = seg.sel;
>          break;
>
> +    case x86_seg_gdtr:
> +        reg->gdtr_base = seg.base;
> +        break;

Please also add limit, ar, sel, like the others do.

In addition to making this modification look less strange (since, in
contrast to the function name, nothing is packed for gdtr_base), it will
also save future work for applications wanting to use gdtr which also
require backwards compatibility with previous vm_event versions.

As you know, for each such modification we need to have a separate
vm_event_vX header in our applications so that we're ready to create a
ring buffer using requests and replies of the right size, and also to be
able to properly interpret the ring buffer data - so the less frequently
the vm_event struct changes, the better.

Petre is currently working on the vm_event changes that will hopefully
enable us to just cache everything that the getcontext_partial hypercall
retrieves.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU
On 5/2/19 11:36 AM, Jan Beulich wrote:
>>> On 02.05.19 at 08:20, wrote:
>> On 5/2/19 2:52 AM, Tamas K Lengyel wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
>>>
>>>  static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
>>>  {
>>> +    if ( v == current )
>>> +        vmx_save_guest_msrs(v);
>>
>> vmx_save_guest_msrs() is simple enough that the if is probably not
>> necessary here (we can just call the function directly, regardless of
>> what v holds).
>
> Avoiding an MSR access is perhaps worth the conditional.

Fair enough.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU
On 5/2/19 11:45 AM, Andrew Cooper wrote:
> On 02/05/2019 07:20, Razvan Cojocaru wrote:
>> On 5/2/19 2:52 AM, Tamas K Lengyel wrote:
>>> Currently the gs_shadow value is only cached when the vCPU is being
>>> scheduled out by Xen. Reporting this (usually) stale value through
>>> vm_event is incorrect, since it doesn't represent the actual state of
>>> the vCPU at the time the event was recorded. This prevents vm_event
>>> subscribers from correctly finding kernel structures in the guest
>>> when it is trapped while in ring3.
>>>
>>> Refresh shadow_gs value when the context being saved is for the
>>> current vCPU.
>>>
>>> Signed-off-by: Tamas K Lengyel
>>> Cc: Razvan Cojocaru
>>> Cc: Jun Nakajima
>>> Cc: Kevin Tian
>>> Cc: Jan Beulich
>>> Cc: Andrew Cooper
>>> Cc: Wei Liu
>>> Cc: Roger Pau Monne
>>> ---
>>> v2: move fix to hvm so vm_event doesn't have to know specifics
>>> ---
>>>  xen/arch/x86/hvm/vmx/vmx.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>> index 283eb7b34d..5154ecc2a8 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
>>>
>>>  static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
>>>  {
>>> +    if ( v == current )
>>> +        vmx_save_guest_msrs(v);
>>
>> vmx_save_guest_msrs() is simple enough that the if is probably not
>> necessary here (we can just call the function directly, regardless of
>> what v holds).
>
> Why? Doing that would fully corrupt v's state when called in remote
> context, as you'd clobber v's gs_shadow with whatever the value was
> from the current vCPU.

Good point, I've missed that.

Thanks,
Razvan
Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure
On 5/2/19 11:52 AM, Andrew Cooper wrote:
> On 02/05/2019 09:25, Razvan Cojocaru wrote:
>> On 5/2/19 2:57 AM, Tamas K Lengyel wrote:
>>> Receiving this register is useful for introspecting 32-bit Windows
>>> when the event being trapped happened while in ring3.
>>>
>>> Signed-off-by: Tamas K Lengyel
>>> Cc: Razvan Cojocaru
>>> Cc: Tamas K Lengyel
>>> Cc: Jan Beulich
>>> Cc: Andrew Cooper
>>> Cc: Wei Liu
>>> Cc: Roger Pau Monne
>>> ---
>>>  xen/arch/x86/vm_event.c       | 5 +++++
>>>  xen/include/public/vm_event.h | 3 ++-
>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
>>> index 51c3493b1d..873788e32f 100644
>>> --- a/xen/arch/x86/vm_event.c
>>> +++ b/xen/arch/x86/vm_event.c
>>> @@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum x86_segment segment,
>>>          reg->es_sel = seg.sel;
>>>          break;
>>>
>>> +    case x86_seg_gdtr:
>>> +        reg->gdtr_base = seg.base;
>>> +        break;
>>
>> Please also add limit, ar, sel, like the others do.
>
> In Xen, we model GDTR/IDTR just like all other segments in the segment
> cache for convenience/consistency, including faking up of some default
> attributes. However, there is no such thing as a selector or access
> rights for them, and the VMCS lacks the fields, while the VMCB marks
> the fields as reserved.
>
> It is almost certainly not worth wasting the space in the ring.

Right, I got carried away there: I saw gdtr_limit and didn't check
whether the rest is available.

Thanks,
Razvan
Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure
On 5/2/19 4:09 PM, Tamas K Lengyel wrote:
> On Thu, May 2, 2019 at 2:57 AM Jan Beulich wrote:
>>>>> On 02.05.19 at 10:25, wrote:
>>> On 5/2/19 2:57 AM, Tamas K Lengyel wrote:
>>>> Receiving this register is useful for introspecting 32-bit Windows
>>>> when the event being trapped happened while in ring3.
>>>>
>>>> Signed-off-by: Tamas K Lengyel
>>>> Cc: Razvan Cojocaru
>>>> Cc: Tamas K Lengyel
>>>> Cc: Jan Beulich
>>>> Cc: Andrew Cooper
>>>> Cc: Wei Liu
>>>> Cc: Roger Pau Monne
>>>> ---
>>>>  xen/arch/x86/vm_event.c       | 5 +++++
>>>>  xen/include/public/vm_event.h | 3 ++-
>>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
>>>> index 51c3493b1d..873788e32f 100644
>>>> --- a/xen/arch/x86/vm_event.c
>>>> +++ b/xen/arch/x86/vm_event.c
>>>> @@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum x86_segment segment,
>>>>          reg->es_sel = seg.sel;
>>>>          break;
>>>>
>>>> +    case x86_seg_gdtr:
>>>> +        reg->gdtr_base = seg.base;
>>>> +        break;
>>>
>>> Please also add limit, ar, sel, like the others do.
>>
>> There's no ar or sel for GDT (and IDT). Instead, because of ...
>>
>>> In addition to making this modification look less strange (since, in
>>> contrast to the function name, nothing is packed for gdtr_base), it
>>> will also save future work for applications wanting to use gdtr
>>> which also require backwards compatibility with previous vm_event
>>> versions. As you know, for each such modification we need to have a
>>> separate vm_event_vX header in our applications so that we're ready
>>> to create a ring buffer using requests and replies of the right
>>> size, and also to be able to properly interpret the ring buffer
>>> data - so the less frequently the vm_event struct changes, the
>>> better.
>>
>> ... this I wonder whether the IDT details shouldn't get added at the
>> same time.
>
> The churn of the header is not that big of a deal - I do the same in
> LibVMI and it's a largely copy-paste job that takes perhaps ~5m to add
> (see
> https://github.com/tklengyel/libvmi/commit/7509ce56d0408dbec4e374b8780da69a7bd212e8).
> So that should not be a factor in deciding whether we add a needed
> extra piece or not. Since the vm_event ABI is changing for Xen 4.13,
> the ABI version will only be bumped once for 4.13, so we should be
> able to add/remove/shuffle whatever we want while the 4.13 merge
> window is open.
>
> That said, I don't have a use for idt and gdtr_limit that warrants
> having to receive them via the vm_event structure - those pieces are
> always just a hypercall away if needed. So in the vm_event structure I
> tend to just put the registers needed often enough to warrant avoiding
> that extra hypercall. But if Razvan says he has a use for them, I can
> add them here.

We do use them both - idtr and gdtr, both base and limit - but we are
indeed getting them via hypercall now. Considering that, since we did
add gdtr_base, I think it would be great if we could also add
gdtr_limit.

Adding idtr as well would _theoretically_ be a nice speed optimization
(removing the need for a hypercall), but it might also be a speed
pessimization generally applicable to increasing the vm_event size
(since a vCPU with no more space left in the vm_event ring buffer will
be blocked more and go through more locking logic). My point is, at the
moment it's fine to skip idtr if it's not required by Jan, but if we do
add either then let's please add both _base and _limit.

In a hopefully near future we'll be able to use as much memory as
necessary for storing vm_event data and just cache everything in the
ring buffer, avoiding all the "get context" hypercalls.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v2] x86/vm_event: add gdtr_base to the vm_event structure
On 5/3/19 12:42 AM, Tamas K Lengyel wrote:
> Receiving this register is useful for introspecting 32-bit Windows
> when the event being trapped happened while in ring3.
>
> Signed-off-by: Tamas K Lengyel
> Cc: Razvan Cojocaru
> Cc: Tamas K Lengyel
> Cc: Jan Beulich
> Cc: Andrew Cooper
> Cc: Wei Liu
> Cc: Roger Pau Monne
> ---
> v2: add gdtr limit

Acked-by: Razvan Cojocaru

Thanks,
Razvan
Re: [Xen-devel] [PATCH v3] x86/vm_event: add gdtr_base to the vm_event structure
On 5/3/19 11:09 AM, Jan Beulich wrote:
>>> On 03.05.19 at 00:54, wrote:
>> Receiving this register is useful for introspecting 32-bit Windows
>> when the event being trapped happened while in ring3.
>>
>> Signed-off-by: Tamas K Lengyel
>> Cc: Razvan Cojocaru
>> Cc: Tamas K Lengyel
>> Cc: Jan Beulich
>> Cc: Andrew Cooper
>> Cc: Wei Liu
>> Cc: Roger Pau Monne
>> ---
>> v2: add gdtr limit
>> v3: use uint32_t to fit the 20 bits
>
> As per Andrew's response I think v2 is it.

Yes, please. This will also allow us to reuse the existing (remaining)
pad bits in the future for another limit (for idtr, perhaps).

Thanks,
Razvan
Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?
On 5/9/19 11:57 PM, Mathieu Tarral wrote:
> Hi,
>
> following a previous conversation, I would like to catch MOV-TO-DRx
> VMI events to prevent the guest from disabling my hardware
> breakpoints.
>
> @Tamas pointed me to this header:
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/vm_event.h;h=b2bafc0d77f9758e42b0d53c05a7e6bb86c86686;hb=HEAD#l154
>
> And, as far as I can tell, I have to
> - add a new REASON
>   #define VM_EVENT_REASON_WRITE_DEBUGREG 15
>
> - add a new struct
>   struct vm_event_write_debugreg {
>       uint32_t index;
>       uint32_t _pad;
>       uint64_t new_value;
>       uint64_t old_value;
>   };
>
> - insert it into the vm_event_st union
>
> Can you give me more pointers and guidance on how to implement this in
> Xen?

You probably want to change the write_debugreg() macro into a function
that does what's currently being done + sends out the vm_event (roughly
sketched below). You also probably need to think about whether you want
the write to be preemptable or not (I'm guessing you don't, in which
case it's all simpler).

> I have never submitted a patch, nor looked into Xen source code.

https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches

> Should we create a single event for MOV-TO-REGx VMI events?
> It would avoid copy-pasting and duplicating code.

I don't understand this question.
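[Editorial note: a rough sketch of that suggestion. Everything named
here apart from write_debugreg() itself - hvm_write_debugreg(),
monitor_is_debugreg_monitored(), hvm_monitor_debugreg() - is
hypothetical and only illustrates where the existing logic and the new
event would meet; a real patch has to follow the existing
hvm_monitor_*()/monitor_traps() plumbing:]

```c
/* Hypothetical sketch - hvm_monitor_debugreg() and
 * monitor_is_debugreg_monitored() do not exist in Xen today; they mark
 * where the new VM_EVENT_REASON_WRITE_DEBUGREG event would be raised. */
void hvm_write_debugreg(struct vcpu *v, unsigned int reg, unsigned long val)
{
    unsigned long old = v->arch.debugreg[reg];   /* cached old value */

    if ( monitor_is_debugreg_monitored(v->domain, reg) )
        hvm_monitor_debugreg(reg, val, old);     /* fills in
                                                  * vm_event_write_debugreg */

    v->arch.debugreg[reg] = val;

    switch ( reg )             /* the existing write_debugreg() macro
                                * takes a literal register number */
    {
    case 0: write_debugreg(0, val); break;
    case 1: write_debugreg(1, val); break;
    case 2: write_debugreg(2, val); break;
    case 3: write_debugreg(3, val); break;
    }
}
```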
Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?
On 5/10/19 12:31 AM, Andrew Cooper wrote:
> What we'll have to do is end up in a position where we can have some
> real %dr settings given by the VMI agent, and some shadow %dr settings
> which the guest interacts with. Also I should warn you at this point
> that, because of how the registers work, it will not be possible to
> have guest-shadowed %dr functioning at the same time as VMI-provided
> %dr settings.
>
> I guess the main usecase here is simply hiding from the guest kernel
> that debugging activities are in use, and we are OK to break the real
> use of gdb/other inside the guest? Razvan/Tamas: As you're the
> maintainers, it is your call, ultimately.

What worries me here is that in that case it becomes easier for a rogue
application inside the guest to figure out that the guest's being
monitored, if I understand things correctly. Of course, a dom0
introspection agent may choose to simply not subscribe to DR events, and
thus not alter the current flow at all, which makes things better.

Thanks,
Razvan
Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?
On 5/10/19 2:03 AM, Tamas K Lengyel wrote:
>> Either way, everything comes down to what behaviour is wanted to
>> start with.
>
> As I said, I think adding that monitoring capability is fine as long
> as its limitation is clearly documented.

Right. My thoughts as well.

Thanks,
Razvan
Re: [Xen-devel] Altp2m use with PML can deadlock Xen
On 5/10/19 5:42 PM, Tamas K Lengyel wrote:
> On Thu, May 9, 2019 at 10:19 AM Andrew Cooper wrote:
>> On 09/05/2019 14:38, Tamas K Lengyel wrote:
>>> Hi all,
>>>
>>> I'm investigating an issue with altp2m that can easily be reproduced
>>> and leads to a hypervisor deadlock when PML is available in hardware.
>>> I haven't been able to trace down where the actual deadlock occurs.
>>>
>>> The problem seems to stem from
>>> hvm/vmx/vmcs.c:vmx_vcpu_flush_pml_buffer, which calls
>>> p2m_change_type_one on all gfns that were recorded in the PML buffer.
>>> The problem occurs when the PML-buffer-full vmexit happens while the
>>> active p2m is an altp2m. Switching p2m_change_type_one to work with
>>> the altp2m instead of the hostp2m however results in EPT
>>> misconfiguration crashes.
>>>
>>> Adding to the issue is that it seems to only occur when the altp2m
>>> has remapped GFNs. Since PML records entries based on GFN, this leads
>>> me to question whether it is safe at all to use PML when altp2m is
>>> used with GFN remapping. However, AFAICT the GFNs in the PML buffer
>>> are not the remapped GFNs, and my understanding is that it should be
>>> safe as long as the GFNs being tracked by PML are never the remapped
>>> GFNs.
>>>
>>> Booting Xen with ept=pml=0 resolves the issue.
>>>
>>> If anyone has any insight into what might be happening, please let
>>> me know.
>>
>> I could have sworn that George spotted a problem here and fixed it. I
>> shouldn't be surprised if we have more.
>>
>> The problem that PML introduced (and this is mostly my fault, as I
>> suggested the buggy solution) is that the vmexit handler from one vcpu
>> pauses others to drain the PML queue into the dirty bitmap. Overall I
>> wasn't happy with the design and I've got some ideas to improve it,
>> but within the scope of how altp2m was engineered, I proposed
>> domain_pause_except_self(). As it turns out, that is vulnerable to
>> deadlocks when you get two vcpus trying to pause each other and
>> waiting for each other to become de-scheduled.
>
> Makes sense.
>
>> I see this has been reused by the altp2m code, but it *should* be safe
>> from deadlocks now that it takes the hypercall_deadlock_mutex.
>
> Is that already in staging or your x86-next branch? I would like to
> verify whether the problem is still present with that change. I tested
> with the Xen 4.12 release and that definitely still deadlocks.

I don't know if Andrew is talking about this patch (probably not, but it
looks at least related):

http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=24d5282527f4647907b3572820b5335c15cd0356;hp=29d28b29190ba09d53ae7e475108def84e16e363

Since there's a "Release-acked" tag on it, I think it's in 4.12.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults
On 11/27/18 12:49 PM, Razvan Cojocaru wrote:
>> With a sufficiently complete insn emulator, single-stepping should not
>> be needed at all imo. Granted we're not quite there yet with the
>> emulator, but we've made quite a bit of progress. As before, if there
>> are particular instructions you know of that the emulator doesn't
>> handle yet, please keep pointing these out. Last I know were some AVX
>> move instructions, which have long been implemented.
>
> True, I haven't seen emulator issues in that respect with staging - the
> emulator appears lately to be sufficiently complete. Thank you very
> much for your help and support - we'll definitely point out unsupported
> instructions if we spot some again.

We've come across a new instruction that the emulator can't handle in
Xen 4.13-unstable today:

    vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20

Perhaps there are plans for this to go into the emulator as well?

Thanks,
Razvan
Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults
On 5/13/19 5:06 PM, Jan Beulich wrote:
>>> On 13.05.19 at 15:58, wrote:
>> On 11/27/18 12:49 PM, Razvan Cojocaru wrote:
>>>> With a sufficiently complete insn emulator, single-stepping should
>>>> not be needed at all imo. Granted we're not quite there yet with the
>>>> emulator, but we've made quite a bit of progress. As before, if
>>>> there are particular instructions you know of that the emulator
>>>> doesn't handle yet, please keep pointing these out. Last I know were
>>>> some AVX move instructions, which have long been implemented.
>>>
>>> True, I haven't seen emulator issues in that respect with staging -
>>> the emulator appears lately to be sufficiently complete. Thank you
>>> very much for your help and support - we'll definitely point out
>>> unsupported instructions if we spot some again.
>>
>> We've come across a new instruction that the emulator can't handle in
>> Xen 4.13-unstable today:
>>
>>     vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20
>>
>> Perhaps there are plans for this to go into the emulator as well?
>
> You're kidding? This is already in 4.12.0, and if it weren't I'm sure
> you're aware there are about 40 more AVX512 patches pending review.

Right, I did indeed forget about the pending review part; for some
reason I was sure they made it in. I've double-checked and we really are
using 4.13-unstable - but we've also made changes to the emulator,
working on the send-vm-events-from-the-emulator patch, so we'll revert
to a pristine staging and retry; there's a chance this happens because
of our changes. We'll find out what's going on exactly.

Thanks,
Razvan
Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs
On 5/13/19 7:18 PM, Mathieu Tarral wrote:
> On Friday, May 10, 2019 5:21 PM, Andrew Cooper wrote:
>
>> On 10/05/2019 16:17, Mathieu Tarral wrote:
>>> On Thursday, May 9, 2019 6:42 PM, Andrew Cooper wrote:
>>>> Therefore, the conclusion to draw is that it is a logical bug
>>>> somewhere.
>>>
>>> The bug is still here, so we can exclude a microcode issue.
>>
>> Good - that is one further angle excluded. Always make sure you are
>> running with up-to-date microcode, but it looks like we are back to
>> investigating a logical bug in libvmi or Xen.
>
> I played with tools/tests/xen-access this afternoon.
>
> The tool is working: I could intercept breakpoints, cpuid, write and
> exec mem accesses, etc.
>
> However, using altp2m-related intercepts leads to a guest crash
> sometimes:
>
> Windows 7 x64, 4 vCPUs
> - altp2m_write: crash
> - altp2m_exec: crash
> - altp2m_write_no_gpt: frozen
>
> Windows 7 x64, 1 vCPU
> - altp2m_write: crash
> - altp2m_exec: OK
> - altp2m_write_no_gpt: frozen
>
> "frozen" means that xen-access receives VMI events, but the guest is
> frozen until I decide to stop xen-access. I'm wondering what kind of
> exec events it received, because they are not the same, so it's not
> looping over the same RIP over and over. (?)

I think you're simply tripping some OS timer because you're slowing the
guest down in the crash case, and simply keeping the guest too busy
handling events in the "freeze" case.

Remember that there's quite a delay running each offending instruction:
one vm_event saying you've got a violation, a reply saying "put this
vCPU in single-step mode _and_ switch to the unrestricted EPT view",
another vm_event saying "instruction executed", followed by another
reply saying "switch back to the restricted EPT _and_ take the vCPU out
of single-step mode" (sketched below).

Restricting the whole of the guest's memory (and so doing this dance
for _every_ instruction causing a fault) is practically guaranteed to
upset the OS. A little EPT restricting goes a long way. Of course, if
this could be improved so that even stress-tests (which is basically
what xen-access is) leave the guest running smoothly, that'd be
fantastic.

Razvan
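[Editorial note: in xen-access terms, the two replies in that dance map
onto vm_event response flags roughly as follows - a sketch using the
public vm_event.h flag names; ring handling, field setup and error paths
are omitted, and `restricted_view` is a placeholder for the altp2m view
the monitor created:]

```c
/* Reply to the mem_access violation: switch the faulting vCPU to the
 * unrestricted view and turn on single-stepping. */
rsp.flags = VM_EVENT_FLAG_VCPU_PAUSED |
            VM_EVENT_FLAG_ALTERNATE_P2M |
            VM_EVENT_FLAG_TOGGLE_SINGLESTEP;
rsp.altp2m_idx = 0;                  /* unrestricted (host) view */

/* ... then, on the VM_EVENT_REASON_SINGLESTEP event that follows:
 * switch back to the restricted view and leave single-step mode. */
rsp.flags = VM_EVENT_FLAG_VCPU_PAUSED |
            VM_EVENT_FLAG_ALTERNATE_P2M |
            VM_EVENT_FLAG_TOGGLE_SINGLESTEP;
rsp.altp2m_idx = restricted_view;    /* the view with the restrictions */
```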
Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults
On 5/13/19 5:15 PM, Razvan Cojocaru wrote:
> On 5/13/19 5:06 PM, Jan Beulich wrote:
>>>> On 13.05.19 at 15:58, wrote:
>>> On 11/27/18 12:49 PM, Razvan Cojocaru wrote:
>>>>> With a sufficiently complete insn emulator, single-stepping should
>>>>> not be needed at all imo. Granted we're not quite there yet with
>>>>> the emulator, but we've made quite a bit of progress. As before, if
>>>>> there are particular instructions you know of that the emulator
>>>>> doesn't handle yet, please keep pointing these out. Last I know
>>>>> were some AVX move instructions, which have long been implemented.
>>>>
>>>> True, I haven't seen emulator issues in that respect with staging -
>>>> the emulator appears lately to be sufficiently complete. Thank you
>>>> very much for your help and support - we'll definitely point out
>>>> unsupported instructions if we spot some again.
>>>
>>> We've come across a new instruction that the emulator can't handle in
>>> Xen 4.13-unstable today:
>>>
>>>     vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20
>>>
>>> Perhaps there are plans for this to go into the emulator as well?
>>
>> You're kidding? This is already in 4.12.0, and if it weren't I'm sure
>> you're aware there are about 40 more AVX512 patches pending review.
>
> Right, I did indeed forget about the pending review part; for some
> reason I was sure they made it in. I've double-checked and we really
> are using 4.13-unstable - but we've also made changes to the emulator,
> working on the send-vm-events-from-the-emulator patch, so we'll revert
> to a pristine staging and retry; there's a chance this happens because
> of our changes. We'll find out what's going on exactly.

I promised I'd return with more details. After some debugging, it
certainly looks like the emulator returns UNIMPLEMENTED (5):

    Mem event emulation failed (5): d5v0 32bit @ 001b:6d96efff ->
    c5 f9 f5 05 c0 be ad 6d c5 e1 fe 1d a0 20 af 6d

Looking at the source code, the emulator does appear to support
vpmaddwd, however only for EVEX:

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/x86_emulate/x86_emulate.c;h=032995ea586aa7dd90a1953b6ded656436652049;hb=refs/heads/staging#l6696

whereas our fail case uses VEX. This may be in the works in the
aforementioned series, but is legitimately unsupported in 4.13 staging.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults
On 5/14/19 5:16 PM, Jan Beulich wrote:
>>> On 14.05.19 at 15:47, wrote:
>> Mem event emulation failed (5): d5v0 32bit @ 001b:6d96efff ->
>> c5 f9 f5 05 c0 be ad 6d c5 e1 fe 1d a0 20 af 6d
>>
>> Looking at the source code, the emulator does appear to support
>> vpmaddwd, however only for EVEX:
>>
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/x86_emulate/x86_emulate.c;h=032995ea586aa7dd90a1953b6ded656436652049;hb=refs/heads/staging#l6696
>>
>> whereas our fail case uses VEX. This may be in the works in the
>> aforementioned series, but is legitimately unsupported in 4.13
>> staging.
>
> Hmm, interesting. The origin of the encoding is at MMX times, which
> means it's more than just VPMADDWD that's missing, and it's been an
> omission back in the MMX/SSE2 series then. That's a genuine oversight,
> and in the light of this I'd like to apologize for my unfriendly
> initial reaction. I'll see about getting this fixed. (It would have
> helped if you had shared the encoding right away, since the mnemonic
> and operands are now often insufficient.)

No problem at all. Indeed, sharing the encoding would have cleared
things up faster.

Thanks,
Razvan
[Xen-devel] [PATCH] x86/altp2m: move altp2m_get_effective_entry() under CONFIG_HVM
All its callers live inside #ifdef CONFIG_HVM sections.

Signed-off-by: Razvan Cojocaru
---
 xen/arch/x86/mm/p2m.c     | 72 +++
 xen/include/asm-x86/p2m.h |  2 ++
 2 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index cc6661e..57c5eed 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -478,42 +478,6 @@ void p2m_unlock_and_tlb_flush(struct p2m_domain *p2m)
     mm_write_unlock(&p2m->lock);
 }
 
-int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
-                               p2m_type_t *t, p2m_access_t *a,
-                               bool prepopulate)
-{
-    *mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL);
-
-    /* Check host p2m if no valid entry in alternate */
-    if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) )
-    {
-        struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain);
-        unsigned int page_order;
-        int rc;
-
-        *mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a,
-                                     P2M_ALLOC | P2M_UNSHARE, &page_order, 0);
-
-        rc = -ESRCH;
-        if ( !mfn_valid(*mfn) || *t != p2m_ram_rw )
-            return rc;
-
-        /* If this is a superpage, copy that first */
-        if ( prepopulate && page_order != PAGE_ORDER_4K )
-        {
-            unsigned long mask = ~((1UL << page_order) - 1);
-            gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask);
-            mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask);
-
-            rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, *t, *a, 1);
-            if ( rc )
-                return rc;
-        }
-    }
-
-    return 0;
-}
-
 mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
                             p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
                             unsigned int *page_order, bool_t locked)
@@ -2378,6 +2342,42 @@ int unmap_mmio_regions(struct domain *d,
 
 #ifdef CONFIG_HVM
 
+int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
+                               p2m_type_t *t, p2m_access_t *a,
+                               bool prepopulate)
+{
+    *mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL);
+
+    /* Check host p2m if no valid entry in alternate */
+    if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) )
+    {
+        struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain);
+        unsigned int page_order;
+        int rc;
+
+        *mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a,
+                                     P2M_ALLOC | P2M_UNSHARE, &page_order, 0);
+
+        rc = -ESRCH;
+        if ( !mfn_valid(*mfn) || *t != p2m_ram_rw )
+            return rc;
+
+        /* If this is a superpage, copy that first */
+        if ( prepopulate && page_order != PAGE_ORDER_4K )
+        {
+            unsigned long mask = ~((1UL << page_order) - 1);
+            gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask);
+            mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask);
+
+            rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, *t, *a, 1);
+            if ( rc )
+                return rc;
+        }
+    }
+
+    return 0;
+}
+
 void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
 {
     if ( altp2m_active(v->domain) )
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 2d0bda1..7e71bf0 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -514,6 +514,7 @@ static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn)
     return mfn_x(mfn);
 }
 
+#ifdef CONFIG_HVM
 #define AP2MGET_prepopulate true
 #define AP2MGET_query false
 
@@ -525,6 +526,7 @@ static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn)
 int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
                                p2m_type_t *t, p2m_access_t *a,
                                bool prepopulate);
+#endif
 
 /* Deadlock-avoidance scheme when calling get_gfn on different gfn's */
 struct two_gfns {
-- 
2.7.4
Re: [Xen-devel] [PATCH v6] x86/altp2m: Aggregate get entry and populate into common funcs
On 5/14/19 6:42 PM, Jan Beulich wrote:
> On 23.04.19 at 16:30, wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -478,6 +478,43 @@ void p2m_unlock_and_tlb_flush(struct p2m_domain *p2m)
>>      mm_write_unlock(&p2m->lock);
>>  }
>>
>> +int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
>> +                               p2m_type_t *t, p2m_access_t *a,
>> +                               bool prepopulate)
>> +{
>> +    *mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL);
>> +
>> +    /* Check host p2m if no valid entry in alternate */
>> +    if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) )
>> +    {
>> +        struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain);
>> +        unsigned int page_order;
>> +        int rc;
>> +
>> +        *mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a,
>> +                                     P2M_ALLOC | P2M_UNSHARE, &page_order, 0);
>> +
>> +        rc = -ESRCH;
>> +        if ( !mfn_valid(*mfn) || *t != p2m_ram_rw )
>> +            return rc;
>> +
>> +        /* If this is a superpage, copy that first */
>> +        if ( prepopulate && page_order != PAGE_ORDER_4K )
>> +        {
>> +            unsigned long mask = ~((1UL << page_order) - 1);
>> +            gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask);
>> +            mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask);
>> +
>> +            rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, *t, *a, 1);
>> +            if ( rc )
>> +                return rc;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +
>>  mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
>>                              p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
>>                              unsigned int *page_order, bool_t locked)
>
> May I ask how the placement of this function was chosen? It looks as
> if all its callers live inside #ifdef CONFIG_HVM sections, just this
> function doesn't (reportedly causing build issues together with later
> changes).

I believe it's just an oversight. I've sent out a patch that hopefully
fixes this in a satisfactory manner.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen/vm_event: fix rc check for uninitialized ring
On 5/24/19 4:15 PM, Juergen Gross wrote:
> vm_event_claim_slot() returns -EOPNOTSUPP for an uninitialized ring
> since commit 15e4dd5e866b43bbc ("common/vm_event: Initialize vm_event
> lists on domain creation"), but the callers test for -ENOSYS.
>
> Correct the callers.
>
> Signed-off-by: Juergen Gross
> ---
>  xen/arch/x86/mm/p2m.c | 2 +-
>  xen/common/monitor.c  | 2 +-
>  xen/common/vm_event.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 57c5eeda91..8a9a1153a8 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1705,7 +1705,7 @@ void p2m_mem_paging_populate(struct domain *d, unsigned long gfn_l)
>      /* We're paging. There should be a ring */
>      int rc = vm_event_claim_slot(d, d->vm_event_paging);
>
> -    if ( rc == -ENOSYS )
> +    if ( rc == -EOPNOTSUPP )
>      {
>          gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring "
>                   "in place\n", d->domain_id, gfn_l);
> diff --git a/xen/common/monitor.c b/xen/common/monitor.c
> index cb5f37fdb2..d5c9ff1cbf 100644
> --- a/xen/common/monitor.c
> +++ b/xen/common/monitor.c
> @@ -98,7 +98,7 @@ int monitor_traps(struct vcpu *v, bool sync, vm_event_request_t *req)
>      {
>      case 0:
>          break;
> -    case -ENOSYS:
> +    case -EOPNOTSUPP:
>          /*
>           * If there was no ring to handle the event, then
>           * simply continue executing normally.
> diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
> index 6e68be47bc..7b4ebced16 100644
> --- a/xen/common/vm_event.c
> +++ b/xen/common/vm_event.c
> @@ -513,7 +513,7 @@ bool vm_event_check_ring(struct vm_event_domain *ved)
>   * this function will always return 0 for a guest. For a non-guest, we check
>   * for space and return -EBUSY if the ring is not available.
>   *
> - * Return codes: -ENOSYS: the ring is not yet configured
> + * Return codes: -EOPNOTSUPP: the ring is not yet configured
>   *               -EBUSY: the ring is busy
>   *               0: a spot has been reserved
>   *

But vm_event_grab_slot() (which ends up being what vm_event_wait_slot()
returns) still returns -ENOSYS:

static int vm_event_grab_slot(struct vm_event_domain *ved, int foreign)
{
    unsigned int avail_req;

    if ( !ved->ring_page )
        return -ENOSYS;

Although we can't get to that part if vm_event_check_ring() returns
false, we should probably return -EOPNOTSUPP from there as well. In
fact, I wonder whether the remaining -ENOSYS returns in vm_event.c
shouldn't also be replaced with -EOPNOTSUPP.

Anyway, the change clearly improves the code even without settling the
above questions, so:

Acked-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
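[The follow-up Razvan suggests - and which the V2 changelog below confirms was applied - would look roughly like this; the context lines are approximate, not the literal committed hunk:]

    --- a/xen/common/vm_event.c
    +++ b/xen/common/vm_event.c
    @@ static int vm_event_grab_slot(struct vm_event_domain *ved, int foreign)
         unsigned int avail_req;
     
         if ( !ved->ring_page )
    -        return -ENOSYS;
    +        return -EOPNOTSUPP;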
Re: [Xen-devel] [PATCH v2] xen/vm_event: fix rc check for uninitialized ring
On 5/24/19 6:23 PM, Juergen Gross wrote:
> vm_event_claim_slot() returns -EOPNOTSUPP for an uninitialized ring
> since commit 15e4dd5e866b43bbc ("common/vm_event: Initialize vm_event
> lists on domain creation"), but the callers test for -ENOSYS.
>
> Correct the callers.
>
> Signed-off-by: Juergen Gross
> ---
> V2: add blank line (Jan Beulich)
>     replace two further ENOSYS returns with EOPNOTSUPP (Razvan Cojocaru)

Acked-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 1/5] xen/vm-event: Drop unused u_domctl parameter from vm_event_domctl()
On 6/3/19 3:25 PM, Andrew Cooper wrote:
> This parameter isn't used at all. Furthermore, elide the copyback in
> failing cases, as it is only successful paths which generate data
> which needs sending back to the caller.
>
> Finally, drop a redundant d == NULL check, as that logic is all common
> at the beginning of do_domctl().
>
> No functional change.
>
> Signed-off-by: Andrew Cooper

Acked-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/5] xen/vm-event: Expand vm_event_* spinlock macros and rename the lock
On 6/3/19 3:25 PM, Andrew Cooper wrote:
> These serve no purpose, but to add to the cognitive load of following
> the code. Remove the level of indirection.
>
> Furthermore, the lock protects all data in vm_event_domain, making
> ring_lock a poor choice of name.
>
> For vm_event_get_response() and vm_event_grab_slot(), fold the exit
> paths to have a single unlock, as the compiler can't make this
> optimisation itself.
>
> No functional change.
>
> Signed-off-by: Andrew Cooper

Acked-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 5/5] xen/vm-event: Misc fixups
On 6/3/19 3:25 PM, Andrew Cooper wrote:
>  * Drop redundant brackets, and inline qualifiers.
>  * Insert newlines and spaces where appropriate.
>  * Drop redundant NDEBUG - gdprintk() is already conditional. Fix the
>    logging level, as gdprintk() already prefixes the guest marker.
>
> No functional change.
>
> Signed-off-by: Andrew Cooper
> ---
> CC: Razvan Cojocaru
> CC: Tamas K Lengyel
> CC: Petre Pircalabu
> ---
>  xen/common/vm_event.c | 21 -
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
> index 72f42b4..e872680 100644
> --- a/xen/common/vm_event.c
> +++ b/xen/common/vm_event.c
> @@ -102,6 +102,7 @@ static int vm_event_enable(
>  static unsigned int vm_event_ring_available(struct vm_event_domain *ved)
>  {
>      int avail_req = RING_FREE_REQUESTS(&ved->front_ring);
> +
>      avail_req -= ved->target_producers;
>      avail_req -= ved->foreign_producers;
>
> @@ -168,7 +169,7 @@ static void vm_event_wake_queued(struct domain *d, struct vm_event_domain *ved)
>   */
>  void vm_event_wake(struct domain *d, struct vm_event_domain *ved)
>  {
> -    if (!list_empty(&ved->wq.list))
> +    if ( !list_empty(&ved->wq.list) )
>          vm_event_wake_queued(d, ved);
>      else
>          vm_event_wake_blocked(d, ved);
>
> @@ -216,8 +217,8 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved)
>      return 0;
>  }
>
> -static inline void vm_event_release_slot(struct domain *d,
> -                                         struct vm_event_domain *ved)
> +static void vm_event_release_slot(struct domain *d,
> +                                  struct vm_event_domain *ved)

But inline is still asking the compiler to try to generate code that
doesn't end up CALLing an actual function, so is it really redundant
here? I do realize that for most cases the compiler will have its way
with this code anyway - especially since the function is static - but
"static" is not guaranteed to also mean "inline", is it?

In any case,

Acked-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
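[For context on the inline question: in C, "inline" on a static function is only a hint, and a plain static function may be inlined regardless. A toy illustration - none of this is from the patch, and the always_inline attribute is GCC/Clang-specific:]

    /* Illustrative only. All three are static; whether a call is
     * actually inlined is ultimately up to the optimizer. */
    static void plain(void)
    {
        /* No hint; the optimizer may still inline calls to this. */
    }

    static inline void hinted(void)
    {
        /* 'inline' is merely a hint; the optimizer may ignore it. */
    }

    static inline __attribute__((always_inline)) void forced(void)
    {
        /* GCC/Clang: inlining is enforced (or a diagnostic is issued). */
    }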
Re: [Xen-devel] [PATCH 3/5] xen/vm-event: Remove unnecessary vm_event_domain indirection
On 6/3/19 3:25 PM, Andrew Cooper wrote:
> The use of (*ved)-> leads to poor code generation, as the compiler
> can't assume the pointer hasn't changed, and results in hard-to-follow
> code.
>
> For both vm_event_{en,dis}able(), rename the ved parameter to p_ved,
> and work primarily with a local ved pointer. This has a key advantage
> in vm_event_enable(), in that the partially constructed
> vm_event_domain only becomes globally visible once it is fully
> constructed. As a consequence, the spinlock doesn't need holding.
>
> Furthermore, rearrange the order of operations to be more sensible.
> Check for repeated enables and a bad HVM_PARAM before allocating
> memory, and gather the trivial setup into one place, dropping the
> redundant zeroing.
>
> No practical change that callers will notice.
>
> Signed-off-by: Andrew Cooper

Acked-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 4/5] xen/vm-event: Fix interactions with the vcpu list
On 6/3/19 3:25 PM, Andrew Cooper wrote:
> vm_event_resume() should use domain_vcpu(), rather than opencoding it
> without its Spectre v1 safety.
>
> vm_event_wake_blocked() can't ever be invoked in a case where d->vcpu
> is NULL, so drop the outer if() and reindent, fixing up style issues.
>
> The comment, which is left alone, is false. This algorithm still has
> starvation issues when there is an asymmetric rate of generated
> events. However, the existing logic is sufficiently complicated and
> fragile that I don't think I've followed it fully, and because we're
> trying to obsolete this interface, the safest course of action is to
> leave it alone, rather than to end up making things subtly different.
>
> Therefore, no practical change that callers would notice.
>
> Signed-off-by: Andrew Cooper

Acked-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 4/4] x86/mem_sharing: compile mem_sharing subsystem only when kconfig is enabled
On 6/3/19 7:38 PM, Tamas K Lengyel wrote:
> On Mon, Jun 3, 2019 at 2:26 AM Jan Beulich wrote:
>> On 16.05.19 at 23:37, wrote:
>>> Disable it by default as it is only an experimental subsystem.
>>>
>>> Signed-off-by: Tamas K Lengyel
>>> Cc: Jan Beulich
>>> Cc: Andrew Cooper
>>> Cc: Wei Liu
>>> Cc: Roger Pau Monne
>>> Cc: George Dunlap
>>> Cc: Ian Jackson
>>> Cc: Julien Grall
>>> Cc: Konrad Rzeszutek Wilk
>>> Cc: Stefano Stabellini
>>> Cc: Tim Deegan
>>> Cc: George Dunlap
>>>
>>> v4: add ASSERT_UNREACHABLE to inlined functions where applicable &
>>>     other fixups
>>> ---
>>>  xen/arch/x86/Kconfig              |  6 +-
>>>  xen/arch/x86/domain.c             |  2 ++
>>>  xen/arch/x86/domctl.c             |  2 ++
>>>  xen/arch/x86/mm/Makefile          |  2 +-
>>>  xen/arch/x86/x86_64/compat/mm.c   |  2 ++
>>>  xen/arch/x86/x86_64/mm.c          |  2 ++
>>>  xen/common/Kconfig                |  3 ---
>>>  xen/common/domain.c               |  2 +-
>>>  xen/common/grant_table.c          |  2 +-
>>>  xen/common/memory.c               |  2 +-
>>>  xen/common/vm_event.c             |  6 +++---
>>>  xen/include/asm-x86/mem_sharing.h | 28 
>>>  xen/include/asm-x86/mm.h          |  3 +++
>>>  xen/include/xen/sched.h           |  2 +-
>>>  xen/include/xsm/dummy.h           |  2 +-
>>>  xen/include/xsm/xsm.h             |  4 ++--
>>>  xen/xsm/dummy.c                   |  2 +-
>>>  xen/xsm/flask/hooks.c             |  4 ++--
>>>  18 files changed, 58 insertions(+), 18 deletions(-)
>>
>> Daniel, it looks like you weren't Cc-ed here, but your ack is needed.
>
> Indeed, I also seem to have missed CC-ing Razvan (fixed now).

Acked-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] MAINTAINERS: hand over vm_event maintainership
Remove myself as vm_event maintainer, add Alexandru and Petre as
Bitdefender vm_event maintainers.

Signed-off-by: Razvan Cojocaru
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6fbdc2b..d60e3a5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -405,7 +405,8 @@ L: xen-devel@lists.xenproject.org
 F: unmodified_drivers/linux-2.6/
 
 VM EVENT, MEM ACCESS and MONITOR
-M: Razvan Cojocaru
+M: Alexandru Isaila
+M: Petre Pircalabu
 M: Tamas K Lengyel
 S: Supported
 F: tools/tests/xen-access
-- 
2.7.4
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] MAINTAINERS: hand over vm_event maintainership
On 6/12/19 7:02 PM, Razvan Cojocaru wrote:
> Remove myself as vm_event maintainer, add Alexandru and Petre as
> Bitdefender vm_event maintainers.
>
> Signed-off-by: Razvan Cojocaru
> ---
>  MAINTAINERS | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6fbdc2b..d60e3a5 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -405,7 +405,8 @@ L: xen-devel@lists.xenproject.org
>  F: unmodified_drivers/linux-2.6/
>
>  VM EVENT, MEM ACCESS and MONITOR
> -M: Razvan Cojocaru
> +M: Alexandru Isaila
> +M: Petre Pircalabu
>  M: Tamas K Lengyel
>  S: Supported
>  F: tools/tests/xen-access

I'm not sure why get-maintainers.pl did not add Tamas' email address
(added now). I'll still be in Bitdefender, subscribed to xen-devel and
following the project, but I'll be quite a bit more involved in other
projects, and that makes being a maintainer difficult.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V3] x86/altp2m: add altp2m_vcpu_disable_notify
On 1/2/19 1:50 PM, Wei Liu wrote:
> On Tue, Dec 18, 2018 at 05:11:44PM +0200, Razvan Cojocaru wrote:
>> Allow altp2m users to disable #VE/VMFUNC alone. Currently it is
>> only possible to disable this functionality when we disable altp2m
>> completely; #VE/VMFUNC can only be enabled once per altp2m session.
>>
>> In addition to making things complete, disabling #VE is also a
>> workaround for CFW116 ("When Virtualization Exceptions are Enabled,
>> EPT Violations May Generate Erroneous Virtualization Exceptions")
>> on Xeon E-2100 CPUs.
>>
>> Signed-off-by: Razvan Cojocaru
>>
>> ---
>> Changes since V2:
>>  - Fixed compilation by completing the removal of all references
>>    to "pad".
>>
>> Changes since V1:
>>  - Updated the patch description to specify E-2100.
>>  - Made trying to disable #VE when it's already disabled a no-op.
>>  - Removed leftover uint32_t pad; from struct
>>    xen_hvm_altp2m_vcpu_disable_notify.
>> ---
>>  tools/libxc/include/xenctrl.h   |  2 ++
>>  tools/libxc/xc_altp2m.c         | 22 ++
>>  xen/arch/x86/hvm/hvm.c          | 28 
>>  xen/include/public/hvm/hvm_op.h | 11 ++-
>>  4 files changed, 62 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>> index 97ae965..31cdda7 100644
>> --- a/tools/libxc/include/xenctrl.h
>> +++ b/tools/libxc/include/xenctrl.h
>> @@ -1932,6 +1932,8 @@ int xc_altp2m_get_domain_state(xc_interface *handle, uint32_t dom, bool *state);
>>  int xc_altp2m_set_domain_state(xc_interface *handle, uint32_t dom, bool state);
>>  int xc_altp2m_set_vcpu_enable_notify(xc_interface *handle, uint32_t domid,
>>                                       uint32_t vcpuid, xen_pfn_t gfn);
>> +int xc_altp2m_set_vcpu_disable_notify(xc_interface *handle, uint32_t domid,
>> +                                      uint32_t vcpuid);
>>  int xc_altp2m_create_view(xc_interface *handle, uint32_t domid,
>>                            xenmem_access_t default_access, uint16_t *view_id);
>>  int xc_altp2m_destroy_view(xc_interface *handle, uint32_t domid,
>> diff --git a/tools/libxc/xc_altp2m.c b/tools/libxc/xc_altp2m.c
>> index 844b9f1..f8cd603 100644
>> --- a/tools/libxc/xc_altp2m.c
>> +++ b/tools/libxc/xc_altp2m.c
>> @@ -91,6 +91,28 @@ int xc_altp2m_set_vcpu_enable_notify(xc_interface *handle, uint32_t domid,
>>      return rc;
>>  }
>>
>> +int xc_altp2m_set_vcpu_disable_notify(xc_interface *handle, uint32_t domid,
>> +                                      uint32_t vcpuid)
>> +{
>> +    int rc;
>> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_altp2m_op_t, arg);
>> +
>> +    arg = xc_hypercall_buffer_alloc(handle, arg, sizeof(*arg));
>> +    if ( arg == NULL )
>> +        return -1;
>> +
>> +    arg->version = HVMOP_ALTP2M_INTERFACE_VERSION;
>> +    arg->cmd = HVMOP_altp2m_vcpu_disable_notify;
>> +    arg->domain = domid;
>> +    arg->u.disable_notify.vcpu_id = vcpuid;
>> +
>> +    rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
>> +		  HYPERCALL_BUFFER_AS_ARG(arg));
>
> Tabs here.

Right, that was copy/pasted from xc_altp2m_set_vcpu_enable_notify() -
it turns out that most functions in that source file have the tab
problem. I'll fix them all while at it.

> With this fixed:
>
> Acked-by: Wei Liu

Thanks! Happy new year,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V3] x86/altp2m: add altp2m_vcpu_disable_notify
On 1/2/19 2:29 PM, Andrew Cooper wrote:
> On 02/01/2019 11:50, Wei Liu wrote:
>> On Tue, Dec 18, 2018 at 05:11:44PM +0200, Razvan Cojocaru wrote:
>>> Allow altp2m users to disable #VE/VMFUNC alone. Currently it is
>>> only possible to disable this functionality when we disable altp2m
>>> completely; #VE/VMFUNC can only be enabled once per altp2m session.
>>>
>>> In addition to making things complete, disabling #VE is also a
>>> workaround for CFW116 ("When Virtualization Exceptions are Enabled,
>>> EPT Violations May Generate Erroneous Virtualization Exceptions")
>>> on Xeon E-2100 CPUs.
>>>
>>> Signed-off-by: Razvan Cojocaru
>>>
>>> ---
>>> Changes since V2:
>>>  - Fixed compilation by completing the removal of all references
>>>    to "pad".
>>>
>>> Changes since V1:
>>>  - Updated the patch description to specify E-2100.
>>>  - Made trying to disable #VE when it's already disabled a no-op.
>>>  - Removed leftover uint32_t pad; from struct
>>>    xen_hvm_altp2m_vcpu_disable_notify.
>>> ---
>>>  tools/libxc/include/xenctrl.h   |  2 ++
>>>  tools/libxc/xc_altp2m.c         | 22 ++
>>>  xen/arch/x86/hvm/hvm.c          | 28 
>>>  xen/include/public/hvm/hvm_op.h | 11 ++-
>>>  4 files changed, 62 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>>> index 97ae965..31cdda7 100644
>>> --- a/tools/libxc/include/xenctrl.h
>>> +++ b/tools/libxc/include/xenctrl.h
>>> @@ -1932,6 +1932,8 @@ int xc_altp2m_get_domain_state(xc_interface *handle, uint32_t dom, bool *state);
>>>  int xc_altp2m_set_domain_state(xc_interface *handle, uint32_t dom, bool state);
>>>  int xc_altp2m_set_vcpu_enable_notify(xc_interface *handle, uint32_t domid,
>>>                                       uint32_t vcpuid, xen_pfn_t gfn);
>>> +int xc_altp2m_set_vcpu_disable_notify(xc_interface *handle, uint32_t domid,
>>> +                                      uint32_t vcpuid);
>>>  int xc_altp2m_create_view(xc_interface *handle, uint32_t domid,
>>>                            xenmem_access_t default_access, uint16_t *view_id);
>>>  int xc_altp2m_destroy_view(xc_interface *handle, uint32_t domid,
>>> diff --git a/tools/libxc/xc_altp2m.c b/tools/libxc/xc_altp2m.c
>>> index 844b9f1..f8cd603 100644
>>> --- a/tools/libxc/xc_altp2m.c
>>> +++ b/tools/libxc/xc_altp2m.c
>>> @@ -91,6 +91,28 @@ int xc_altp2m_set_vcpu_enable_notify(xc_interface *handle, uint32_t domid,
>>>      return rc;
>>>  }
>>>
>>> +int xc_altp2m_set_vcpu_disable_notify(xc_interface *handle, uint32_t domid,
>>> +                                      uint32_t vcpuid)
>>> +{
>>> +    int rc;
>>> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_altp2m_op_t, arg);
>>> +
>>> +    arg = xc_hypercall_buffer_alloc(handle, arg, sizeof(*arg));
>>> +    if ( arg == NULL )
>>> +        return -1;
>>> +
>>> +    arg->version = HVMOP_ALTP2M_INTERFACE_VERSION;
>>> +    arg->cmd = HVMOP_altp2m_vcpu_disable_notify;
>>> +    arg->domain = domid;
>>> +    arg->u.disable_notify.vcpu_id = vcpuid;
>>> +
>>> +    rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
>>> +		  HYPERCALL_BUFFER_AS_ARG(arg));
>> Tabs here.
>>
>> With this fixed:
>>
>> Acked-by: Wei Liu
>
> Fixed and committed.

Oh, thanks! Nevermind my previous reply then. :)

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] libxc/altp2m: clean up TABs
Replace all the TABs with spaces.

Signed-off-by: Razvan Cojocaru
---
 tools/libxc/xc_altp2m.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/xc_altp2m.c b/tools/libxc/xc_altp2m.c
index e61bacf..a86520c 100644
--- a/tools/libxc/xc_altp2m.c
+++ b/tools/libxc/xc_altp2m.c
@@ -38,7 +38,7 @@ int xc_altp2m_get_domain_state(xc_interface *handle, uint32_t dom, bool *state)
     arg->domain = dom;
 
     rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
-		  HYPERCALL_BUFFER_AS_ARG(arg));
+                  HYPERCALL_BUFFER_AS_ARG(arg));
 
     if ( !rc )
         *state = arg->u.domain_state.state;
@@ -62,7 +62,7 @@ int xc_altp2m_set_domain_state(xc_interface *handle, uint32_t dom, bool state)
     arg->u.domain_state.state = state;
 
     rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
-		  HYPERCALL_BUFFER_AS_ARG(arg));
+                  HYPERCALL_BUFFER_AS_ARG(arg));
 
     xc_hypercall_buffer_free(handle, arg);
     return rc;
@@ -85,7 +85,7 @@ int xc_altp2m_set_vcpu_enable_notify(xc_interface *handle, uint32_t domid,
     arg->u.enable_notify.gfn = gfn;
 
     rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
-		  HYPERCALL_BUFFER_AS_ARG(arg));
+                  HYPERCALL_BUFFER_AS_ARG(arg));
 
     xc_hypercall_buffer_free(handle, arg);
     return rc;
@@ -130,7 +130,7 @@ int xc_altp2m_create_view(xc_interface *handle, uint32_t domid,
     arg->u.view.hvmmem_default_access = default_access;
 
     rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
-		  HYPERCALL_BUFFER_AS_ARG(arg));
+                  HYPERCALL_BUFFER_AS_ARG(arg));
 
     if ( !rc )
         *view_id = arg->u.view.view;
@@ -155,7 +155,7 @@ int xc_altp2m_destroy_view(xc_interface *handle, uint32_t domid,
     arg->u.view.view = view_id;
 
     rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
-		  HYPERCALL_BUFFER_AS_ARG(arg));
+                  HYPERCALL_BUFFER_AS_ARG(arg));
 
     xc_hypercall_buffer_free(handle, arg);
     return rc;
@@ -178,7 +178,7 @@ int xc_altp2m_switch_to_view(xc_interface *handle, uint32_t domid,
     arg->u.view.view = view_id;
 
     rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
-		  HYPERCALL_BUFFER_AS_ARG(arg));
+                  HYPERCALL_BUFFER_AS_ARG(arg));
 
     xc_hypercall_buffer_free(handle, arg);
     return rc;
@@ -253,7 +253,7 @@ int xc_altp2m_set_mem_access(xc_interface *handle, uint32_t domid,
     arg->u.mem_access.gfn = gfn;
 
     rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
-		  HYPERCALL_BUFFER_AS_ARG(arg));
+                  HYPERCALL_BUFFER_AS_ARG(arg));
 
     xc_hypercall_buffer_free(handle, arg);
     return rc;
@@ -278,7 +278,7 @@ int xc_altp2m_change_gfn(xc_interface *handle, uint32_t domid,
     arg->u.change_gfn.new_gfn = new_gfn;
 
     rc = xencall2(handle->xcall, __HYPERVISOR_hvm_op, HVMOP_altp2m,
-		  HYPERCALL_BUFFER_AS_ARG(arg));
+                  HYPERCALL_BUFFER_AS_ARG(arg));
 
     xc_hypercall_buffer_free(handle, arg);
     return rc;
-- 
2.7.4
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [xen-unstable test] 131688: regressions - FAIL
On 1/3/19 8:50 PM, osstest service owner wrote:
> flight 131688 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/131688/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 131670
>
> Tests which did not succeed, but are not blocking:
>  test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 131670
>  test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 131670
>  test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 131670
>  test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 131670
>  test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail like 131670
>  test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 131670
>  test-armhf-armhf-libvirt 14 saverestore-support-check fail like 131670
>  test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail like 131670
>  test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 131670
>  test-amd64-i386-xl-pvshim 12 guest-start fail never pass
>  test-amd64-i386-libvirt 13 migrate-support-check fail never pass
>  test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
>  test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
>  test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
>  test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
>  test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
>  test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
>  test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
>  test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
>  test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
>  test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass
>  test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
>  test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass
>  test-armhf-armhf-xl 13 migrate-support-check fail never pass
>  test-armhf-armhf-xl 14 saverestore-support-check fail never pass
>  test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
>  test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
>  test-armhf-armhf-libvirt-raw 12 migrate-support-check fail never pass
>  test-armhf-armhf-xl-credit1 13 migrate-support-check fail never pass
>  test-armhf-armhf-xl-credit1 14 saverestore-support-check fail never pass
>  test-armhf-armhf-libvirt 13 migrate-support-check fail never pass
>  test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
>  test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
>  test-armhf-armhf-xl-vhd 12 migrate-support-check fail never pass
>  test-armhf-armhf-xl-vhd 13 saverestore-support-check fail never pass
>  test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass
>  test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
>  test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass
>  test-amd64-amd64-xl-qemut-win10-i386 10 windows-install fail never pass
>  test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass
>
> version targeted for testing:
>  xen 9b97818c3d5854caa95d8af20fe7ef6112ff1b06
> baseline version:
>  xen 7b6e05c50fc39466fcc685fb6d4216f99af58743
>
> Last test of basis 131670 2019-01-01 08:51:55 Z 2 days
> Testing same since 131688 2019-01-02 14:52:32 Z 1 days 1 attempts
>
> People who touched revisions under test:
>  Razvan Cojocaru
>  Wei Liu
>
> jobs:
>  build-amd64-xsm pass
>  build-i386-xsm pass
>  build-amd64-xtf pass
>  build-amd64
Re: [Xen-devel] Xen (both x86 and Arm) Community Call: Jan 9 - 16:00 - 17:00 UTC - Call for Agenda Items
On 1/7/19 4:26 PM, Tamas K Lengyel wrote:
>> Fix VGA logdirty with altp2m (v11) Razvan Cojocaru?
>
> This has been merged already.

Indeed, sorry for only seeing this now - I've had a small technical
SNAFU I had to take care of today and missed some emails.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC PATCH 6/6] xc_version: add vm_event interface version
On 1/8/19 6:27 PM, Jan Beulich wrote:
> On 19.12.18 at 19:52, wrote:
>> Signed-off-by: Petre Pircalabu
>
> An empty description is not helpful. The immediate question is: why?
> We don't do this for other interface versions. I'm unconvinced a
> special-purpose piece of information like this one belongs in the
> rather generic version hypercall.

For an introspection application meant to be deployed on several Xen
versions without recompiling, it is important to be able to decide at
runtime what size and layout the vm_event struct has. Currently this
can somewhat be done by associating the current Xen version with the
vm_event version, but that is not ideal for obvious reasons.

Reading the vm_event version from an actual vm_event is also out of
the question, because in order to be able to receive the first
vm_event we have to set the ring buffer up, and that requires
knowledge of the size of the vm_event. So a run-time mechanism for
querying the vm_event version is needed. We just thought that this was
the most flexible place to add it.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC PATCH 6/6] xc_version: add vm_event interface version
On 1/8/19 6:47 PM, Jan Beulich wrote:
> On 08.01.19 at 17:37, wrote:
>> On 1/8/19 6:27 PM, Jan Beulich wrote:
>>> On 19.12.18 at 19:52, wrote:
>>>> Signed-off-by: Petre Pircalabu
>>>
>>> An empty description is not helpful. The immediate question is: why?
>>> We don't do this for other interface versions. I'm unconvinced a
>>> special-purpose piece of information like this one belongs in the
>>> rather generic version hypercall.
>>
>> For an introspection application meant to be deployed on several Xen
>> versions without recompiling, it is important to be able to decide at
>> runtime what size and layout the vm_event struct has. Currently this
>> can somewhat be done by associating the current Xen version with the
>> vm_event version, but that is not ideal for obvious reasons.
>>
>> Reading the vm_event version from an actual vm_event is also out of
>> the question, because in order to be able to receive the first
>> vm_event we have to set the ring buffer up, and that requires
>> knowledge of the size of the vm_event. So a run-time mechanism for
>> querying the vm_event version is needed. We just thought that this
>> was the most flexible place to add it.
>
> How about a new XEN_DOMCTL_VM_EVENT_GET_VERSION?

That would work as well; we just thought this was the least intrusive
and most extensible way to do it (other queries could be added
similarly in the future, without needing a new DOMCTL / libxc
toolstack modifications).

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
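[To make the runtime-negotiation argument concrete, here is a rough user-space sketch. XENVER_vm_event_version is a purely hypothetical subop name and the struct sizes are made-up numbers - neither is taken from the RFC itself:]

    #include <stddef.h>

    /* Hypothetical per-version request sizes, for illustration of why
     * the size must be known before the ring can be set up at all. */
    static size_t vm_event_request_size(unsigned long version)
    {
        switch ( version )
        {
        case 3:  return 112;
        case 4:  return 128;
        default: return 0;   /* unknown version: refuse to set up the ring */
        }
    }

    /*
     * Hypothetical usage (assumed names, not a real libxc interface):
     *
     *     unsigned long ver = xc_version(xch, XENVER_vm_event_version, NULL);
     *     size_t slot = vm_event_request_size(ver);
     *
     * Only once 'slot' is known can the ring be sized and mapped - which
     * is why the version can't simply be read from the first incoming
     * event.
     */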
[Xen-devel] [PATCH for-4.12] x86/p2m: fix p2m_finish_type_change()
finish_type_change() returns a negative int on error, but the current
code checks if ( !rc ). Also properly indent the out: label while at
it.

Signed-off-by: Razvan Cojocaru
---
 xen/arch/x86/mm/p2m.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 5451f16..e08f6b1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1176,7 +1176,7 @@ int p2m_finish_type_change(struct domain *d,
 
     rc = finish_type_change(hostp2m, first_gfn, max_nr);
 
-    if ( !rc )
+    if ( rc < 0 )
         goto out;
 
 #ifdef CONFIG_HVM
@@ -1193,13 +1193,13 @@ int p2m_finish_type_change(struct domain *d,
             rc = finish_type_change(altp2m, first_gfn, max_nr);
             p2m_unlock(altp2m);
 
-            if ( !rc )
+            if ( rc < 0 )
                 goto out;
         }
     }
 #endif
 
-out:
+ out:
     p2m_unlock(hostp2m);
 
     return rc;
-- 
2.7.4
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 V2] x86/p2m: fix p2m_finish_type_change()
finish_type_change() returns a negative int on error, but the current
code checks if ( !rc ). We also need to treat finish_type_change()'s
return codes cumulatively in the success case (don't overwrite a 1
returned while processing the hostp2m if processing an altp2m returns
0). The breakage was introduced by commit 0fb4b58c8b ("x86/altp2m: fix
display frozen when switching to a new view early").

Properly indent the out: label while at it.

Signed-off-by: Razvan Cojocaru
---
Changes since V1:
 - Updated description.
 - Now treating finish_type_change()'s return value cumulatively for
   the success case.
---
 xen/arch/x86/mm/p2m.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 5451f16..91f412f 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1176,7 +1176,7 @@ int p2m_finish_type_change(struct domain *d,
 
     rc = finish_type_change(hostp2m, first_gfn, max_nr);
 
-    if ( !rc )
+    if ( rc < 0 )
         goto out;
 
 #ifdef CONFIG_HVM
@@ -1188,18 +1188,24 @@ int p2m_finish_type_change(struct domain *d,
         if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) )
         {
             struct p2m_domain *altp2m = d->arch.altp2m_p2m[i];
+            int rc1;
 
             p2m_lock(altp2m);
-            rc = finish_type_change(altp2m, first_gfn, max_nr);
+            rc1 = finish_type_change(altp2m, first_gfn, max_nr);
             p2m_unlock(altp2m);
 
-            if ( !rc )
+            if ( rc1 < 0 )
+            {
+                rc = rc1;
                 goto out;
+            }
+
+            rc = max(rc, rc1);
         }
     }
 #endif
 
-out:
+ out:
     p2m_unlock(hostp2m);
 
     return rc;
-- 
2.7.4
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
On 12/20/18 4:28 PM, Paul Durrant wrote:
>>>> The memory for the asynchronous ring and the synchronous channels
>>>> will be allocated from domheap and mapped to the controlling domain
>>>> using the foreignmemory_map_resource interface. Unlike the current
>>>> implementation, the allocated pages are not part of the target
>>>> DomU, so they will not be reclaimed when the vm_event domain is
>>>> disabled.
>>>
>>> Why re-invent the wheel here? The ioreq infrastructure already does
>>> pretty much everything you need AFAICT.
>>>
>>> Paul
>>
>> I wanted to preserve as much as possible from the existing vm_event
>> DOMCTL interface and add only the necessary code to allocate and map
>> the vm_event_pages.
>
> That means we have two subsystems duplicating a lot of functionality
> though. It would be much better to use ioreq server if possible than
> provide a compatibility interface via DOMCTL.

Just to clarify the compatibility issue: there's a third element
between Xen and the introspection application, the Linux kernel, which
needs to be fairly recent for the whole ioreq machinery to work. The
qemu code also seems to fall back to the old way of working if that's
the case.

This means that there's a choice to be made here: either we keep
backwards compatibility with the old vm_event interface (in which case
we can't drop the waitqueue code), or we switch to the new one and
leave older setups in the dust (but there's less code duplication and
we can get rid of the waitqueue code).

In any event, it's not very clear (to me, at least) how the envisioned
ioreq replacement should work. I assume we're meant to use the whole
infrastructure (as opposed to what we're now doing, which is to only
use the map-hypervisor-memory part), i.e. both mapping and signaling.
Could we discuss this in more detail? Are there any docs on this, or
ioreq minimal clients (like xen-access.c is for vm_event) we might
use?

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 13/25] argo: implement the register op
On 1/9/19 6:15 PM, Tamas K Lengyel wrote:

On Mon, Jan 7, 2019 at 2:01 AM Roger Pau Monné wrote: Adding the
introspection guys.

On Fri, Jan 04, 2019 at 08:47:04AM -0700, Jan Beulich wrote: On
04.01.19 at 16:35, wrote: On Fri, Jan 04, 2019 at 06:22:19AM -0700,
Jan Beulich wrote: On 04.01.19 at 09:57, wrote: On Fri, Dec 21, 2018
at 03:05:03PM -0800, Christopher Clark wrote: On Thu, Dec 20, 2018 at
4:52 AM Roger Pau Monné wrote: On Wed, Dec 19, 2018 at 09:41:59PM
-0800, Christopher Clark wrote: On Wed, Dec 12, 2018 at 8:48 AM Roger
Pau Monné wrote: On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher
Clark wrote:

Then I wonder why you need such a check in any case if the code can
handle such cases, the more that the check itself is racy.

OK, so at the root of the question here is: does it matter what the
p2m type of the memory is at these points:

1) when the gfn is translated to mfn, at the time of ring registration

This is the important check, because that's where you should take a
reference to the page. In this case you should check that the page is
of ram_rw type.

2) when the hypervisor writes into guest memory:
 - where the tx_ptr index is initialized in the register op
 - where ringbuf data is written in sendv
 - where ring description data is written in notify

As long as you keep a reference to the pages that are part of the ring
you don't need to do any checks when writing/reading from them. If the
guest messes up its p2m and does change the gfn -> mfn mappings for
pages that are part of the ring, that's the guest's problem; the
hypervisor still has a reference to those pages so they won't be
reused.

For use cases like introspection this may not be fully correct, but it
may also be that my understanding there isn't fully correct.

If introspection agents care about _any_ writes to a page, hypervisor
ones (which in most cases are merely writes on behalf of the guest)
might matter as well. I think to decide whether page accesses need to
be accompanied by any checks (and if so, which ones) one needs to
 - establish what p2m type transitions are possible for a given page,
 - verify what restrictions may occur "behind the back" of the entity
   wanting to do the accesses,
 - explore whether doing the extra checking at p2m type change time
   wouldn't be better than at the time of access.

Maybe this use-case is different, but how does introspection handle
accesses to the shared info page or the runstate info, for example? I
would consider argo to be the same in this regard.

Not exactly: The shared info page is special in any event. For
runstate info (and alike - there's also struct vcpu_time_info) I'd
question correctness of the current handling. If that's wrong already,
I'd prefer if the issue wasn't spread.

There are also grants, which when used together with another guest on
the same host could allow to bypass introspection AFAICT? (unless
there's some policy applied that limits grant sharing to trusted
domains)

TBH I'm not sure how to handle hypervisor accesses with introspection.
My knowledge of introspection is fairly limited, but it pauses the
guest and sends a notification to an in-guest agent. I'm not sure this
is applicable to hypervisor writes, since it's not possible to pause
hypervisor execution and wait for a response from a guest agent.

Introspection applications only care about memory accesses performed
by the guest. Hypervisor accesses to monitored pages are not included
when monitoring - it is actually a feature when using the emulator in
Xen to continue guest execution, because the hypervisor ignores EPT
memory permissions that trip the guest for introspection. So having
the hypervisor access memory, or a grant-shared page being accessed in
another domain, is not a problem for introspection.

Indeed, that's how it goes.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 13/25] argo: implement the register op
On 1/9/19 6:34 PM, Roger Pau Monné wrote:

Maybe this use-case is different, but how does introspection handle
accesses to the shared info page or the runstate info, for example? I
would consider argo to be the same in this regard.

Not exactly: The shared info page is special in any event. For
runstate info (and alike - there's also struct vcpu_time_info) I'd
question correctness of the current handling. If that's wrong already,
I'd prefer if the issue wasn't spread.

There are also grants, which when used together with another guest on
the same host could allow to bypass introspection AFAICT? (unless
there's some policy applied that limits grant sharing to trusted
domains)

TBH I'm not sure how to handle hypervisor accesses with introspection.
My knowledge of introspection is fairly limited, but it pauses the
guest and sends a notification to an in-guest agent. I'm not sure this
is applicable to hypervisor writes, since it's not possible to pause
hypervisor execution and wait for a response from a guest agent.

Introspection applications only care about memory accesses performed
by the guest. Hypervisor accesses to monitored pages are not included
when monitoring - it is actually a feature when using the emulator in
Xen to continue guest execution, because the hypervisor ignores EPT
memory permissions that trip the guest for introspection. So having
the hypervisor access memory, or a grant-shared page being accessed in
another domain, is not a problem for introspection.

> Couldn't two guests running on the same host then completely bypass
> introspection? I guess you prevent this by limiting to which guests
> pages can be shared?

Would these two guests be HVM guests? Introspection only works for HVM
guests. I'm not sure I follow your scenario though. How would these
guests collaborate to escape introspection via grants?

> If that's the case, and introspection doesn't care about hypervisor
> accesses to guest pages, then just getting a reference to the
> underlying page when the ring is set up should be enough. There's no
> need to check the gfn -> mfn relation every time there's a hypervisor
> access to the ring.

I think so, but I might be missing something.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 13/25] argo: implement the register op
On 1/9/19 6:50 PM, Tamas K Lengyel wrote:
> On Wed, Jan 9, 2019 at 9:48 AM Razvan Cojocaru wrote:
>> On 1/9/19 6:34 PM, Roger Pau Monné wrote:
>>> Couldn't two guests running on the same host then completely bypass
>>> introspection? I guess you prevent this by limiting to which guests
>>> pages can be shared?
>>
>> Would these two guests be HVM guests? Introspection only works for
>> HVM guests. I'm not sure I follow your scenario though. How would
>> these guests collaborate to escape introspection via grants?
>
> If there are two domains acting maliciously in concert to bypass
> monitoring of memory writes, they could achieve that with grants, yes.
> Say a write-monitored page is grant-shared to another domain, which
> then does the write on behalf of the first. I wouldn't say that's
> "completely bypassing introspection" though; there are many types of
> events that can be monitored, and write-accesses are only one. I'm not
> aware of any mechanism that can be used to limit which pages can be
> shared, but you can use XSM to restrict which domains can share pages
> to begin with. That's normally enough.

Right, I agree. We're not currently dealing with that case and assume
that XSM (or a similar mechanism) will be used in scenarios where this
level of access is possible.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC PATCH 4/6] vm_event: Use slotted channels for sync requests.
On 1/10/19 11:58 AM, Paul Durrant wrote:
>>>> Why re-invent the wheel here? The ioreq infrastructure already does
>>>> pretty much everything you need AFAICT.
>>>>
>>>> Paul
>>>
>>> I wanted to preserve as much as possible from the existing vm_event
>>> DOMCTL interface and add only the necessary code to allocate and map
>>> the vm_event_pages.
>>
>> That means we have two subsystems duplicating a lot of functionality
>> though. It would be much better to use ioreq server if possible than
>> provide a compatibility interface via DOMCTL.
>
>> Just to clarify the compatibility issue: there's a third element
>> between Xen and the introspection application, the Linux kernel,
>> which needs to be fairly recent for the whole ioreq machinery to
>> work. The qemu code also seems to fall back to the old way of working
>> if that's the case.
>
> That's correct. For IOREQ server there is a fall-back mechanism when
> privcmd doesn't support resource mapping.
>
>> This means that there's a choice to be made here: either we keep
>> backwards compatibility with the old vm_event interface (in which
>> case we can't drop the waitqueue code), or we switch to the new one
>> and leave older setups in the dust (but there's less code duplication
>> and we can get rid of the waitqueue code).
>
> I don't know what your compatibility model is. QEMU needs to maintain
> compatibility across various different versions of Xen and Linux, so
> there are many shims and much compat code. You may not need this.

Our current model is: deploy a special guest (that we call a SVA,
short for security virtual appliance), with its own kernel and
applications, that for all intents and purposes will act dom0-like. So
in that scenario we control the guest kernel, and backwards
compatibility for the case where the kernel does not support the
proper ioctl is not a priority.

That said, it might very well be an issue for someone, and we'd like
to be well-behaved citizens and not inconvenience other vm_event
consumers. Tamas, is this something you'd be concerned about?

What we do care about is being able to fall back in the case where the
host hypervisor does not know anything about the new ioreq
infrastructure. IOW, nobody can stop a client from running a Xen
4.7-based XenServer on top of which our introspection guest will not
be able to use the new ioreq code even if it's using the latest
kernel. But that can be done at application level and would not
require hypervisor-level backwards compatibility support (whereas in
the first case - an old kernel - it would).

On top of all of this there's Andrew's concern of being able to get
rid of the current vm_event waitqueue code that's making migration
brittle.

So, if I understand the situation correctly, we need to negotiate the
following:

1. Should we try to switch to the ioreq infrastructure for vm_event,
or use our custom one? If I'm remembering things correctly, Paul and
Jan are for it, Andrew is somewhat against it, and Tamas has not
expressed a preference.

2. However we approach the new code, should we or should we not also
provide a backwards compatibility layer in the hypervisor? We don't
need it, but somebody might, and it's probably not a good idea to
design based entirely on the needs of one use-case. Tamas may have
different needs here, and maybe other members of the xen-devel
community as well. Andrew prefers that we don't, since that removes
the waitqueue code.

To reiterate how this got started: we want to move the ring buffer
memory from the guest to the hypervisor (we've had cases of OSes
reclaiming that page after the first introspection application exit),
and we want to make that memory bigger (so that more events will fit
into it, carrying more information (bigger events)). That's
essentially all we're after.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
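[For readers unfamiliar with the resource-mapping interface the thread keeps referring to, a rough user-space sketch follows. XENMEM_resource_vm_event_ring is a hypothetical resource type - the real type/id were part of the series under discussion - while xenforeignmemory_map_resource() is the existing libxenforeignmemory call the approach builds on; treat the exact signature as an assumption:]

    #include <stdint.h>
    #include <sys/mman.h>
    #include <xenforeignmemory.h>

    /* Hypothetical resource type - NOT a real constant. */
    #define XENMEM_resource_vm_event_ring 100

    static void *map_vm_event_pages(xenforeignmemory_handle *fmem,
                                    uint32_t domid, unsigned int nr_frames)
    {
        void *addr = NULL;
        /* The returned handle must be kept and later passed to
         * xenforeignmemory_unmap_resource(); it is leaked here for
         * brevity of the sketch. */
        xenforeignmemory_resource_handle *res =
            xenforeignmemory_map_resource(fmem, domid,
                                          XENMEM_resource_vm_event_ring,
                                          0 /* id */, 0 /* frame */,
                                          nr_frames, &addr,
                                          PROT_READ | PROT_WRITE, 0);

        return res ? addr : NULL;
    }

[The key property for the use-case above is that the mapped pages are domheap pages owned by the hypervisor-side allocation, not guest-owned frames, so a guest OS cannot reclaim them behind the introspection application's back.]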
Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events
On 12/25/18 4:29 AM, Tian, Kevin wrote:
>> From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com]
>> Sent: Friday, December 14, 2018 7:50 PM
>>
>> Block interrupts (in vmx_intr_assist()) for the duration of
>> processing a sync vm_event (similarly to the strategy currently used
>> for single-stepping). Otherwise, attempting to emulate an instruction
>> when requested by a vm_event reply may legitimately need to call e.g.
>> hvm_inject_page_fault(), which then overwrites the active interrupt
>> in the VMCS.
>>
>> The sync vm_event handling path on x86/VMX is (roughly):
>> monitor_traps() -> process vm_event -> vmx_intr_assist() (possibly
>> writing VM_ENTRY_INTR_INFO) -> hvm_vm_event_do_resume() ->
>> hvm_emulate_one_vm_event() (possibly overwriting the
>> VM_ENTRY_INTR_INFO value).
>>
>> This patch may also be helpful for the future removal of may_defer in
>> hvm_set_cr{0,3,4} and hvm_set_msr().
>>
>> Signed-off-by: Razvan Cojocaru
>
> Reviewed-by: Kevin Tian

Ping?

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
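[For readers following along, the mechanism under review amounts to an early exit in vmx_intr_assist(); a rough sketch of the idea follows - the 'sync_event' flag name is an assumption for illustration, not necessarily the exact field from the committed patch:]

    /* Sketch only - not the literal committed hunk. */
    void vmx_intr_assist(void)
    {
        struct vcpu *v = current;

        /*
         * Same strategy as single-stepping: don't inject while a sync
         * vm_event is being processed, or the reply-triggered emulation
         * (e.g. hvm_inject_page_fault()) can clobber the interrupt
         * already written to VM_ENTRY_INTR_INFO.
         */
        if ( unlikely(v->arch.vm_event) && v->arch.vm_event->sync_event )
            return;

        /* ... normal interrupt/exception injection logic follows ... */
    }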
Re: [Xen-devel] [PATCH RFC v2 2/2] x86/emulate: Send vm_event from emulate
On 1/11/19 6:38 PM, Tamas K Lengyel wrote:
> On Fri, Jan 11, 2019 at 8:37 AM Alexandru Stefan ISAILA wrote:
>> This patch aims to have mem access vm events sent from the emulator.
>> This is useful in the case of page-walks that have to emulate
>> instructions in access denied pages.
>
> I'm a little confused about the scenario you mention here. You mark
> pages where the pagetables are non-readable/writable in EPT and you
> expect the emulated instruction would also violate access permissions
> of the guest pagetable itself?

Hello Tamas,

The scenario is this: the pagetables are read-only. At some point, a
walk tries to write the accessed bit, or the dirty bit, somewhere in
that read-only memory, causing an EPT fault, so we end up in
p2m_mem_access_check().

Understandably, we don't care about sending this event out to the
introspection application (we could if we wanted to, which is why this
behaviour is configurable, but I think it's safe to say that for most
introspection use-cases this is something we don't care about, and
hence a perfect opportunity for optimization).

Now, emulating the current instruction helps, and it works. But what
if that instruction would have tried to write to another protected
page? Emulating it, as things stand now, means that we will lose
_that_ event, and that's potentially a very important EPT event.

We've tried to attack this problem by only writing the A/D bits and
almost came to a satisfactory solution, but there's still some debate
on whether it's architecturally correct or not - that approach needs
more studying.

The alternative we've come up with is to instead, at least for the
time being, attempt to send out vm_events from the emulator code only
in this case: where we want to emulate the page walk without
consulting the EPT, but want to consult it when actually emulating the
current instruction.

I hope that made sense.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH RFC v2 2/2] x86/emulate: Send vm_event from emulate
On 1/11/19 10:51 PM, Tamas K Lengyel wrote:
> On Fri, Jan 11, 2019 at 9:51 AM Razvan Cojocaru wrote:
>> On 1/11/19 6:38 PM, Tamas K Lengyel wrote:
>>> On Fri, Jan 11, 2019 at 8:37 AM Alexandru Stefan ISAILA wrote:
>>>> This patch aims to have mem access vm events sent from the
>>>> emulator. This is useful in the case of page-walks that have to
>>>> emulate instructions in access denied pages.
>>>
>>> I'm a little confused about the scenario you mention here. You mark
>>> pages where the pagetables are non-readable/writable in EPT and you
>>> expect the emulated instruction would also violate access
>>> permissions of the guest pagetable itself?
>>
>> The scenario is this: the pagetables are read-only. At some point, a
>> walk tries to write the accessed bit, or the dirty bit, somewhere in
>> that read-only memory, causing an EPT fault, so we end up in
>> p2m_mem_access_check().
>>
>> Understandably, we don't care about sending this event out to the
>> introspection application (we could if we wanted to, which is why
>> this behaviour is configurable, but I think it's safe to say that for
>> most introspection use-cases this is something we don't care about,
>> and hence a perfect opportunity for optimization).
>>
>> Now, emulating the current instruction helps, and it works. But what
>> if that instruction would have tried to write to another protected
>> page? Emulating it, as things stand now, means that we will lose
>> _that_ event, and that's potentially a very important EPT event.
>>
>> We've tried to attack this problem by only writing the A/D bits and
>> almost came to a satisfactory solution, but there's still some debate
>> on whether it's architecturally correct or not - that approach needs
>> more studying.
>>
>> The alternative we've come up with is to instead, at least for the
>> time being, attempt to send out vm_events from the emulator code only
>> in this case: where we want to emulate the page walk without
>> consulting the EPT, but want to consult it when actually emulating
>> the current instruction.
>>
>> I hope that made sense.
>
> I'm still confused :) In the pagetable walking case, didn't the
> instruction you are emulating just trip by writing to a page you want
> to allow it writing to (the A/D bits)? Or are you saying there is an
> unrelated trap happening with an execute-violation, but you don't know
> what other write-protected page it would have tripped if it actually
> executed, and by emulating you effectively miss that write event? The
> latter makes sense; it's the pagetable walking case I have a hard time
> putting together.

OK, assume we have instruction I that accesses page X, which access in
turn causes the A/D bits to be set in pagetable page Y (because it
triggers a page walk). We want to allow the write to Y (setting
whatever A/D bits), but if X is write-protected and I tries to write
into it, we want a vm_event for that write. With the current code, if
we emulate I to write to Y, we lose a potential vm_event for the write
to X. This doesn't happen often, but it does happen.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
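[A compressed sketch of the behaviour being debated, shaped as the decision taken on the EPT fault; the helper names below are made-up for illustration and are not actual Xen functions:]

    /* Illustrative pseudo-logic only. Instruction I targets page X;
     * the page walk it triggers writes A/D bits into pagetable page Y. */
    if ( fault_is_ad_bit_write(req) )   /* A/D write to read-only Y */
    {
        /*
         * Don't bother the introspection agent with the A/D write, but
         * emulation of I must still honour X's EPT restrictions: the v2
         * patch makes the emulator raise a vm_event if I writes to a
         * protected page, instead of silently succeeding and losing it.
         */
        emulate_current_insn(v, /* send_vm_events_from_emulator = */ true);
    }
    else
        send_mem_access_vm_event(v, req);   /* the usual monitoring path */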
Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events
On 1/12/19 12:04 AM, Boris Ostrovsky wrote:
> On 12/14/18 6:49 AM, Razvan Cojocaru wrote:
>> Block interrupts (in vmx_intr_assist()) for the duration of
>> processing a sync vm_event (similarly to the strategy currently used
>> for single-stepping). Otherwise, attempting to emulate an instruction
>> when requested by a vm_event reply may legitimately need to call e.g.
>> hvm_inject_page_fault(), which then overwrites the active interrupt
>> in the VMCS.
>>
>> The sync vm_event handling path on x86/VMX is (roughly):
>> monitor_traps() -> process vm_event -> vmx_intr_assist() (possibly
>> writing VM_ENTRY_INTR_INFO) -> hvm_vm_event_do_resume() ->
>> hvm_emulate_one_vm_event() (possibly overwriting the
>> VM_ENTRY_INTR_INFO value).
>>
>> This patch may also be helpful for the future removal of may_defer in
>> hvm_set_cr{0,3,4} and hvm_set_msr().
>>
>> Signed-off-by: Razvan Cojocaru
>
> Reviewed-by: Boris Ostrovsky

Thanks! So now we have three reviewed-bys; if I'm not mistaken, all we
need is Tamas' (for the vm_event part) and Julien / Stefano's (for
ARM) acks (or otherwise). Could you please take a look?

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events
On 1/14/19 11:53 AM, Jan Beulich wrote:
> On 14.01.19 at 10:34, wrote:
>> On 1/12/19 12:04 AM, Boris Ostrovsky wrote:
>>> On 12/14/18 6:49 AM, Razvan Cojocaru wrote:
>>>> Block interrupts (in vmx_intr_assist()) for the duration of
>>>> processing a sync vm_event (similarly to the strategy currently
>>>> used for single-stepping). Otherwise, attempting to emulate an
>>>> instruction when requested by a vm_event reply may legitimately
>>>> need to call e.g. hvm_inject_page_fault(), which then overwrites
>>>> the active interrupt in the VMCS.
>>>>
>>>> The sync vm_event handling path on x86/VMX is (roughly):
>>>> monitor_traps() -> process vm_event -> vmx_intr_assist() (possibly
>>>> writing VM_ENTRY_INTR_INFO) -> hvm_vm_event_do_resume() ->
>>>> hvm_emulate_one_vm_event() (possibly overwriting the
>>>> VM_ENTRY_INTR_INFO value).
>>>>
>>>> This patch may also be helpful for the future removal of may_defer
>>>> in hvm_set_cr{0,3,4} and hvm_set_msr().
>>>>
>>>> Signed-off-by: Razvan Cojocaru
>>>
>>> Reviewed-by: Boris Ostrovsky
>>
>> Thanks! So now we have three reviewed-bys, if I'm not mistaken all we
>> need is Tamas' (for the vm_event part) and Julien / Stefano's (for
>> ARM) acks (or otherwise).
>
> And you'd need to talk Jürgen into allowing this in, now that we're
> past the freeze point.

(Adding Jürgen to the conversation.)

Right, that would be ideal if possible - the code has absolutely no
impact on anything unless vm_event is active on the domain, which is
never the case for the use-cases considered for a Xen release.

So the case I'm making for the patch to go in 4.12 is that:

1. It's perfectly harmless (it affects nothing, except for
introspection).

2. It's trivial and thus very easy to see that it's correct.

3. It fixes a major headache for us, and thus it is a great improvement
from an introspection standpoint (fixes OS crashes / hangs which we'd
otherwise need to work around in rather painful ways).

4. V3 of the patch has been sent out on Dec 14th - it's just that
reviewers have had other priorities and it did not gather all acks in
time.

However, if it's not possible or desirable to allow this in, the next
best thing is to at least have all the acks necessary for it to go in
first thing once the freeze is over.

From Julien's reply I understand that the last ack necessary is
Tamas'.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events
On 1/14/19 4:42 PM, Juergen Gross wrote: On 14/01/2019 11:56, Razvan Cojocaru wrote: On 1/14/19 11:53 AM, Jan Beulich wrote: On 14.01.19 at 10:34, wrote: On 1/12/19 12:04 AM, Boris Ostrovsky wrote: On 12/14/18 6:49 AM, Razvan Cojocaru wrote: Block interrupts (in vmx_intr_assist()) for the duration of processing a sync vm_event (similarly to the strategy currently used for single-stepping). Otherwise, attempting to emulate an instruction when requested by a vm_event reply may legitimately need to call e.g. hvm_inject_page_fault(), which then overwrites the active interrupt in the VMCS. The sync vm_event handling path on x86/VMX is (roughly): monitor_traps() -> process vm_event -> vmx_intr_assist() (possibly writing VM_ENTRY_INTR_INFO) -> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event() (possibly overwriting the VM_ENTRY_INTR_INFO value). This patch may also be helpful for the future removal of may_defer in hvm_set_cr{0,3,4} and hvm_set_msr(). Signed-off-by: Razvan Cojocaru Reviewed-by: Boris Ostrovsky Thanks! So now we have three reviewed-bys, if I'm not mistaken all we need is Tamas' (for the vm_event part) and Julien / Stefano's (for ARM) acks (or otherwise). And you'd need to talk Jürgen into allowing this in, now that we're past the freeze point. (Adding Jürgen to the conversation.) Right, that would be ideal if possible - the code has absolutely no impact on anything unless vm_event is active on the domain, which is never the case for the use-cases considered for a Xen release. So the case I'm making for the patch to go in 4.12 is that: 1. It's perfectly harmless (it affects nothing, except for introspection). 2. It's trivial and thus very easy to see that it's correct. 3. It fixes a major headache for us, and thus it is a great improvement from an introspection standpoint (fixes OS crashes / hangs which we'd otherwise need to work around in rather painful ways). 4. V3 of the patch has been sent out on Dec 14th - it's just that reviewers have had other priorities and it did not gather all acks in time. However, if it's not possible or desirable to allow this in the next best thing is to at least have all the acks necessary for it to go in first thing once the freeze is over. From Julien's reply I understand that the last ack necessary is Tamas'. With that ack just arrived: Release-acked-by: Juergen Gross Thanks! Very much appreciated. Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events
On 1/14/19 4:42 PM, Juergen Gross wrote: > On 14/01/2019 11:56, Razvan Cojocaru wrote: >> On 1/14/19 11:53 AM, Jan Beulich wrote: >>>>>> On 14.01.19 at 10:34, wrote: >>>> On 1/12/19 12:04 AM, Boris Ostrovsky wrote: >>>>> On 12/14/18 6:49 AM, Razvan Cojocaru wrote: >>>>>> Block interrupts (in vmx_intr_assist()) for the duration of >>>>>> processing a sync vm_event (similarly to the strategy >>>>>> currently used for single-stepping). Otherwise, attempting >>>>>> to emulate an instruction when requested by a vm_event >>>>>> reply may legitimately need to call e.g. >>>>>> hvm_inject_page_fault(), which then overwrites the active >>>>>> interrupt in the VMCS. >>>>>> >>>>>> The sync vm_event handling path on x86/VMX is (roughly): >>>>>> monitor_traps() -> process vm_event -> vmx_intr_assist() >>>>>> (possibly writing VM_ENTRY_INTR_INFO) -> >>>>>> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event() >>>>>> (possibly overwriting the VM_ENTRY_INTR_INFO value). >>>>>> >>>>>> This patch may also be helpful for the future removal >>>>>> of may_defer in hvm_set_cr{0,3,4} and hvm_set_msr(). >>>>>> >>>>>> Signed-off-by: Razvan Cojocaru >>>>> >>>>> >>>>> Reviewed-by: Boris Ostrovsky >>>> >>>> Thanks! So now we have three reviewed-bys, if I'm not mistaken all we >>>> need is Tamas' (for the vm_event part) and Julien / Stefano's (for ARM) >>>> acks (or otherwise). >>> >>> And you'd need to talk Jürgen into allowing this in, now that we're >>> past the freeze point. >> >> (Adding Jürgen to the conversation.) >> >> Right, that would be ideal if possible - the code has absolutely no >> impact on anything unless vm_event is active on the domain, which is >> never the case for the use-cases considered for a Xen release. >> >> So the case I'm making for the patch to go in 4.12 is that: >> >> 1. It's perfectly harmless (it affects nothing, except for introspection). >> >> 2. It's trivial and thus very easy to see that it's correct. >> >> 3. It fixes a major headache for us, and thus it is a great improvement >> from an introspection standpoint (fixes OS crashes / hangs which we'd >> otherwise need to work around in rather painful ways). >> >> 4. V3 of the patch has been sent out on Dec 14th - it's just that >> reviewers have had other priorities and it did not gather all acks in time. >> >> However, if it's not possible or desirable to allow this in the next >> best thing is to at least have all the acks necessary for it to go in >> first thing once the freeze is over. >> >> From Julien's reply I understand that the last ack necessary is Tamas'. > > With that ack just arrived: > > Release-acked-by: Juergen Gross AFAICT this is fine to apply to staging now, am I incorrect? Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V3] x86/vm_event: block interrupt injection for sync vm_events
On 1/16/19 10:58 AM, Jan Beulich wrote: On 16.01.19 at 08:10, wrote: On 1/14/19 4:42 PM, Juergen Gross wrote: On 14/01/2019 11:56, Razvan Cojocaru wrote: On 1/14/19 11:53 AM, Jan Beulich wrote: On 14.01.19 at 10:34, wrote: On 1/12/19 12:04 AM, Boris Ostrovsky wrote: On 12/14/18 6:49 AM, Razvan Cojocaru wrote: Block interrupts (in vmx_intr_assist()) for the duration of processing a sync vm_event (similarly to the strategy currently used for single-stepping). Otherwise, attempting to emulate an instruction when requested by a vm_event reply may legitimately need to call e.g. hvm_inject_page_fault(), which then overwrites the active interrupt in the VMCS. The sync vm_event handling path on x86/VMX is (roughly): monitor_traps() -> process vm_event -> vmx_intr_assist() (possibly writing VM_ENTRY_INTR_INFO) -> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event() (possibly overwriting the VM_ENTRY_INTR_INFO value). This patch may also be helpful for the future removal of may_defer in hvm_set_cr{0,3,4} and hvm_set_msr(). Signed-off-by: Razvan Cojocaru Reviewed-by: Boris Ostrovsky Thanks! So now we have three reviewed-bys, if I'm not mistaken all we need is Tamas' (for the vm_event part) and Julien / Stefano's (for ARM) acks (or otherwise). And you'd need to talk Jürgen into allowing this in, now that we're past the freeze point. (Adding Jürgen to the conversation.) Right, that would be ideal if possible - the code has absolutely no impact on anything unless vm_event is active on the domain, which is never the case for the use-cases considered for a Xen release. So the case I'm making for the patch to go in 4.12 is that: 1. It's perfectly harmless (it affects nothing, except for introspection). 2. It's trivial and thus very easy to see that it's correct. 3. It fixes a major headache for us, and thus it is a great improvement from an introspection standpoint (fixes OS crashes / hangs which we'd otherwise need to work around in rather painful ways). 4. V3 of the patch has been sent out on Dec 14th - it's just that reviewers have had other priorities and it did not gather all acks in time. However, if it's not possible or desirable to allow this in the next best thing is to at least have all the acks necessary for it to go in first thing once the freeze is over. From Julien's reply I understand that the last ack necessary is Tamas'. With that ack just arrived: Release-acked-by: Juergen Gross AFAICT this is fine to apply to staging now, am I incorrect? Yes, but may I ask that you be a little more patient? Of course, apologies. I was just unsure if further action was required on my part. Thank you, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12] x86/p2m: Drop erroneous #VE-enabled check in ept_set_entry()
On 1/24/19 8:28 PM, Andrew Cooper wrote: > Code clearing the "Suppress VE" bit in an EPT entry isn't necessarily running > in current context. In ALTP2M_external mode, it definitely is not, and in PV > context, vcpu_altp2m(current) acts upon the HVM union. > > Even if we could sensibly resolve the target vCPU, it may legitimately not be > fully set up at this point, so rejecting the EPT modification would be buggy. > > There is a path in hvm_hap_nested_page_fault() which explicitly emulates #VE > in the cpu_has_vmx_virt_exceptions case, so the -EOPNOTSUPP part of this > condition is also wrong. > > Drop the !sve check entirely. > > Signed-off-by: Andrew Cooper > --- > CC: Razvan Cojocaru > CC: Tamas K Lengyel > CC: Jun Nakajima > CC: Kevin Tian > CC: Jan Beulich > CC: Wei Liu > CC: Roger Pau Monné > CC: Juergen Gross > > Discovered while trying to fix the gaping security hole with ballooning out > the #VE info page. The risk for 4.12 is very minimal - altp2m is off by > default, not security supported, and the ability to clear sve is limited to > introspection code paths. Reviewed-by: Razvan Cojocaru Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12] x86/svm: Fix handling of ICEBP intercepts
On 2/1/19 4:49 PM, Andrew Cooper wrote: c/s 9338a37d "x86/svm: implement debug events" added support for introspecting ICEBP debug exceptions, but didn't account for the fact that svm_get_insn_len() (previously __get_instruction_length) can fail and may already raise #GP for the guest. If svm_get_insn_len() fails, return back to guest context rather than continuing and mistaking a trap-style VMExit for a fault-style one. Spotted by Coverity. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Wei Liu CC: Roger Pau Monné CC: Boris Ostrovsky CC: Suravee Suthikulpanit CC: Brian Woods CC: Juergen Gross CC: Razvan Cojocaru CC: Tamas K Lengyel This wants backporting to Xen 4.11 --- xen/arch/x86/hvm/svm/svm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index 2584b90..e21091c 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2758,6 +2758,9 @@ void svm_vmexit_handler(struct cpu_user_regs *regs) { trap_type = X86_EVENTTYPE_PRI_SW_EXCEPTION; inst_len = svm_get_insn_len(v, INSTR_ICEBP); + +if ( !inst_len ) +break; } rc = hvm_monitor_debug(regs->rip, Reviewed-by: Razvan Cojocaru Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH RFC v2 1/2] x86/emulate: Move hvmemul_linear_to_phys
On 2/4/19 1:55 PM, Paul Durrant wrote: -Original Message- From: Alexandru Stefan ISAILA [mailto:aisa...@bitdefender.com] Sent: 04 February 2019 11:37 To: xen-devel@lists.xenproject.org Cc: Paul Durrant ; jbeul...@suse.com; Andrew Cooper ; Wei Liu ; Roger Pau Monne ; rcojoc...@bitdefender.com; ta...@tklengyel.com; George Dunlap Subject: Re: [PATCH RFC v2 1/2] x86/emulate: Move hvmemul_linear_to_phys Ping, any thoughts on this are appreciated. Regards, Alex On 11.01.2019 17:37, Alexandru Stefan ISAILA wrote: This is done so hvmemul_linear_to_phys() can be called from hvmemul_map_linear_addr() I have no objection to pure code movement for this purpose. I certainly prefer it to forward declaration of statics. Thanks for the reply! We're also after thoughts on the whole series (replying to this specific patch instead of the cover letter was an accident). We expect the second patch to be more controversial and in some need of detailed scrutiny. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH RFC v2 1/2] x86/emulate: Move hvmemul_linear_to_phys
On 2/4/19 5:14 PM, Roger Pau Monné wrote: On Mon, Feb 04, 2019 at 02:00:40PM +0200, Razvan Cojocaru wrote: On 2/4/19 1:55 PM, Paul Durrant wrote: -Original Message- From: Alexandru Stefan ISAILA [mailto:aisa...@bitdefender.com] Sent: 04 February 2019 11:37 To: xen-devel@lists.xenproject.org Cc: Paul Durrant ; jbeul...@suse.com; Andrew Cooper ; Wei Liu ; Roger Pau Monne ; rcojoc...@bitdefender.com; ta...@tklengyel.com; George Dunlap Subject: Re: [PATCH RFC v2 1/2] x86/emulate: Move hvmemul_linear_to_phys Ping, any thoughts on this are appreciated. Regards, Alex On 11.01.2019 17:37, Alexandru Stefan ISAILA wrote: This is done so hvmemul_linear_to_phys() can be called from hvmemul_map_linear_addr() I have no objection to pure code movement for this purpose. I certainly prefer it to forward declaration of statics. Thanks for the reply! We're also after thoughts on the whole series (replying to this specific patch instead of the cover letter was an accident). We expect the second patch to be more controversial and in some need of detailed scrutiny. I've taken a look at patch 2 and provided some review comments, sorry for the delay. Keep in mind we are in the middle of the release process and most developers and reviewers are busy trying to fix bugs or reviewing bugfixes. Very much appreciated! Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v6 08/10] remove late (on-demand) construction of IOMMU page tables
On 8/16/19 8:19 PM, Paul Durrant wrote: > Now that there is a per-domain IOMMU enable flag, which should be enabled if > any device is going to be passed through, stop deferring page table > construction until the assignment is done. Also don't tear down the tables > again when the last device is de-assigned; defer that task until domain > destruction. > > This allows the has_iommu_pt() helper and iommu_status enumeration to be > removed. Calls to has_iommu_pt() are simply replaced by calls to > is_iommu_enabled(). Remaining open-coded tests of iommu_hap_pt_share can also > be replaced by calls to iommu_use_hap_pt(). > The arch_iommu_populate_page_table() and iommu_construct() functions become > redundant, as does the 'strict mode' dom0 page_list mapping code in > iommu_hwdom_init(), and iommu_teardown() can be made static as its only > remaining caller, iommu_domain_destroy(), is within the same source > module. > > All in all, about 220 lines of code are removed. > > NOTE: This patch will cause a small amount of extra resource to be used > to accommodate IOMMU page tables that may never be used, since the > per-domain IOMMU enable flag is currently set to the value > of the global iommu_enable flag. A subsequent patch will add an > option to the toolstack to allow it to be turned off if there is > no intention to assign passthrough hardware to the domain. This has slipped under my radar, sorry. Acked-by: Razvan Cojocaru Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). While the in-guest scenario continues to pose problems, this patch fixes the "external" case. Signed-off-by: Razvan Cojocaru --- xen/arch/x86/hvm/hvm.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 21944e9..33038dc 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4533,6 +4533,9 @@ static int do_altp2m_op( break; } +if ( d != current->domain ) +domain_pause(d); + ostate = d->arch.altp2m_active; d->arch.altp2m_active = !!a.u.domain_state.state; @@ -4551,6 +4554,10 @@ static int do_altp2m_op( if ( ostate ) p2m_flush_altp2m(d); } + +if ( d != current->domain ) +domain_unpause(d); + break; } -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
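For context, the consumer that this race breaks is (roughly) the altp2m EPTP validation on the VM-exit path. The sketch below is simplified rather than verbatim Xen code; p2m_find_altp2m_by_eptp(), __vmread() and domain_crash() are the real primitives involved:

    /* Simplified sketch of the check in vmx_vmexit_handler(): */
    if ( unlikely(v->domain->arch.altp2m_active) )
    {
        unsigned long eptp;

        /*
         * Without pausing the domain, altp2m_active can already be true
         * while this vCPU's VMCS still holds the host p2m's EPTP ...
         */
        __vmread(EPT_POINTER, &eptp);

        /* ... in which case the lookup fails and the domain is crashed. */
        if ( p2m_find_altp2m_by_eptp(v->domain, eptp) == INVALID_ALTP2M )
            domain_crash(v->domain);
    }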
Re: [Xen-devel] [PATCH for-4.12] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/8/19 12:51 PM, Jan Beulich wrote: On 08.02.19 at 10:56, wrote: HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to a in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). While the in-guest scenario continues to pose problems, this patch fixes the "external" case. IOW you're papering over the problem rather than fixing it. Why does altp2m_active get set to true before actually having set up everything? Shouldn't it get cleared early, but set late? Well, yes, that would have been my second attempt: set the "altp2m enabled" bool after the init, and before the uninit and no longer domain_pause() explicitly; however I thought that was a brittle solution, relying on comments / programmer attention to the code sequence rather than taking a proper lock. I'll test that scenario then and return with the results / possibly another patch. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/8/19 1:13 PM, Razvan Cojocaru wrote: On 2/8/19 12:51 PM, Jan Beulich wrote: On 08.02.19 at 10:56, wrote: HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to a in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). While the in-guest scenario continues to pose problems, this patch fixes the "external" case. IOW you're papering over the problem rather than fixing it. Why does altp2m_active get set to true before actually having set up everything? Shouldn't it get cleared early, but set late? Well, yes, that would have been my second attempt: set the "altp2m enabled" bool after the init, and before the uninit and no longer domain_pause() explicitly; however I thought that was a brittle solution, relying on comments / programmer attention to the code sequence rather than taking a proper lock. I'll test that scenario then and return with the results / possibly another patch. Actually, your suggestion does not work, because the way the code has been designed, altp2m_vcpu_initialise() calls altp2m_vcpu_update_p2m(), which does the proper work that's interesting to us here, like this: 2153 static void vmx_vcpu_update_eptp(struct vcpu *v) 2154 { 2155 struct domain *d = v->domain; 2156 struct p2m_domain *p2m = NULL; 2157 struct ept_data *ept; 2158 2159 if ( altp2m_active(d) ) 2160 p2m = p2m_get_altp2m(v); 2161 if ( !p2m ) 2162 p2m = p2m_get_hostp2m(d); 2163 2164 ept = &p2m->ept; 2165 ept->mfn = pagetable_get_pfn(p2m_get_pagetable(p2m)); 2166 2167 vmx_vmcs_enter(v); 2168 2169 __vmwrite(EPT_POINTER, ept->eptp); 2170 2171 if ( v->arch.hvm.vmx.secondary_exec_control & 2172 SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS ) 2173 __vmwrite(EPTP_INDEX, vcpu_altp2m(v).p2midx); 2174 2175 vmx_vmcs_exit(v); 2176 } So please note that on line 2159 it checks if altp2m is active, and only then does it do the right thing. So setting the d->arch.altp2m_active bool _after_ calling altp2m_vcpu_initialise() will fail to work correctly - turning this into a chicken-and-egg problem, or perhaps more interestingly, another discussion about whether in-guest-only altp2m agents make any sense fundamentally. I think that, sadly, the best we can do at this time is the patch I've sent. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/8/19 2:58 PM, Jan Beulich wrote: Well, to be honest I expected dependencies like this to be there, and hence I didn't expect it would be a straightforward change. Just like we do e.g. for the IOMMU enabling, I guess the boolean wants to become a tristate then (off -> enabling -> enabled), which interested sites then can use to distinguish what they want/need to do. Another relatively obvious solution would be to add a boolean parameter to altp2m_vcpu_update_p2m() such that altp2m_vcpu_initialise() can guide it properly. But this of course depends to a certain degree on how wide spread the problem is. Fair enough, I'll attempt V2 with the second suggestion (adding a bool to altp2m_vcpu_update_p2m()). Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). This patch reorganizes the code so that d->arch.altp2m_active is set to true only after all the init work has been done, and to false before the uninit work begins. This required adding a new bool parameter to altp2m_vcpu_update_p2m(), which relied on d->arch.altp2m_active being set before it's called. While at it, I've changed a couple of bool_t's to bool's. Signed-off-by: Razvan Cojocaru --- xen/arch/x86/hvm/hvm.c| 17 ++--- xen/arch/x86/hvm/vmx/vmx.c| 4 ++-- xen/arch/x86/mm/altp2m.c | 4 ++-- xen/arch/x86/mm/p2m.c | 4 ++-- xen/include/asm-x86/altp2m.h | 2 +- xen/include/asm-x86/domain.h | 2 +- xen/include/asm-x86/hvm/hvm.h | 6 +++--- 7 files changed, 25 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 21944e9..4d1d13d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4525,7 +4525,7 @@ static int do_altp2m_op( case HVMOP_altp2m_set_domain_state: { struct vcpu *v; -bool_t ostate; +bool ostate, nstate; if ( nestedhvm_enabled(d) ) { @@ -4534,12 +4534,16 @@ static int do_altp2m_op( } ostate = d->arch.altp2m_active; -d->arch.altp2m_active = !!a.u.domain_state.state; +nstate = !!a.u.domain_state.state; /* If the alternate p2m state has changed, handle appropriately */ -if ( d->arch.altp2m_active != ostate && +if ( nstate != ostate && (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) ) { +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */ +if ( ostate ) +d->arch.altp2m_active = false; + for_each_vcpu( d, v ) { if ( !ostate ) @@ -4550,7 +4554,14 @@ static int do_altp2m_op( if ( ostate ) p2m_flush_altp2m(d); +else +/* + * Wait until altp2m_vcpu_initialise() is done before marking + * altp2m as being enabled for the domain.
+ */ +d->arch.altp2m_active = true; } + break; } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 64af8bf..e542568 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) return !!cpu_has_monitor_trap_flag; } -static void vmx_vcpu_update_eptp(struct vcpu *v) +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) { struct domain *d = v->domain; struct p2m_domain *p2m = NULL; struct ept_data *ept; -if ( altp2m_active(d) ) +if ( altp2m_enabled ) p2m = p2m_get_altp2m(v); if ( !p2m ) p2m = p2m_get_hostp2m(d); diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c index 930bdc2..c51303b 100644 --- a/xen/arch/x86/mm/altp2m.c +++ b/xen/arch/x86/mm/altp2m.c @@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v) vcpu_altp2m(v).p2midx = 0; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, true); if ( v != current ) vcpu_unpause(v); @@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v) altp2m_vcpu_reset(v); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, false); altp2m_vcpu_update_vmfunc_ve(v); if ( v != current ) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index d14ce57..f9e96fc 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx) atomic_dec(&p2m_get_altp2m(v)->active_vcpus); vcpu_altp2m(v).p2midx = idx; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, altp2m_active(d)); } rc = 1; } @@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx) atomic_dec(&p2m_get_altp2m(v)->active_vcpus); vcpu_altp2m(v).p2midx = idx; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_u
Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/8/19 4:00 PM, Razvan Cojocaru wrote: HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). This patch reorganizes the code so that d->arch.altp2m_active is set to true only after all the init work has been done, and to false before the uninit work begins. This required adding a new bool parameter to altp2m_vcpu_update_p2m(), which relied on d->arch.altp2m_active being set before it's called. While at it, I've changed a couple of bool_t's to bool's. Signed-off-by: Razvan Cojocaru --- xen/arch/x86/hvm/hvm.c| 17 ++--- xen/arch/x86/hvm/vmx/vmx.c| 4 ++-- xen/arch/x86/mm/altp2m.c | 4 ++-- xen/arch/x86/mm/p2m.c | 4 ++-- xen/include/asm-x86/altp2m.h | 2 +- xen/include/asm-x86/domain.h | 2 +- xen/include/asm-x86/hvm/hvm.h | 6 +++--- 7 files changed, 25 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 21944e9..4d1d13d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4525,7 +4525,7 @@ static int do_altp2m_op( case HVMOP_altp2m_set_domain_state: { struct vcpu *v; -bool_t ostate; +bool ostate, nstate; if ( nestedhvm_enabled(d) ) { @@ -4534,12 +4534,16 @@ static int do_altp2m_op( } ostate = d->arch.altp2m_active; -d->arch.altp2m_active = !!a.u.domain_state.state; +nstate = !!a.u.domain_state.state; /* If the alternate p2m state has changed, handle appropriately */ -if ( d->arch.altp2m_active != ostate && +if ( nstate != ostate && (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) ) { +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */ +if ( ostate ) +d->arch.altp2m_active = false; + for_each_vcpu( d, v ) { if ( !ostate ) @@ -4550,7 +4554,14 @@ static int do_altp2m_op( if ( ostate ) p2m_flush_altp2m(d); +else +/* + * Wait until altp2m_vcpu_initialise() is done before marking + * altp2m as being enabled for the domain.
+ */ +d->arch.altp2m_active = true; } + break; } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 64af8bf..e542568 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) return !!cpu_has_monitor_trap_flag; } -static void vmx_vcpu_update_eptp(struct vcpu *v) +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) { struct domain *d = v->domain; struct p2m_domain *p2m = NULL; struct ept_data *ept; -if ( altp2m_active(d) ) +if ( altp2m_enabled ) p2m = p2m_get_altp2m(v); if ( !p2m ) p2m = p2m_get_hostp2m(d); diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c index 930bdc2..c51303b 100644 --- a/xen/arch/x86/mm/altp2m.c +++ b/xen/arch/x86/mm/altp2m.c @@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v) vcpu_altp2m(v).p2midx = 0; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, true); if ( v != current ) vcpu_unpause(v); @@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v) altp2m_vcpu_reset(v); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, false); altp2m_vcpu_update_vmfunc_ve(v); if ( v != current ) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index d14ce57..f9e96fc 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx) atomic_dec(&p2m_get_altp2m(v)->active_vcpus); vcpu_altp2m(v).p2midx = idx; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, altp2m_active(d)); } rc = 1; } @@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx) atomic_dec(&p2m_get_altp2m(v)->active_vcpus); vcpu_altp2m(v).p2midx = idx;
Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/8/19 5:50 PM, Jan Beulich wrote: On 08.02.19 at 15:00, wrote: >> --- a/xen/arch/x86/hvm/hvm.c >> +++ b/xen/arch/x86/hvm/hvm.c >> @@ -4525,7 +4525,7 @@ static int do_altp2m_op( >> case HVMOP_altp2m_set_domain_state: >> { >> struct vcpu *v; >> -bool_t ostate; >> +bool ostate, nstate; >> >> if ( nestedhvm_enabled(d) ) >> { >> @@ -4534,12 +4534,16 @@ static int do_altp2m_op( >> } >> >> ostate = d->arch.altp2m_active; >> -d->arch.altp2m_active = !!a.u.domain_state.state; >> +nstate = !!a.u.domain_state.state; > > No need for !! here. I'll remove it. >> /* If the alternate p2m state has changed, handle appropriately */ >> -if ( d->arch.altp2m_active != ostate && >> +if ( nstate != ostate && >> (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) ) >> { >> +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */ >> +if ( ostate ) >> +d->arch.altp2m_active = false; > > Why the if()? In the opposite case you'd simply write false into > what already holds false. The value written into d->arch.altp2m_active is not the point here. The point is that if ( ostate ), then we are disabling altp2m (because the if above this one makes sure ostate != nstate). So in the disable case, first I wanted to set d->arch.altp2m_active to false (which immediately causes altp2m_active(d) to return false as well), and then actually perform the work. >> @@ -4550,7 +4554,14 @@ static int do_altp2m_op( >> >> if ( ostate ) >> p2m_flush_altp2m(d); >> +else >> +/* >> + * Wait until altp2m_vcpu_initialise() is done before >> marking >> + * altp2m as being enabled for the domain. >> + */ >> +d->arch.altp2m_active = true; > > Similarly here you could omit the "else" and simply store "nstate" afaict. As above, in the enable case I wanted to first setup altp2m on all VCPUs with altp2m_vcpu_initialise(), and only after that set d->arch.altp2m_active = true. In summary, if ( ostate ) we want to set d->arch.altp2m_active before the for (we're disabling altp2m), and if ( !ostate ) (which is the else above) we want to set d->arch.altp2m_active after said for. We can indeed store nstate. I just thought things look clearer with "true" and "false", but if you prefer there's no problem assigning nstate. >> --- a/xen/arch/x86/hvm/vmx/vmx.c >> +++ b/xen/arch/x86/hvm/vmx/vmx.c >> @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) >> return !!cpu_has_monitor_trap_flag; >> } >> >> -static void vmx_vcpu_update_eptp(struct vcpu *v) >> +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) >> { >> struct domain *d = v->domain; >> struct p2m_domain *p2m = NULL; >> struct ept_data *ept; >> >> -if ( altp2m_active(d) ) >> +if ( altp2m_enabled ) >> p2m = p2m_get_altp2m(v); >> if ( !p2m ) >> p2m = p2m_get_hostp2m(d); > > Is this an appropriate transformation? That is, can there not be > any domains for which altp2m_active() returns false despite > altp2m_enabled being true? (Looking at p2m_get_altp2m() I can't > really judge whether index would always be INVALID_ALTP2M in > such a case.) Yes, it should be completely safe (please see below for details).
>> --- a/xen/arch/x86/mm/p2m.c >> +++ b/xen/arch/x86/mm/p2m.c >> @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, >> unsigned int idx) >> atomic_dec(&p2m_get_altp2m(v)->active_vcpus); >> vcpu_altp2m(v).p2midx = idx; >> atomic_inc(&p2m_get_altp2m(v)->active_vcpus); >> -altp2m_vcpu_update_p2m(v); >> +altp2m_vcpu_update_p2m(v, altp2m_active(d)); >> } >> rc = 1; >> } >> @@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, >> unsigned int idx) >> atomic_dec(&p2m_get_altp2m(v)->active_vcpus); >> vcpu_altp2m(v).p2midx = idx; >> atomic_inc(&p2m_get_altp2m(v)->active_vcpus); >> -altp2m_vcpu_update_p2m(v); >> +altp2m_vcpu_update_p2m(v, altp2m_active(d)); >> } > > In both cases, isn't altp2m_active() going to return true anyway > when we get there (related to the question above)? Yes, it will return true. In order for it to return false, altp2m_vcpu_destroy() would have had to run on that VCPU, which (among other things) calls altp2m_vcpu_reset(), which does v->p2midx = INVALID_ALTP2M. There's an if() above (not shown in your reply) that says "if ( idx != vcpu_altp2m(v).p2midx )", so, indeed, by the time we end up here we can reasonably assume that altp2m_active(d) will return true. I've just put in "altp2m_active(d)" to make sure there's absolutely no functional change here, but AFA
Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/8/19 6:44 PM, Jan Beulich wrote: On 08.02.19 at 17:30, wrote: >> On 2/8/19 5:50 PM, Jan Beulich wrote: >> On 08.02.19 at 15:00, wrote: /* If the alternate p2m state has changed, handle appropriately */ -if ( d->arch.altp2m_active != ostate && +if ( nstate != ostate && (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) ) { +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */ +if ( ostate ) +d->arch.altp2m_active = false; >>> >>> Why the if()? In the opposite case you'd simply write false into >>> what already holds false. >> >> The value written into d->arch.altp2m_active is not the point here. The >> point is that if ( ostate ), then we are disabling altp2m (because the >> if above this one makes sure ostate != nstate). >> >> So in the disable case, first I wanted to set d->arch.altp2m_active to >> false (which immediately causes altp2m_active(d) to return false as >> well), and then actually perform the work. > > Sure - I'm not asking to remove everything above, just the if() > (and keep what is now the body). That is because > - in the enable case the value is false and writing false into it is > benign, > - in the disable case you want to actively change from true to > false. > @@ -4550,7 +4554,14 @@ static int do_altp2m_op( if ( ostate ) p2m_flush_altp2m(d); +else +/* + * Wait until altp2m_vcpu_initialise() is done before marking + * altp2m as being enabled for the domain. + */ +d->arch.altp2m_active = true; >>> >>> Similarly here you could omit the "else" and simply store "nstate" afaict. >> >> As above, in the enable case I wanted to first setup altp2m on all VCPUs >> with altp2m_vcpu_initialise(), and only after that set >> d->arch.altp2m_active = true. >> >> In summary, if ( ostate ) we want to set d->arch.altp2m_active before >> the for (we're disabling altp2m), and if ( !ostate ) (which is the else >> above) we want to set d->arch.altp2m_active after said for. >> >> We can indeed store nstate. I just thought things look clearer with >> "true" and "false", but if you prefer there's no problem assigning nstate. > > Well, as always with mm and altp2m code, I'm not the maintainer, so > I don't have the final say. It's just that I think unnecessary conditionals > would better be avoided, even if they're not on a performance critical > path. > --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) return !!cpu_has_monitor_trap_flag; } -static void vmx_vcpu_update_eptp(struct vcpu *v) +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) { struct domain *d = v->domain; struct p2m_domain *p2m = NULL; struct ept_data *ept; -if ( altp2m_active(d) ) +if ( altp2m_enabled ) p2m = p2m_get_altp2m(v); if ( !p2m ) p2m = p2m_get_hostp2m(d); >>> >>> Is this an appropriate transformation? That is, can there not be >>> any domains for which altp2m_active() returns false despite >>> altp2m_enabled being true? (Looking at p2m_get_altp2m() I can't >>> really judge whether index would always be INVALID_ALTP2M in >>> such a case.) >> >> Yes, it should be completely safe (please see below for details).
>> --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx) atomic_dec(&p2m_get_altp2m(v)->active_vcpus); vcpu_altp2m(v).p2midx = idx; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, altp2m_active(d)); } rc = 1; } @@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx) atomic_dec(&p2m_get_altp2m(v)->active_vcpus); vcpu_altp2m(v).p2midx = idx; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, altp2m_active(d)); } >>> >>> In both cases, isn't altp2m_active() going to return true anyway >>> when we get there (related to the question above)? >> >> Yes, it will return true. In order for it to return false, >> altp2m_vcpu_destroy() would have had to run on that VCPU, which (among >> other things) calls altp2m_vcpu_reset(), which does v->p2midx = >> INVALID_ALTP2M. > > Hmm, according to the change further up in this patch, altp2m_active() >
Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
Right, I now understand what you meant by removing "if ( ostate )" - you'd like d->arch.altp2m_active to only be set _after_ the for, for the enable, as well as for the disable case. No, certainly not. Exactly because of ... However, I wanted to keep both if()s to avoid another problem with this code: [...] So for the disable case, I wanted altp2m_active(v->domain) to return false immediately so that this code won't be triggered. Otherwise, assuming the following scenario: ... this. Apparently "and keep what is now the body" was still not clear enough. Taking your original code, I mean if ( nstate != ostate && (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) ) { /* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */ d->arch.altp2m_active = false; for_each_vcpu( d, v ) [...] if ( ostate ) p2m_flush_altp2m(d); /* * Wait until altp2m_vcpu_initialise() is done before marking * altp2m as being enabled for the domain. */ d->arch.altp2m_active = nstate; } Ah! Thanks for the clarification. Now, you're right that p2m_get_altp2m() may hand over a pointer to an altp2m between the moment where d->arch.altp2m_active is false and the point where v->p2midx actually becomes INVALID_ALTP2M with my code; but I think it can be argued that I should rather fix p2m_get_altp2m(), to return NULL if altp2m_active() == false and keep the if()s (which I've hopefully been more eloquent about now). Yes, that looks to be an option, provided it doesn't violate assumptions elsewhere. It doesn't, AFAICT. Additionally, it can be argued that if code relies on p2m_get_altp2m() returning not-NULL when !altp2m_active(), said code is broken. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
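The V3 patch below implements this. A minimal sketch of the p2m_get_altp2m() change, assuming the helper's current shape in asm-x86/p2m.h:

    static inline struct p2m_domain *p2m_get_altp2m(struct vcpu *v)
    {
        unsigned int index = vcpu_altp2m(v).p2midx;

        /*
         * Returning NULL while altp2m is inactive keeps callers from
         * picking up a stale altp2m pointer between the moment
         * d->arch.altp2m_active is cleared and the moment p2midx is
         * reset to INVALID_ALTP2M.
         */
        if ( !altp2m_active(v->domain) || index == INVALID_ALTP2M )
            return NULL;

        BUG_ON(index >= MAX_ALTP2M);

        return v->domain->arch.altp2m_p2m[index];
    }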
[Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). This patch reorganizes the code so that d->arch.altp2m_active is set to true only after all the init work has been done, and to false before the uninit work begins. This required adding a new bool parameter to altp2m_vcpu_update_p2m(), which relied on d->arch.altp2m_active being set before it's called. p2m_get_altp2m() now returns NULL if !altp2m_active(), to prevent it from returning a "valid" altp2m pointer between the moment where d->arch.altp2m_active = false and the point when v->p2midx actually becomes INVALID_ALTP2M. While at it, I've changed a couple of bool_t's to bool's. Signed-off-by: Razvan Cojocaru --- Changes since V2: - Removed leftover asm-x86/altp2m.h changes. - nstate = !!a.u.domain_state.state; becomes nstate = a.u.domain_state.state; - Removed the if() and else in do_altp2m_op() as recommended by Jan. - Using true explicitly instead of altp2m_active(d) for altp2m_vcpu_update_p2m() in p2m_switch_vcpu_altp2m_by_id() and p2m_switch_domain_altp2m_by_id(). - Updated patch description. - Modified p2m_get_altp2m() to return NULL if !altp2m_active(). --- xen/arch/x86/hvm/hvm.c| 16 +--- xen/arch/x86/hvm/vmx/vmx.c| 4 ++-- xen/arch/x86/mm/altp2m.c | 4 ++-- xen/arch/x86/mm/p2m.c | 4 ++-- xen/include/asm-x86/domain.h | 2 +- xen/include/asm-x86/hvm/hvm.h | 6 +++--- xen/include/asm-x86/p2m.h | 8 +++- 7 files changed, 30 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 21944e9..50d896d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4525,7 +4525,7 @@ static int do_altp2m_op( case HVMOP_altp2m_set_domain_state: { struct vcpu *v; -bool_t ostate; +bool ostate, nstate; if ( nestedhvm_enabled(d) ) { @@ -4534,12 +4534,15 @@ static int do_altp2m_op( } ostate = d->arch.altp2m_active; -d->arch.altp2m_active = !!a.u.domain_state.state; +nstate = a.u.domain_state.state; /* If the alternate p2m state has changed, handle appropriately */ -if ( d->arch.altp2m_active != ostate && +if ( nstate != ostate && (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) ) { +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */ +d->arch.altp2m_active = false; + for_each_vcpu( d, v ) { if ( !ostate ) @@ -4550,7 +4553,14 @@ static int do_altp2m_op( if ( ostate ) p2m_flush_altp2m(d); + +/* + * Wait until altp2m_vcpu_initialise() is done before marking + * altp2m as being enabled for the domain.
+ */ +d->arch.altp2m_active = nstate; } + break; } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 64af8bf..e542568 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) return !!cpu_has_monitor_trap_flag; } -static void vmx_vcpu_update_eptp(struct vcpu *v) +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) { struct domain *d = v->domain; struct p2m_domain *p2m = NULL; struct ept_data *ept; -if ( altp2m_active(d) ) +if ( altp2m_enabled ) p2m = p2m_get_altp2m(v); if ( !p2m ) p2m = p2m_get_hostp2m(d); diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c index 930bdc2..c51303b 100644 --- a/xen/arch/x86/mm/altp2m.c +++ b/xen/arch/x86/mm/altp2m.c @@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v) vcpu_altp2m(v).p2midx = 0; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, true); if ( v != current ) vcpu_unpause(v); @@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v) altp2m_vcpu_reset(v); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, false); altp2m_vcpu_update_vmfunc_ve(v); if ( v != current ) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index d14ce57..6f991f8 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx
Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/11/19 12:10 PM, Jan Beulich wrote: On 11.02.19 at 10:13, wrote: --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) return !!cpu_has_monitor_trap_flag; } -static void vmx_vcpu_update_eptp(struct vcpu *v) +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) { struct domain *d = v->domain; struct p2m_domain *p2m = NULL; struct ept_data *ept; -if ( altp2m_active(d) ) +if ( altp2m_enabled ) p2m = p2m_get_altp2m(v); if ( !p2m ) p2m = p2m_get_hostp2m(d); With the change you now make to p2m_get_altp2m(), this looks to be a benign change. Which to me would suggest to either leave the code alone, or to drop the if() (but - again - not its body) altogether. At which point the code could be further streamlined, as then the NULL initializer can go away and the assignment (or then perhaps initializer) could become "p2m = p2m_get_altp2m(v) ?: p2m_get_hostp2m(d)". (Generally I'd recommend to leave out the change here, and do the transformation in a follow-on patch.) Thanks for noticing, actually this appears to invalidate the whole purpose of the patch (I should have tested this more before submitting V3, sorry). The whole point of the new boolean is to have p2m assigned an altp2m regardless of altp2m_active() (hence the change) - which now no longer happens. I got carried away with this change. The fact that this is so easy to get wrong is the reason why I've preferred the domain_pause() solution. There appears to be no clean way to fix this otherwise, and if this is so easy to misunderstand it'll break just as easily with further changes. I suppose I could just pass the bool along to p2m_get_altp2m() (and indeed remove the if())... Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/11/19 12:57 PM, Razvan Cojocaru wrote: On 2/11/19 12:10 PM, Jan Beulich wrote: On 11.02.19 at 10:13, wrote: --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) return !!cpu_has_monitor_trap_flag; } -static void vmx_vcpu_update_eptp(struct vcpu *v) +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) { struct domain *d = v->domain; struct p2m_domain *p2m = NULL; struct ept_data *ept; - if ( altp2m_active(d) ) + if ( altp2m_enabled ) p2m = p2m_get_altp2m(v); if ( !p2m ) p2m = p2m_get_hostp2m(d); With the change you now make to p2m_get_altp2m(), this looks to be a benign change. Which to me would suggest to either leave the code alone, or to drop the if() (but - again - not its body) altogether. At which point the code could be further streamlined, as then the NULL initializer can go away and the assignment (or then perhaps initializer) could become "p2m = p2m_get_altp2m(v) ?: p2m_get_hostp2m(d)". (Generally I'd recommend to leave out the change here, and do the transformation in a follow-on patch.) Thanks for noticing, actually this appears to invalidate the whole purpose of the patch (I should have tested this more before sumbitting V3, sorry). The whole point of the new boolean is to have p2m assigned an altp2m regardless of altp2m_active() (hence the change) - which now no longer happens. I got carried away with this change. The fact that this is so easy to get wrong is the reason why I've preferred the domain_pause() solution. There appears to be no clean way to fix this otherwise, and if this is so easy to misunderstand it'll break just as easily with further changes. I suppose I could just pass the bool along to p2m_get_altp2m() (and indeed remove the if())... I think the best that can be done here is to check if altp2m_active() early in p2m_switch_vcpu_altp2m_by_id() and p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since these are only called by HVMOP_altp2m_* (and thus serialized by the domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state. This of course means reverting p2m_get_altp2m() to its original non-intuitive state of returning a valid altp2m pointer even when altp2m_active() is false. I see no other way out of this (aside from the domain_pause() fix). Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/11/19 6:59 PM, Jan Beulich wrote: >>> Thanks for noticing, actually this appears to invalidate the whole >>> purpose of the patch (I should have tested this more before submitting >>> V3, sorry). >>> >>> The whole point of the new boolean is to have p2m assigned an altp2m >>> regardless of altp2m_active() (hence the change) - which now no longer >>> happens. I got carried away with this change. >>> >>> The fact that this is so easy to get wrong is the reason why I've >>> preferred the domain_pause() solution. There appears to be no clean way >>> to fix this otherwise, and if this is so easy to misunderstand it'll >>> break just as easily with further changes. >>> >>> I suppose I could just pass the bool along to p2m_get_altp2m() (and >>> indeed remove the if())... >> >> I think the best that can be done here is to check if altp2m_active() >> early in p2m_switch_vcpu_altp2m_by_id() and >> p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since >> these are only called by HVMOP_altp2m_* (and thus serialized by the >> domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state. > > I'm confused: Where do you see the domain lock used there? > Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for > any HVMOP_altp2m_* at all. One of the actual callers is guarded > by altp2m_active(), but the other isn't. do_altp2m_op() does d = rcu_lock_domain_by_any_id(a.domain);, and unlocks it before it exits. But you're right, p2m_switch_vcpu_altp2m_by_id() is not called for any HVMOP_altp2m_*, I've misread that. Hence, I believe both callers of p2m_switch_vcpu_altp2m_by_id() may race with HVMOP_altp2m_*. Would you like me to add the altp2m_active() check in both p2m_switch_domain_altp2m_by_id() and p2m_switch_vcpu_altp2m_by_id(), and make it harder to race (it still won't be impossible, since the bool may become false between the check and the actual function logic for p2m_switch_vcpu_altp2m_by_id(), as you've noticed)? >> I see no other way out of this (aside from the domain_pause() fix). > > If only that one would have been a complete fix, rather than just a > partial one. Agreed, but that one at least clearly fixes the external case, whereas this doesn't seem to cover all corner cases for any situation. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/12/19 9:31 AM, Jan Beulich wrote: On 11.02.19 at 18:21, wrote: On 2/11/19 6:59 PM, Jan Beulich wrote: Thanks for noticing, actually this appears to invalidate the whole purpose of the patch (I should have tested this more before sumbitting V3, sorry). The whole point of the new boolean is to have p2m assigned an altp2m regardless of altp2m_active() (hence the change) - which now no longer happens. I got carried away with this change. The fact that this is so easy to get wrong is the reason why I've preferred the domain_pause() solution. There appears to be no clean way to fix this otherwise, and if this is so easy to misunderstand it'll break just as easily with further changes. I suppose I could just pass the bool along to p2m_get_altp2m() (and indeed remove the if())... I think the best that can be done here is to check if altp2m_active() early in p2m_switch_vcpu_altp2m_by_id() and p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since these are only called by HVMOP_altp2m_* (and thus serialized by the domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state. I'm confused: Where do you see the domain lock used there? Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for any HVMOP_altp2m_* at all. One of the actual callers is guarded by altp2m_active(), but the other isn't. do_altp2m_op() does d = rcu_lock_domain_by_any_id(a.domain);, and unlocks it before it exits. But that's not the "domain lock". This is just making sure the domain won't disappear behind your back. But you're right, p2m_switch_vcpu_altp2m_by_id() is not called for any HVMOP_altp2m_*, I've misread that. Hence, I believe both callers ofp2m_switch_vcpu_altp2m_by_id() may race with HVMOP_altp2m_*. Would you like me to add the altp2m_active() check in both p2m_switch_domain_altp2m_by_id() and p2m_switch_vcpu_altp2m_by_id(), and make it harder to race (it still won't be impossible, since the bool may become false between the check and the actual function logic for p2m_switch_vcpu_altp2m_by_id(), as you've noticed)? Well, altp2m being Tech Preview, any partial fix _could_ do in principle. But personally I dislike such half hearted approaches, and hence I'd recommend to address the issue in full, i.e. eliminate race potential altogether. In the end you're going to be bitten more than me by not getting this into proper shape. I'll attempt a V4 doing my best to check altp2m_active() then, and see if that's at least satisfactory for sane use-cases. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/11/19 6:59 PM, Jan Beulich wrote: Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for any HVMOP_altp2m_* at all. One of the actual callers is guarded by altp2m_active(), but the other isn't. Actually I see that both places are guarded by altp2m_active(). In p2m.c we have: 2312 void p2m_altp2m_check(struct vcpu *v, uint16_t idx) 2313 { 2314 if ( altp2m_active(v->domain) ) 2315 p2m_switch_vcpu_altp2m_by_id(v, idx); 2316 } and in vmx.c: 2225 static int vmx_vcpu_emulate_vmfunc(const struct cpu_user_regs *regs) 2226 { 2227 int rc = X86EMUL_EXCEPTION; 2228 struct vcpu *curr = current; 2229 2230 if ( !cpu_has_vmx_vmfunc && altp2m_active(curr->domain) && 2231 regs->eax == 0 && 2232 p2m_switch_vcpu_altp2m_by_id(curr, regs->ecx) ) 2233 rc = X86EMUL_OKAY; 2234 2235 return rc; 2236 } here there's an "&& altp2m_active(curr->domain)" in the if(). So I suppose in our scenario all that's needed is a similar check here: 4636 case HVMOP_altp2m_switch_p2m: 4637 rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view); 4638 break; for the other function, p2m_switch_domain_altp2m_by_id(). Unless I'm missing something. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). This patch reorganizes the code so that d->arch.altp2m_active is set to true only after all the init work has been done, and to false before the uninit work begins. This required adding a new bool parameter to altp2m_vcpu_update_p2m(), which relied on d->arch.altp2m_active being set before it's called. While at it, I've changed a couple of bool_t's to bool's. Signed-off-by: Razvan Cojocaru --- Changes since V3: - Reverted p2m_get_altp2m() to its original state. - Added an altp2m_active() check to HVMOP_altp2m_switch_p2m. - Updated patch description. --- xen/arch/x86/hvm/hvm.c| 20 xen/arch/x86/hvm/vmx/vmx.c| 4 ++-- xen/arch/x86/mm/altp2m.c | 4 ++-- xen/arch/x86/mm/p2m.c | 4 ++-- xen/include/asm-x86/domain.h | 2 +- xen/include/asm-x86/hvm/hvm.h | 6 +++--- 6 files changed, 26 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 21944e9..21ec059 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4525,7 +4525,7 @@ static int do_altp2m_op( case HVMOP_altp2m_set_domain_state: { struct vcpu *v; -bool_t ostate; +bool ostate, nstate; if ( nestedhvm_enabled(d) ) { @@ -4534,12 +4534,15 @@ static int do_altp2m_op( } ostate = d->arch.altp2m_active; -d->arch.altp2m_active = !!a.u.domain_state.state; +nstate = a.u.domain_state.state; /* If the alternate p2m state has changed, handle appropriately */ -if ( d->arch.altp2m_active != ostate && +if ( nstate != ostate && (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) ) { +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */ +d->arch.altp2m_active = false; + for_each_vcpu( d, v ) { if ( !ostate ) @@ -4550,7 +4553,14 @@ static int do_altp2m_op( if ( ostate ) p2m_flush_altp2m(d); + +/* + * Wait until altp2m_vcpu_initialise() is done before marking + * altp2m as being enabled for the domain.
+ */ +d->arch.altp2m_active = nstate; } + break; } @@ -4624,7 +4634,9 @@ static int do_altp2m_op( break; case HVMOP_altp2m_switch_p2m: -rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view); +rc = -EOPNOTSUPP; +if ( altp2m_active(d) ) +rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view); break; case HVMOP_altp2m_set_suppress_ve: diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 64af8bf..e542568 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) return !!cpu_has_monitor_trap_flag; } -static void vmx_vcpu_update_eptp(struct vcpu *v) +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) { struct domain *d = v->domain; struct p2m_domain *p2m = NULL; struct ept_data *ept; -if ( altp2m_active(d) ) +if ( altp2m_enabled ) p2m = p2m_get_altp2m(v); if ( !p2m ) p2m = p2m_get_hostp2m(d); diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c index 930bdc2..c51303b 100644 --- a/xen/arch/x86/mm/altp2m.c +++ b/xen/arch/x86/mm/altp2m.c @@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v) vcpu_altp2m(v).p2midx = 0; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, true); if ( v != current ) vcpu_unpause(v); @@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v) altp2m_vcpu_reset(v); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, false); altp2m_vcpu_update_vmfunc_ve(v); if ( v != current ) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index d14ce57..6f991f8 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx) atomic_dec(&p2m_get_altp2m(v)->active_vcpus); vcpu_altp2m(v).p2midx = idx; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +
Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/12/19 2:57 PM, Jan Beulich wrote:
> On 12.02.19 at 12:42, wrote:
>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>> on purpose (as it was originally supposed to cater to an in-guest
>> agent, and a domain pausing itself is not a good idea).
>>
>> This can lead to domain crashes in the vmx_vmexit_handler() code
>> that checks if the guest has the ability to switch EPTP without an
>> exit. That code can __vmread() the host p2m's EPT_POINTER (before
>> HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run
>> altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set).
>>
>> This patch reorganizes the code so that d->arch.altp2m_active is set
>> to true only after all the init work has been done, and to false
>> before the uninit work begins. This required adding a new bool
>> parameter to altp2m_vcpu_update_p2m(), which relied on
>> d->arch.altp2m_active being set before it's called. While at it,
>> I've changed a couple of bool_t's to bool's.
>>
>> Signed-off-by: Razvan Cojocaru
>
> FTR - this looks okay to me now. But I don't feel qualified to give
> an R-b, so just for the parts where it's applicable:
> Acked-by: Jan Beulich

Thanks for the ack and the help!
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC PATCH 6/6] xc_version: add vm_event interface version
On 2/12/19 8:13 PM, Tamas K Lengyel wrote: > On Tue, Jan 8, 2019 at 9:39 AM Razvan Cojocaru > wrote: >> >> On 1/8/19 6:27 PM, Jan Beulich wrote: >>>>>> On 19.12.18 at 19:52, wrote: >>>> Signed-off-by: Petre Pircalabu >>> >>> An empty description is not helpful. The immediate question is: Why? >>> We don't do this for other interface versions. I'm unconvinced a >>> special purpose piece of information like this one belongs into the >>> rather generic version hypercall. >> >> For an introspection application meant to be deployed on several Xen >> versions without recompiling, it is important to be able to decide at >> runtime what size and layout the vm_event struct has. >> >> Currently this can somewhat be done by associating the current version >> with the vm_event version, but that is not ideal for obvious reasons. > > We do exactly that in LibVMI and it works fine - care to elaborate > what problem you have with doing that? There is a 1:1 match between > any stable Xen version and a vm_event interface version. I don't think > we will ever have a situation where we bump the vm_event interface > version but not the Xen release version. XenServer. Any given version of XenServer may or may not fit the matrix you're talking about (because some patches are backported, some are not, etc.). In which case it all becomes very complicated. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
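For context, the status-quo workaround being debated amounts to something like the following (a sketch only; the version-to-ABI mapping entries below are made up for illustration, and the fragility of exactly such a table is the point being raised about XenServer backports):

    #include <xenctrl.h>

    /*
     * Illustrative sketch: derive the vm_event ABI version from the
     * running Xen version, as a monitor application must do today.
     */
    static int guess_vm_event_version(xc_interface *xch)
    {
        int ver = xc_version(xch, XENVER_version, NULL);
        int major = ver >> 16, minor = ver & 0xffff;

        /* Hypothetical mapping table; backported patches break it. */
        if ( major == 4 && minor >= 12 )
            return 3;
        return 2;
    }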
Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/14/19 8:06 PM, George Dunlap wrote:
> On 2/12/19 11:42 AM, Razvan Cojocaru wrote:
>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>> on purpose (as it was originally supposed to cater to an in-guest
>> agent, and a domain pausing itself is not a good idea).
>>
>> This can lead to domain crashes in the vmx_vmexit_handler() code
>> that checks if the guest has the ability to switch EPTP without an
>> exit. That code can __vmread() the host p2m's EPT_POINTER
>> (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
>> chance to run altp2m_vcpu_initialise(), but after
>> d->arch.altp2m_active is set).
>
> Sorry, where exactly does the crash happen?

3655     /*
3656      * If the guest has the ability to switch EPTP without an exit,
3657      * figure out whether it has done so and update the altp2m data.
3658      */
3659     if ( altp2m_active(v->domain) &&
3660         (v->arch.hvm.vmx.secondary_exec_control &
3661          SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
3662     {
3663         unsigned long idx;
3664
3665         if ( v->arch.hvm.vmx.secondary_exec_control &
3666              SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
3667             __vmread(EPTP_INDEX, &idx);
3668         else
3669         {
3670             unsigned long eptp;
3671
3672             __vmread(EPT_POINTER, &eptp);
3673
3674             if ( (idx = p2m_find_altp2m_by_eptp(v->domain, eptp)) ==
3675                  INVALID_ALTP2M )
3676             {
3677                 gdprintk(XENLOG_ERR, "EPTP not found in alternate p2m list\n");
3678                 domain_crash(v->domain);

Right here (at line 3678 in vmx.c).

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
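To make the interleaving explicit, the race described in the patch leads to the crash above roughly like this (an illustrative timeline, not code from the patch):

    /*
     * Illustrative interleaving (vCPU A runs the HVMOP, vCPU B exits):
     *
     *   vCPU A                              vCPU B
     *   d->arch.altp2m_active = true;
     *                                       vmx_vmexit_handler()
     *                                         altp2m_active(d) -> true
     *                                         __vmread(EPT_POINTER) still
     *                                           returns the host p2m's EPTP
     *                                         p2m_find_altp2m_by_eptp() ->
     *                                           INVALID_ALTP2M
     *                                         domain_crash(d);
     *   altp2m_vcpu_initialise(v);  (too late)
     */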
Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
> On Feb 12, 2019, at 11:42 AM, Razvan Cojocaru wrote:
>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>> on purpose (as it was originally supposed to cater to an in-guest
>> agent, and a domain pausing itself is not a good idea).
>
> Sorry to come in here on v4 and suggest changing everything, but I
> don't really like the solution you have here. Not setting altp2m to
> 'active' until after the vcpus are set up makes sense; but passing
> this true/false value in seems ugly, and still seems a bit racy
> (i.e., what if altp2m_active() is disabled between the check in
> HVMOP_altp2m_switch_p2m and the time we actually call
> altp2m_vcpu_update_p2m()?)
>
> I certainly don't think domain_pause() should be our go-to solution
> for race avoidance, but in this case it really seems like switching
> the global p2m for every vcpu at once makes the most sense; and
> trying to safely change this on an unpaused domain is not only
> overly complicated, but probably not what we wanted anyway.
>
> p2m_altp2m_destroy_by_id() and p2m_switch_domain_altp2m_by_id()
> already use domain_pause_except_self(); so it seems like not using
> it for altp2m_set_domain_state was probably more of an oversight
> than an intentional decision. Using that here seems like a more
> robust solution.
>
> The one issue is that domain_pause_except_self() currently is
> actually a deadlock risk if two different vcpus start it at the same
> time. I think the attached patch (compile-tested only) should fix
> this issue; after this patch you should be able to use
> domain_pause_except_self() in altp2m_set_domain_state instead.
>
> Does that sound reasonable?

Not only does that sound reasonable, I personally prefer this approach. I'll apply this patch and give it a spin.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
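For concreteness, the deadlock risk mentioned above can be pictured like this (an illustrative sketch of the pre-fix behaviour, not code from the patch):

    /*
     * Two vcpus of the same domain racing into the pre-fix
     * domain_pause_except_self():
     *
     *   vCPU A                              vCPU B
     *   domain_pause_except_self(d)         domain_pause_except_self(d)
     *     vcpu_pause(B);                      vcpu_pause(A);
     *     ... spins until B is                ... spins until A is
     *         descheduled ...                     descheduled ...
     *
     * Neither vcpu can be descheduled while it spins waiting for the
     * other, so neither call completes.  The fix below sidesteps this
     * by trylocking the hypercall deadlock mutex and returning
     * -ERESTART on failure.
     */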
Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/15/19 3:37 PM, George Dunlap wrote:
> On Feb 15, 2019, at 1:24 PM, Jan Beulich wrote:
>> On 15.02.19 at 13:52, wrote:
>>> On Feb 12, 2019, at 11:42 AM, Razvan Cojocaru wrote:
>>>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>>>> on purpose (as it was originally supposed to cater to an in-guest
>>>> agent, and a domain pausing itself is not a good idea).
>>>
>>> Sorry to come in here on v4 and suggest changing everything, but I
>>> don't really like the solution you have here. Not setting altp2m to
>>> 'active' until after the vcpus are set up makes sense; but passing
>>> this true/false value in seems ugly, and still seems a bit racy
>>> (i.e., what if altp2m_active() is disabled between the check in
>>> HVMOP_altp2m_switch_p2m and the time we actually call
>>> altp2m_vcpu_update_p2m()?)
>>>
>>> I certainly don't think domain_pause() should be our go-to solution
>>> for race avoidance, but in this case it really seems like switching
>>> the global p2m for every vcpu at once makes the most sense; and
>>> trying to safely change this on an unpaused domain is not only
>>> overly complicated, but probably not what we wanted anyway.
>>>
>>> p2m_altp2m_destroy_by_id() and p2m_switch_domain_altp2m_by_id()
>>> already use domain_pause_except_self(); so it seems like not using
>>> it for altp2m_set_domain_state was probably more of an oversight
>>> than an intentional decision. Using that here seems like a more
>>> robust solution.
>>
>> Ah, I didn't even recall there was such a function. As this now also
>> allows covering a domain requesting the operation for itself, I
>> don't mind the pausing approach anymore.
>
> Yeah, I forgot too until I was grepping for "domain_pause" to figure
> out what everyone else was doing. :-)
>
>>> The one issue is that domain_pause_except_self() currently is
>>> actually a deadlock risk if two different vcpus start it at the
>>> same time. I think the attached patch (compile-tested only) should
>>> fix this issue; after this patch you should be able to use
>>> domain_pause_except_self() in altp2m_set_domain_state instead.
>>
>> There's one thing I don't really like here, which is a result of the
>> (necessary) re-use of the hypercall deadlock mutex: This certainly
>> poses the risk of getting called from a context where the lock was
>> already acquired. Therefore I'd like to suggest to use this lock in
>> a recursive way (here and elsewhere). And two cosmetic remarks -
>> there's no need to re-specify __must_check on the function
>> definition, as the function declaration ought to be in scope anyway.
>> And there's a stray blank inside the likely() you add.
>
> I don't see that I added a 'likely'; there's one in context, but I
> don't see any stray blanks there. The other two points make sense -
> Razvan, would you be willing to make those changes (and test the
> result, as I haven't done more than compile-test it)?

Of course, happy to. Just to make sure I understand where we stand: I'll try to leave the mutex alone for now and only switch to a recursive one if anything blows up.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 V5 1/2] altp2m: Prevent deadlocks when a domain performs altp2m operations on itself
domain_pause_except_self() was introduced to allow a domain to pause itself while doing altp2m operations. However, as written, it has a risk of deadlock if two vcpus enter the loop at the same time.

Luckily, there's already a solution for this: attempt to take the domain's hypercall_deadlock_mutex, and restart the entire hypercall if you fail.

Make domain_pause_except_self() attempt to grab this mutex when pausing itself, returning -ERESTART if it fails. Have the callers check for errors and pass the value up. In both cases, the top-level do_hvm_op() should DTRT when -ERESTART is returned.

The (necessary) reuse of the hypercall deadlock mutex poses the risk of getting called from a context where the lock was already acquired (e.g. someone may (say) call domctl_lock(), then afterwards call domain_pause_except_self()). However, in the interest of not overcomplicating things, no changes are made here to the mutex. Attempted nesting of this lock isn't a security issue, because all that will happen is that the vcpu will livelock taking continuations.

Signed-off-by: George Dunlap
Tested-by: Razvan Cojocaru
---
Release exception justification:
 - Fixes a bug in the current code
 - Lays the foundation of another fix
 - Only affects altp2m, which isn't a supported feature

NB two possible further improvements to put on the list for 4.13:
 - Replace send-interrupt-and-wait-for-each-one loop with hvmop_flush_tlb_all()'s more efficient send-all-the-interrupts-then-wait loop.
 - Modify hvmop_flush_tlb_all() to use domain_pause_except_self() to avoid code duplication
---
CC: George Dunlap
CC: Jan Beulich
CC: Andrew Cooper
CC: Wei Liu
CC: "Roger Pau Monné"
CC: George Dunlap
CC: Ian Jackson
CC: Julien Grall
CC: Konrad Rzeszutek Wilk
CC: Stefano Stabellini
CC: Tim Deegan
---
 xen/arch/x86/mm/p2m.c   | 10 ++++++++--
 xen/common/domain.c     |  8 +++++++-
 xen/include/xen/sched.h |  2 +-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57..7232dbf 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2530,8 +2530,11 @@ int p2m_destroy_altp2m_by_id(struct domain *d, unsigned int idx)
     if ( !idx || idx >= MAX_ALTP2M )
         return rc;
 
-    domain_pause_except_self(d);
+    rc = domain_pause_except_self(d);
+    if ( rc )
+        return rc;
 
+    rc = -EBUSY;
     altp2m_list_lock(d);
 
     if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) )
@@ -2561,8 +2564,11 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx)
     if ( idx >= MAX_ALTP2M )
         return rc;
 
-    domain_pause_except_self(d);
+    rc = domain_pause_except_self(d);
+    if ( rc )
+        return rc;
 
+    rc = -EINVAL;
     altp2m_list_lock(d);
 
     if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7470cd9..32bca8d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1134,18 +1134,24 @@ int domain_unpause_by_systemcontroller(struct domain *d)
     return 0;
 }
 
-void domain_pause_except_self(struct domain *d)
+int domain_pause_except_self(struct domain *d)
 {
     struct vcpu *v, *curr = current;
 
     if ( curr->domain == d )
     {
+        /* Avoid racing with other vcpus which may want to be pausing us */
+        if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
+            return -ERESTART;
         for_each_vcpu( d, v )
             if ( likely(v != curr) )
                 vcpu_pause(v);
+        spin_unlock(&d->hypercall_deadlock_mutex);
     }
     else
         domain_pause(d);
+
+    return 0;
 }
 
 void domain_unpause_except_self(struct domain *d)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d633e1d..edee52d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -839,7 +839,7 @@ static inline int domain_pause_by_systemcontroller_nosync(struct domain *d)
 }
 
 /* domain_pause() but safe against trying to pause current. */
-void domain_pause_except_self(struct domain *d);
+int __must_check domain_pause_except_self(struct domain *d);
 void domain_unpause_except_self(struct domain *d);
 
 /*
-- 
2.7.4
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
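For reference, the -ERESTART convention relied on here resolves into a hypercall continuation at the top level, roughly like this (a sketch of the existing do_hvm_op() tail, not part of this patch):

    /* Sketch: how a -ERESTART from do_altp2m_op() becomes a continuation. */
    rc = do_altp2m_op(arg);
    if ( rc == -ERESTART )
        rc = hypercall_create_continuation(__HYPERVISOR_hvm_op, "lh",
                                           op, arg);

The vcpu then re-executes the hypercall from guest context, so a failed trylock in domain_pause_except_self() simply retries rather than deadlocking.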
[Xen-devel] [PATCH for-4.12 V5 2/2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set).

Signed-off-by: Razvan Cojocaru
---
CC: Jan Beulich
CC: Andrew Cooper
CC: Wei Liu
CC: "Roger Pau Monné"
---
 xen/arch/x86/hvm/hvm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 410623d..79c7d81 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4538,6 +4538,10 @@ static int do_altp2m_op(
             break;
         }
 
+        rc = domain_pause_except_self(d);
+        if ( rc )
+            break;
+
         ostate = d->arch.altp2m_active;
         d->arch.altp2m_active = !!a.u.domain_state.state;
 
@@ -4556,6 +4560,8 @@ static int do_altp2m_op(
             if ( ostate )
                 p2m_flush_altp2m(d);
         }
+
+        domain_unpause_except_self(d);
         break;
     }
-- 
2.7.4
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 V5 2/2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/18/19 1:19 PM, Jan Beulich wrote:
> On 15.02.19 at 16:41, wrote:
>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>> on purpose (as it was originally supposed to cater to an in-guest
>> agent, and a domain pausing itself is not a good idea).
>>
>> This can lead to domain crashes in the vmx_vmexit_handler() code
>> that checks if the guest has the ability to switch EPTP without an
>> exit. That code can __vmread() the host p2m's EPT_POINTER (before
>> HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run
>> altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set).
>>
>> Signed-off-by: Razvan Cojocaru
>
> Acked-by: Jan Beulich
>
> Also I assume you mean to have the same justification here as you
> have in patch 1, as to 4.12 inclusion. And for this reason I'm also
> Cc-ing Jürgen again.

Indeed, I do. Thank you very much.

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH RFC 0/6] Slotted channels for sync vm_events
On 2/12/19 7:01 PM, Tamas K Lengyel wrote:
> On Thu, Feb 7, 2019 at 9:06 AM Petre Ovidiu PIRCALABU wrote:
>> On Thu, 2019-02-07 at 11:46 +0000, George Dunlap wrote:
>>> On 2/6/19 2:26 PM, Petre Ovidiu PIRCALABU wrote:
>>>> On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote:
>>>>> This patchset is a rework of the "multi-page ring buffer" for
>>>>> vm_events patch based on Andrew Cooper's comments. For synchronous
>>>>> vm_events the ring waitqueue logic was unnecessary as the vcpu
>>>>> sending the request was blocked until a response was received. To
>>>>> simplify the request/response mechanism, an array of slotted
>>>>> channels was created, one per vcpu. Each vcpu puts the request in
>>>>> the corresponding slot and blocks until the response is received.
>>>>>
>>>>> I'm sending this patch as a RFC because, while I'm still working
>>>>> on a way to measure the overall performance improvement, your
>>>>> feedback would be a great assistance.
>>>>
>>>> Is anyone still using asynchronous vm_event requests? (the vcpu is
>>>> not blocked and no response is expected). If not, I suggest that
>>>> the feature should be removed as it (significantly) increases the
>>>> complexity of the current implementation.
>>>
>>> Could you describe in a bit more detail what the situation is?
>>> What's the current state of affairs with vm_events, what you're
>>> trying to change, why async vm_events is more difficult?
>>
>> The main reason for the vm_events modification was to improve the
>> overall performance in high throughput introspection scenarios. For
>> domUs with a higher vcpu count, a vcpu could sleep for a certain
>> period of time while waiting for a ring slot to become available
>> (__vm_event_claim_slot).
>>
>> The first patchset only increased the ring size, and the second
>> iteration, based on Andrew Cooper's comments, proposed a separate
>> path to handle synchronous events (a slotted buffer for each vcpu)
>> in order to have the events handled independently of one another. To
>> handle asynchronous events, a dynamically allocated vm_event ring is
>> used. While the implementation is not exactly an exercise in
>> simplicity, it preserves all the needed functionality and offers
>> fallback if the Linux domain running the monitor application doesn't
>> support IOCTL_PRIVCMD_MMAP_RESOURCE.
>>
>> However, the problem got a little bit more complicated when I tried
>> implementing the vm_events using an IOREQ server (based on Paul
>> Durrant's comments). For synchronous vm_events, it simplified things
>> a little, eliminating both the need for a special structure to hold
>> the processing state and the evtchns for each vcpu. The asynchronous
>> events were a little more tricky to handle. The buffered ioreqs were
>> a good candidate, but the only thing usable is the corresponding
>> evtchn in conjunction with an existing ring. In order to use them, a
>> mock buffered ioreq should be created and transmitted, with the only
>> meaningful field being the ioreq type.
>>>
>>> I certainly think it would be better if you could write the new
>>> vm_event interface without having to spend a lot of effort
>>> supporting modes that you think nobody uses. On the other hand,
>>> getting into the habit of breaking stuff, even for people we don't
>>> know about, will be a hindrance to community growth; a commitment
>>> to keeping it working will be a benefit to growth. But of course, we
>>> haven't declared the vm_event interface 'supported' (it's not even
>>> mentioned in the SUPPORT.md document yet).
>>>
>>> Just for the sake of discussion, would it be possible / reasonable,
>>> for instance, to create a new interface, vm_events2, instead? Then
>>> you could write it to share the ioreq interface without having
>>> legacy baggage you're not using; we could deprecate and eventually
>>> remove vm_events1, and if anyone shouts, we can put it back.
>>>
>>> Thoughts?
>>>
>>> -George
>>
>> Yes, it's possible and it will GREATLY simplify the implementation.
>> I just have to make sure the interfaces are mutually exclusive.
>
> I'm for removing features from the vm_event interface that are no
> longer in use, especially if they block more advantageous changes
> like this one. We don't know what the use-case was for async events
> nor have seen anyone even mention them since I've been working with
> Xen. Creating a new interface, as mentioned above, would make sense
> if there was a disagreement with retiring this feature. I don't think
> that's the case. I certainly would prefer not having to maintain two
> separate interfaces going forward without a clear justification and
> documented use-case explaining why we keep the old interface around.
> AFAICT, the async model is broken conceptually as well, so it makes
> no sense.

It would make sense, IMHO, if it would be lossy (i.e. we just write in the ring buffer, if somebody manages to "catch" an event while it flies by then so be it, if not it will be overwritten). If it would have been truly lossy, I'd see a use-case for it, for gathering statistics maybe. However, as the code is now, the VCPUs are not paused only if there's still space in the ring buffer. If there's no more space in the ring buffer, the VCPU trying to put an event in still gets paused.
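To make the "one slot per vcpu" idea in this thread concrete, a minimal layout might look like the following (illustrative names only, not the RFC's actual structures; vm_event_request_t and vm_event_response_t are the existing public vm_event types):

    /* Illustrative sketch of a per-vcpu slotted channel; not the RFC code. */
    struct vm_event_slot {
        uint32_t state;               /* e.g. IDLE / SUBMITTED / FINISHED */
        vm_event_request_t request;   /* written by Xen; the vcpu then blocks */
        vm_event_response_t response; /* written by the monitor application */
    };

    struct vm_event_slotted_channel {
        uint32_t nr_slots;            /* one slot per vcpu */
        struct vm_event_slot slot[];  /* indexed by vcpu_id; no contention */
    };

Because each vcpu owns its slot, no vcpu ever waits for ring space held by another vcpu, which is the contention __vm_event_claim_slot() suffers from today.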
Re: [Xen-devel] [PATCH 1/4] xen/common: Break domain_unmap_resources() out of domain_kill()
On 2/20/19 12:18 AM, Andrew Cooper wrote:
> A subsequent change is going to need an x86-specific unmapping step, so take
> the opportunity to split the current vcpu unmapping out into a dedicated path.
>
> No practical change.
>
> Signed-off-by: Andrew Cooper
> ---
> CC: Jan Beulich
> CC: Wei Liu
> CC: Roger Pau Monné
> CC: Razvan Cojocaru
> CC: Tamas K Lengyel
> CC: Juergen Gross
> ---
>  xen/common/domain.c      | 16 +++++++++++++---
>  xen/include/xen/domain.h |  4 ++++
>  2 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 32bca8d..e66f7ea 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -700,10 +700,21 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, struct domain **d)
>      return 0;
>  }
>
> +static void domain_unmap_resources(struct domain *d)
> +{
> +    struct vcpu *v;
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        unmap_vcpu_info(v);
> +
> +        arch_vcpu_unmap_resources(v);
> +    }
> +}
> +
>  int domain_kill(struct domain *d)
>  {
>      int rc = 0;
> -    struct vcpu *v;
>
>      if ( d == current->domain )
>          return -EINVAL;
> @@ -732,13 +743,12 @@ int domain_kill(struct domain *d)
>          d->tmem_client = NULL;
>          /* fallthrough */
>      case DOMDYING_dying:
> +        domain_unmap_resources(d);
>          rc = domain_relinquish_resources(d);
>          if ( rc != 0 )
>              break;
>          if ( cpupool_move_domain(d, cpupool0) )
>              return -ERESTART;
> -        for_each_vcpu ( d, v )
> -            unmap_vcpu_info(v);

So before this change there was a leak of some sort here? I see that before this patch unmap_vcpu_info() was called only if rc == 0, but it is now called unconditionally _before_ domain_relinquish_resources(). This does appear to change the behaviour of the code in the error case. If that's intended:

Reviewed-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/4] x86/altp2m: Rework #VE enable/disable paths
On 2/20/19 12:18 AM, Andrew Cooper wrote:
> Split altp2m_vcpu_{enable,disable}_ve() out of the
> HVMOP_altp2m_vcpu_{enable,disable}_notify marshalling logic. A future
> change is going to need to call altp2m_vcpu_disable_ve() from the
> domain_kill() path.
>
> While at it, clean up the logic in altp2m_vcpu_{initialise,destroy}().
> altp2m_vcpu_reset() has no external callers, so fold it into its two
> callsites. This in turn allows for altp2m_vcpu_destroy() to reuse
> altp2m_vcpu_disable_ve() rather than opencoding it.
>
> No practical change.
>
> Signed-off-by: Andrew Cooper

Reviewed-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 4/4] x86/vmx: Properly flush the TLB when an altp2m is modified
On 2/20/19 12:18 AM, Andrew Cooper wrote:
> Modifications to an altp2m mark the p2m as needing flushing, but this
> was never wired up in the return-to-guest path. As a result, stale
> TLB entries can remain after resuming the guest.
>
> In practice, this manifests as a missing EPT_VIOLATION or #VE
> exception when the guest subsequently accesses a page which has had
> its permissions reduced.
>
> vmx_vmenter_helper() now has 11 p2ms to potentially invalidate, but
> issuing 11 INVEPT instructions isn't clever. Instead, count how many
> contexts need invalidating, and use INVEPT_ALL_CONTEXT if two or more
> are in need of flushing.
>
> This doesn't have an XSA because altp2m is not yet a
> security-supported feature.
>
> Signed-off-by: Andrew Cooper

Reviewed-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
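The flush heuristic described in the patch amounts to roughly the following (an illustrative sketch, not the patch's actual code; the iterator is made up, and __invept() is treated here as taking (type, eptp), which may differ across Xen versions):

    /*
     * Sketch of the described heuristic: a single dirty context gets a
     * targeted flush, but past one context a global flush is cheaper
     * than issuing many INVEPTs.
     */
    unsigned int inv = 0;      /* number of EPT contexts needing a flush */
    uint64_t single_eptp = 0;

    for_each_p2m_needing_flush ( p2m )  /* hypothetical iterator */
    {
        single_eptp = ept_get_eptp(&p2m->ept);
        inv++;
    }

    if ( inv == 1 )
        __invept(INVEPT_SINGLE_CONTEXT, single_eptp);
    else if ( inv > 1 )
        __invept(INVEPT_ALL_CONTEXT, 0);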
Re: [Xen-devel] [PATCH 3/4] x86/vmx: Fix security issue when a guest balloons out the #VE info page
On 2/20/19 12:18 AM, Andrew Cooper wrote:
> The logic in altp2m_vcpu_{en,dis}able_ve() and
> vmx_vcpu_update_vmfunc_ve() is dangerous. After #VE has been set up,
> the guest can balloon out and free the nominated GFN, after which the
> processor may write to it. Also, the unlocked GFN query means the MFN
> is stale by the time it is used. Alternatively, a guest can race two
> disable calls to cause one VMCS to still reference the nominated GFN
> after the tracking information was dropped.
>
> Rework the logic from scratch to make it safe.
>
> Hold an extra page reference on the underlying frame, to account for
> the VMCS's reference. This means that if the GFN gets ballooned out,
> it isn't freed back to Xen until #VE is disabled, and the VMCS no
> longer refers to the page.
>
> A consequence of this is that arch_vcpu_unmap_resources() now needs
> to call altp2m_vcpu_disable_ve() to drop the reference during
> domain_kill(), to allow all of the memory to be freed.
>
> For domains using altp2m, we expect a single enable call and no
> disable for the remaining lifetime of the domain. However, to avoid
> problems with concurrent calls, use cmpxchg() to locklessly maintain
> safety.
>
> This doesn't have an XSA because altp2m is not yet a
> security-supported feature.
>
> Signed-off-by: Andrew Cooper

Tested-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
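The "single enable, lockless guard" pattern described above amounts to roughly the following (an illustrative sketch only; the per-vcpu field name ve_info_pg is an assumption, not the patch's actual code):

    /*
     * Illustrative sketch of the cmpxchg() guard.  "ve_info_pg" is an
     * assumed per-vcpu pointer holding the extra reference taken on the
     * nominated #VE info frame.
     */
    static int sketch_enable_ve(struct vcpu *v, struct page_info *pg)
    {
        /* Publish the page only if nobody else has enabled #VE yet. */
        if ( cmpxchg(&v->arch.hvm.vmx.ve_info_pg, NULL, pg) != NULL )
            return -EBUSY;  /* lost the race; caller drops its reference */

        vmx_vmcs_enter(v);
        __vmwrite(VIRT_EXCEPTION_INFO, page_to_maddr(pg));
        vmx_vmcs_exit(v);

        return 0;
    }

The extra reference held via pg is what keeps the frame from being freed back to Xen if the guest balloons the GFN out while the VMCS still points at it.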
Re: [Xen-devel] [PATCH v2 3/4] x86/vmx: Fix security issue when a guest balloons out the #VE info page
On 2/21/19 10:18 PM, Andrew Cooper wrote:
> The logic in altp2m_vcpu_{en,dis}able_ve() and
> vmx_vcpu_update_vmfunc_ve() is dangerous. After #VE has been set up,
> the guest can balloon out and free the nominated GFN, after which the
> processor may write to it. Also, the unlocked GFN query means the MFN
> is stale by the time it is used. Alternatively, a guest can race two
> disable calls to cause one VMCS to still reference the nominated GFN
> after the tracking information was dropped.
>
> Rework the logic from scratch to make it safe.
>
> Hold an extra page reference on the underlying frame, to account for
> the VMCS's reference. This means that if the GFN gets ballooned out,
> it isn't freed back to Xen until #VE is disabled, and the VMCS no
> longer refers to the page.
>
> A consequence of this is that altp2m_vcpu_disable_ve() needs to be
> called during the domain_kill() path, to drop the reference for
> domains which shut down with #VE still enabled.
>
> For domains using altp2m, we expect a single enable call and no
> disable for the remaining lifetime of the domain. However, to avoid
> problems with concurrent calls, use cmpxchg() to locklessly maintain
> safety.
>
> This doesn't have an XSA because altp2m is not yet a
> security-supported feature.
>
> Signed-off-by: Andrew Cooper
> Release-acked-by: Juergen Gross

Tested-by: Razvan Cojocaru

Thanks,
Razvan
___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel