[Xen-devel] Crash with nested HVM and Linux v5.1+
With nestedhvm=1, the L2 HVM guest is either hanging (Xen 4.8) or crashing (Xen 4.12.1) the L1 Xen hypervisor with recent versions of Linux. We isolated the commit to: commit 093ae8f9a86a974c920b613860f1f7fd5bbd70ab Author: Borislav Petkov Date: Thu Apr 12 13:11:36 2018 +0200 x86/TSC: Use RDTSCP Currently, the kernel uses [LM]FENCE; RDTSC in the timekeeping code, to guarantee monotonicity of time where the *FENCE is selected based on vendor. Replace that sequence with RDTSCP which is faster or on-par and gives the same guarantees. A microbenchmark on Intel shows that the change is on-par. On AMD, the change is either on-par with the current LFENCE-prefixed RDTSC or slightly better with RDTSCP. The comparison is done with the LFENCE-prefixed RDTSC (and not with the MFENCE-prefixed one, as one would normally expect) because all modern AMD families make LFENCE serializing and thus avoid the heavy MFENCE by effectively enabling X86_FEATURE_LFENCE_RDTSC. I could not find RDTSCP instruction being used by Linux before the given commit, which is present in Linux v5.1 and newer. As expected, masking off the RDTSCP cpuid flag in leaf 0x8001 prevents a cooperative guest from using that instruction and therefore prevents the crash. --Sarah ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Terminology for "guest" - Was: [PATCH] docs/sphinx: Introduction
On 8/12/19 8:01 AM, Andrew Cooper wrote: On 12/08/2019 15:53, George Dunlap wrote: On 8/8/19 10:13 AM, Julien Grall wrote: Hi Jan, On 08/08/2019 10:04, Jan Beulich wrote: On 08.08.2019 10:43, Andrew Cooper wrote: On 08/08/2019 07:22, Jan Beulich wrote: On 07.08.2019 21:41, Andrew Cooper wrote: --- /dev/null +++ b/docs/glossary.rst @@ -0,0 +1,37 @@ +Glossary + + +.. Terms should appear in alphabetical order + +.. glossary:: + + control domain + A :term:`domain`, commonly dom0, with the permission and responsibility + to create and manage other domains on the system. + + domain + A domain is Xen's unit of resource ownership, and generally has at the + minimum some RAM and virtual CPUs. + + The terms :term:`domain` and :term:`guest` are commonly used + interchangeably, but they mean subtly different things. + + A guest is a single virtual machine. + + Consider the case of live migration where, for a period of time, one + guest will be comprised of two domains, while it is in transit. + + domid + The numeric identifier of a running :term:`domain`. It is unique to a + single instance of Xen, used as the identifier in various APIs, and is + typically allocated sequentially from 0. + + guest + See :term:`domain` I think you want to mention the usual distinction here: Dom0 is, while a domain, commonly not considered a guest. To be honest, I had totally forgotten about that. I guess now is the proper time to rehash it in public. I don't think the way it currently gets used has a clear or coherent set of rules, because I can't think of any to describe how it does get used. Either there are a clear and coherent (and simple!) set of rules for what we mean by "guest", at which point they can live here in the glossary, or the fuzzy way it is current used should cease. What's fuzzy about Dom0 not being a guest (due to being a part of the host instead)? Dom0 is not part of the host if you are using an hardware domain. I actually thought we renamed everything in Xen from Dom0 to hwdom to avoid the confusion. I also would rather avoid to treat "dom0" as not a guest. In normal setup this is a more privilege guest, in other setup this may just be a normal guest (think about partitioning). A literal guest is someone who doesn't live in the building (or work in the buliding, if you're in a hotel). The fact that the staff cleaning rooms are restricted in their privileges doesn't make them guests of the hotel. The toolstack domain, the hardware domain, the driver domain, the xenstore domain, and so on are all part of the host system, designed to allow you to use Xen to do the thing you actually want to do: Run guests. And it's important that we have a word that distinguishes "domains that we only care about because they make it possible to run other domains", and "domains that we actually want to run". "guest / host" is a natural terminology for these. We already have the word "domain", which includes dom0, driver domains, toolstack domains, hardware domains, as well as guest domains. We don't need "guest" to be a synonym for "domain". So... Please can someone propose wording simple, clear wording for what "guest" means. A potentially untrusted domain. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Terminology for "guest" - Was: [PATCH] docs/sphinx: Introduction
On 8/13/19 1:43 AM, George Dunlap wrote: On Aug 13, 2019, at 3:59 AM, Sarah Newman wrote: On 8/12/19 8:01 AM, Andrew Cooper wrote: On 12/08/2019 15:53, George Dunlap wrote: On 8/8/19 10:13 AM, Julien Grall wrote: Hi Jan, On 08/08/2019 10:04, Jan Beulich wrote: On 08.08.2019 10:43, Andrew Cooper wrote: On 08/08/2019 07:22, Jan Beulich wrote: On 07.08.2019 21:41, Andrew Cooper wrote: --- /dev/null +++ b/docs/glossary.rst @@ -0,0 +1,37 @@ +Glossary + + +.. Terms should appear in alphabetical order + +.. glossary:: + + control domain + A :term:`domain`, commonly dom0, with the permission and responsibility + to create and manage other domains on the system. + + domain + A domain is Xen's unit of resource ownership, and generally has at the + minimum some RAM and virtual CPUs. + + The terms :term:`domain` and :term:`guest` are commonly used + interchangeably, but they mean subtly different things. + + A guest is a single virtual machine. + + Consider the case of live migration where, for a period of time, one + guest will be comprised of two domains, while it is in transit. + + domid + The numeric identifier of a running :term:`domain`. It is unique to a + single instance of Xen, used as the identifier in various APIs, and is + typically allocated sequentially from 0. + + guest + See :term:`domain` I think you want to mention the usual distinction here: Dom0 is, while a domain, commonly not considered a guest. To be honest, I had totally forgotten about that. I guess now is the proper time to rehash it in public. I don't think the way it currently gets used has a clear or coherent set of rules, because I can't think of any to describe how it does get used. Either there are a clear and coherent (and simple!) set of rules for what we mean by "guest", at which point they can live here in the glossary, or the fuzzy way it is current used should cease. What's fuzzy about Dom0 not being a guest (due to being a part of the host instead)? Dom0 is not part of the host if you are using an hardware domain. I actually thought we renamed everything in Xen from Dom0 to hwdom to avoid the confusion. I also would rather avoid to treat "dom0" as not a guest. In normal setup this is a more privilege guest, in other setup this may just be a normal guest (think about partitioning). A literal guest is someone who doesn't live in the building (or work in the buliding, if you're in a hotel). The fact that the staff cleaning rooms are restricted in their privileges doesn't make them guests of the hotel. The toolstack domain, the hardware domain, the driver domain, the xenstore domain, and so on are all part of the host system, designed to allow you to use Xen to do the thing you actually want to do: Run guests. And it's important that we have a word that distinguishes "domains that we only care about because they make it possible to run other domains", and "domains that we actually want to run". "guest / host" is a natural terminology for these. We already have the word "domain", which includes dom0, driver domains, toolstack domains, hardware domains, as well as guest domains. We don't need "guest" to be a synonym for "domain". So... Please can someone propose wording simple, clear wording for what "guest" means. A potentially untrusted domain. But wouldn’t that include both driver domains and stubdomains? Then how about: a domain which does not provide Xen-related services. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1.5] x86/livepatch: Prevent patching with active waitqueues
On 11/5/19 11:49 AM, Andrew Cooper wrote: The safety of livepatching depends on every stack having been unwound, but there is one corner case where this is not true. The Sharing/Paging/Monitor infrastructure may use waitqueues, which copy the stack frame sideways and longjmp() to a different vcpu. This case is rare, and can be worked around by pausing the offending domain(s), waiting for their rings to drain, then performing a livepatch. In the case that there is an active waitqueue, fail the livepatch attempt with -EBUSY, which is preforable to the fireworks which occur from trying to unwind the old stack frame at a later point. Signed-off-by: Andrew Cooper --- CC: Konrad Rzeszutek Wilk CC: Ross Lagerwall CC: Juergen Gross This fix wants backporting, and is long overdue for posting upstream. v1.5: * Send out a non-stale patch this time. --- xen/arch/arm/livepatch.c| 5 + xen/arch/x86/livepatch.c| 40 xen/common/livepatch.c | 8 xen/include/xen/livepatch.h | 1 + 4 files changed, 54 insertions(+) diff --git a/xen/arch/arm/livepatch.c b/xen/arch/arm/livepatch.c index 00c5e2bc45..915e9d926a 100644 --- a/xen/arch/arm/livepatch.c +++ b/xen/arch/arm/livepatch.c @@ -18,6 +18,11 @@ void *vmap_of_xen_text; +int arch_livepatch_safety_check(void) +{ +return 0; +} + int arch_livepatch_quiesce(void) { mfn_t text_mfn; diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c index c82cf53b9e..2749cbc5cf 100644 --- a/xen/arch/x86/livepatch.c +++ b/xen/arch/x86/livepatch.c @@ -10,10 +10,50 @@ #include #include #include +#include #include #include +static bool has_active_waitqueue(const struct vm_event_domain *ved) +{ +/* ved may be xzalloc()'d without INIT_LIST_HEAD() yet. */ +return (ved && !list_head_is_null(&ved->wq.list) && +!list_empty(&ved->wq.list)); +} + +/* + * x86's implementation of waitqueue violates the livepatching safey principle + * of having unwound every CPUs stack before modifying live content. + * + * Search through every domain and check that no vCPUs have an active + * waitqueue. + */ +int arch_livepatch_safety_check(void) +{ +struct domain *d; + +for_each_domain ( d ) +{ +#ifdef CONFIG_MEM_SHARING +if ( has_active_waitqueue(d->vm_event_share) ) +goto fail; +#endif +#ifdef CONFIG_MEM_PAGING +if ( has_active_waitqueue(d->vm_event_paging) )' Is this supposed to be CONFIG_HAS_MEM_PAGING? ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1.5] x86/livepatch: Prevent patching with active waitqueues
On 11/5/19 11:49 AM, Andrew Cooper wrote: The safety of livepatching depends on every stack having been unwound, but there is one corner case where this is not true. The Sharing/Paging/Monitor infrastructure may use waitqueues, which copy the stack frame sideways and longjmp() to a different vcpu. This case is rare, and can be worked around by pausing the offending domain(s), waiting for their rings to drain, then performing a livepatch. In the case that there is an active waitqueue, fail the livepatch attempt with -EBUSY, which is preforable to the fireworks which occur from trying to unwind the old stack frame at a later point. Signed-off-by: Andrew Cooper --- CC: Konrad Rzeszutek Wilk CC: Ross Lagerwall CC: Juergen Gross This fix wants backporting, and is long overdue for posting upstream. v1.5: * Send out a non-stale patch this time. --- xen/arch/arm/livepatch.c| 5 + xen/arch/x86/livepatch.c| 40 xen/common/livepatch.c | 8 xen/include/xen/livepatch.h | 1 + 4 files changed, 54 insertions(+) diff --git a/xen/arch/arm/livepatch.c b/xen/arch/arm/livepatch.c index 00c5e2bc45..915e9d926a 100644 --- a/xen/arch/arm/livepatch.c +++ b/xen/arch/arm/livepatch.c @@ -18,6 +18,11 @@ void *vmap_of_xen_text; +int arch_livepatch_safety_check(void) +{ +return 0; +} + int arch_livepatch_quiesce(void) { mfn_t text_mfn; diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c index c82cf53b9e..2749cbc5cf 100644 --- a/xen/arch/x86/livepatch.c +++ b/xen/arch/x86/livepatch.c @@ -10,10 +10,50 @@ #include #include #include +#include #include #include +static bool has_active_waitqueue(const struct vm_event_domain *ved) +{ +/* ved may be xzalloc()'d without INIT_LIST_HEAD() yet. */ +return (ved && !list_head_is_null(&ved->wq.list) && +!list_empty(&ved->wq.list)); +} + +/* + * x86's implementation of waitqueue violates the livepatching safey principle + * of having unwound every CPUs stack before modifying live content. + * + * Search through every domain and check that no vCPUs have an active + * waitqueue. + */ +int arch_livepatch_safety_check(void) +{ +struct domain *d; + +for_each_domain ( d ) +{ +#ifdef CONFIG_MEM_SHARING +if ( has_active_waitqueue(d->vm_event_share) ) +goto fail; +#endif +#ifdef CONFIG_MEM_PAGING +if ( has_active_waitqueue(d->vm_event_paging) ) +goto fail; +#endif +if ( has_active_waitqueue(d->vm_event_monitor) ) +goto fail; +} + +return 0; + + fail: +printk(XENLOG_ERR LIVEPATCH "%pd found with active waitqueue\n", d); +return -EBUSY; +} + int arch_livepatch_quiesce(void) { /* Disable WP to allow changes to read-only pages. */ diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c index 962647616a..8386e611f2 100644 --- a/xen/common/livepatch.c +++ b/xen/common/livepatch.c @@ -1060,6 +1060,14 @@ static int apply_payload(struct payload *data) unsigned int i; int rc; +rc = arch_livepatch_safety_check(); +if ( rc ) +{ +printk(XENLOG_ERR LIVEPATCH "%s: Safety checks failed: %d\n", + data->name, rc); +return rc; +} + Would it make sense to call arch_livepatch_safety_check from arch_livepatch_quiesce rather than directly, so that arch_livepatch_safety_check is also called from revert_payload? --Sarah ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] x86/entry/64: Remove %ebx handling from error_entry/exit
commit b3681dd548d06deb2e1573890829dff4b15abf46 upstream. This version applies to v4.9. From Andy Lutomirski, original author: error_entry and error_exit communicate the user vs kernel status of the frame using %ebx. This is unnecessary -- the information is in regs->cs. Just use regs->cs. This makes error_entry simpler and makes error_exit more robust. It also fixes a nasty bug. Before all the Spectre nonsense, The xen_failsafe_callback entry point returned like this: ALLOC_PT_GPREGS_ON_STACK SAVE_C_REGS SAVE_EXTRA_REGS ENCODE_FRAME_POINTER jmp error_exit And it did not go through error_entry. This was bogus: RBX contained garbage, and error_exit expected a flag in RBX. Fortunately, it generally contained *nonzero* garbage, so the correct code path was used. As part of the Spectre fixes, code was added to clear RBX to mitigate certain speculation attacks. Now, depending on kernel configuration, RBX got zeroed and, when running some Wine workloads, the kernel crashes. This was introduced by: commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") With this patch applied, RBX is no longer needed as a flag, and the problem goes away. I suspect that malicious userspace could use this bug to crash the kernel even without the offending patch applied, though. [Historical note: I wrote this patch as a cleanup before I was aware of the bug it fixed.] [Note to stable maintainers: this should probably get applied to all kernels.] Cc: Brian Gerst Cc: Borislav Petkov Cc: Dominik Brodowski Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Thomas Gleixner Cc: Boris Ostrovsky Cc: Juergen Gross Cc: xen-devel@lists.xenproject.org Cc: x...@kernel.org Cc: sta...@vger.kernel.org Cc: Andy Lutomirski Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") Reported-and-tested-by: "M. Vefa Bicakci" Signed-off-by: Andy Lutomirski Signed-off-by: Sarah Newman --- arch/x86/entry/entry_64.S | 19 --- 1 file changed, 4 insertions(+), 15 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index d58d8dc..0dab47a 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -774,7 +774,7 @@ ENTRY(\sym) call\do_sym - jmp error_exit /* %ebx: no swapgs flag */ + jmp error_exit .endif END(\sym) .endm @@ -1043,7 +1043,6 @@ END(paranoid_exit) /* * Save all registers in pt_regs, and switch gs if needed. - * Return: EBX=0: came from user mode; EBX=1: otherwise */ ENTRY(error_entry) cld @@ -1087,7 +1086,6 @@ ENTRY(error_entry) * for these here too. */ .Lerror_kernelspace: - incl%ebx leaqnative_irq_return_iret(%rip), %rcx cmpq%rcx, RIP+8(%rsp) je .Lerror_bad_iret @@ -1119,28 +1117,19 @@ ENTRY(error_entry) /* * Pretend that the exception came from user mode: set up pt_regs -* as if we faulted immediately after IRET and clear EBX so that -* error_exit knows that we will be returning to user mode. +* as if we faulted immediately after IRET. */ mov %rsp, %rdi callfixup_bad_iret mov %rax, %rsp - decl%ebx jmp .Lerror_entry_from_usermode_after_swapgs END(error_entry) - -/* - * On entry, EBX is a "return to kernel mode" flag: - * 1: already in kernel mode, don't need SWAPGS - * 0: user gsbase is loaded, we need SWAPGS and standard preparation for return to usermode - */ ENTRY(error_exit) - movl%ebx, %eax DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF - testl %eax, %eax - jnz retint_kernel + testb $3, CS(%rsp) + jz retint_kernel jmp retint_user END(error_exit) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/entry/64: Remove %ebx handling from error_entry/exit
On 08/09/2018 05:41 AM, David Woodhouse wrote: > On Wed, 2018-08-08 at 10:35 -0700, Sarah Newman wrote: >> commit b3681dd548d06deb2e1573890829dff4b15abf46 upstream. >> >> This version applies to v4.9. > > I think you can kill the 'xorl %ebx,%ebx' from error_entry too but yes, > this does want to go to 4.9 and earlier because the 'Fixes:' tag is a > bit of a lie — the problem existed before that, at least in theory. The commit 2140a9942 "x86/entry/64: Relax pvops stub clobber specifications" was what removed the "movl %ebx, %eax" line later on originally, but it was the commit 3ac6d8c787b8 that removed the 'xorl %ebx,%ebx'. So these weren't matched. I don't know if it's a concern, but if someone had gone to the effort of backporting the original commit 3ac6d8c787b83, adding the removal of 'xorl %ebx,%ebx' to this patch would create merge conflicts. For that reason and given the line is harmless, should it be left in? > >> From Andy Lutomirski, original author: >> >> error_entry and error_exit communicate the user vs kernel status of >> the frame using %ebx. This is unnecessary -- the information is in >> regs->cs. Just use regs->cs. >> >> This makes error_entry simpler and makes error_exit more robust. >> >> It also fixes a nasty bug. Before all the Spectre nonsense, The >> xen_failsafe_callback entry point returned like this: >> >> ALLOC_PT_GPREGS_ON_STACK >> SAVE_C_REGS >> SAVE_EXTRA_REGS >> ENCODE_FRAME_POINTER >> jmp error_exit >> >> And it did not go through error_entry. This was bogus: RBX >> contained garbage, and error_exit expected a flag in RBX. >> Fortunately, it generally contained *nonzero* garbage, so the >> correct code path was used. As part of the Spectre fixes, code was >> added to clear RBX to mitigate certain speculation attacks. Now, >> depending on kernel configuration, RBX got zeroed and, when running >> some Wine workloads, the kernel crashes. This was introduced by: >> >> commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for >> exceptions/interrupts, to reduce speculation attack surface") >> >> With this patch applied, RBX is no longer needed as a flag, and the >> problem goes away. >> >> I suspect that malicious userspace could use this bug to crash the >> kernel even without the offending patch applied, though. >> >> [Historical note: I wrote this patch as a cleanup before I was aware >> of the bug it fixed.] >> >> [Note to stable maintainers: this should probably get applied to all >> kernels.] >> >> Cc: Brian Gerst >> Cc: Borislav Petkov >> Cc: Dominik Brodowski >> Cc: Ingo Molnar >> Cc: "H. Peter Anvin" >> Cc: Thomas Gleixner >> Cc: Boris Ostrovsky >> Cc: Juergen Gross >> Cc: xen-devel@lists.xenproject.org >> Cc: x...@kernel.org >> Cc: sta...@vger.kernel.org >> Cc: Andy Lutomirski >> Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for >> exceptions/interrupts, to reduce speculation attack surface") >> Reported-and-tested-by: "M. Vefa Bicakci" >> Signed-off-by: Andy Lutomirski >> Signed-off-by: Sarah Newman >> --- >> arch/x86/entry/entry_64.S | 19 --- >> 1 file changed, 4 insertions(+), 15 deletions(-) >> >> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S >> index d58d8dc..0dab47a 100644 >> --- a/arch/x86/entry/entry_64.S >> +++ b/arch/x86/entry/entry_64.S >> @@ -774,7 +774,7 @@ ENTRY(\sym) >> >> call\do_sym >> >> -jmp error_exit /* %ebx: no >> swapgs flag */ >> +jmp error_exit >> .endif >> END(\sym) >> .endm >> @@ -1043,7 +1043,6 @@ END(paranoid_exit) >> >> /* >> * Save all registers in pt_regs, and switch gs if needed. >> - * Return: EBX=0: came from user mode; EBX=1: otherwise >> */ >> ENTRY(error_entry) >> cld >> @@ -1087,7 +1086,6 @@ ENTRY(error_entry) >> * for these here too. >> */ >> .Lerror_kernelspace: >> -incl%ebx >> leaqnative_irq_return_iret(%rip), %rcx >> cmpq%rcx, RIP+8(%rsp) >> je .Lerror_bad_iret >> @@ -1119,28 +1117,19 @@ ENTRY(error_entry) >> >> /* >> * Pretend that the exception came from user mode: set up >> pt_regs >> - * as if we faulted immediately after IRET and clear EBX so &
[Xen-devel] [PATCH v2] x86/entry/64: Remove %ebx handling from error_entry/exit
commit b3681dd548d06deb2e1573890829dff4b15abf46 upstream. This version applies to v4.9. From Andy Lutomirski, original author: error_entry and error_exit communicate the user vs kernel status of the frame using %ebx. This is unnecessary -- the information is in regs->cs. Just use regs->cs. This makes error_entry simpler and makes error_exit more robust. It also fixes a nasty bug. Before all the Spectre nonsense, The xen_failsafe_callback entry point returned like this: ALLOC_PT_GPREGS_ON_STACK SAVE_C_REGS SAVE_EXTRA_REGS ENCODE_FRAME_POINTER jmp error_exit And it did not go through error_entry. This was bogus: RBX contained garbage, and error_exit expected a flag in RBX. Fortunately, it generally contained *nonzero* garbage, so the correct code path was used. As part of the Spectre fixes, code was added to clear RBX to mitigate certain speculation attacks. Now, depending on kernel configuration, RBX got zeroed and, when running some Wine workloads, the kernel crashes. This was introduced by: commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") With this patch applied, RBX is no longer needed as a flag, and the problem goes away. I suspect that malicious userspace could use this bug to crash the kernel even without the offending patch applied, though. [Historical note: I wrote this patch as a cleanup before I was aware of the bug it fixed.] [Note to stable maintainers: this should probably get applied to all kernels.] Cc: Brian Gerst Cc: Borislav Petkov Cc: Dominik Brodowski Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Thomas Gleixner Cc: Boris Ostrovsky Cc: Juergen Gross Cc: xen-devel@lists.xenproject.org Cc: x...@kernel.org Cc: sta...@vger.kernel.org Cc: Andy Lutomirski Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") Reported-and-tested-by: "M. Vefa Bicakci" Signed-off-by: Andy Lutomirski Signed-off-by: Sarah Newman --- arch/x86/entry/entry_64.S | 20 1 file changed, 4 insertions(+), 16 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index d58d8dc..76c1d85e 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -774,7 +774,7 @@ ENTRY(\sym) call\do_sym - jmp error_exit /* %ebx: no swapgs flag */ + jmp error_exit .endif END(\sym) .endm @@ -1043,7 +1043,6 @@ END(paranoid_exit) /* * Save all registers in pt_regs, and switch gs if needed. - * Return: EBX=0: came from user mode; EBX=1: otherwise */ ENTRY(error_entry) cld @@ -1056,7 +1055,6 @@ ENTRY(error_entry) * the kernel CR3 here. */ SWITCH_KERNEL_CR3 - xorl%ebx, %ebx testb $3, CS+8(%rsp) jz .Lerror_kernelspace @@ -1087,7 +1085,6 @@ ENTRY(error_entry) * for these here too. */ .Lerror_kernelspace: - incl%ebx leaqnative_irq_return_iret(%rip), %rcx cmpq%rcx, RIP+8(%rsp) je .Lerror_bad_iret @@ -1119,28 +1116,19 @@ ENTRY(error_entry) /* * Pretend that the exception came from user mode: set up pt_regs -* as if we faulted immediately after IRET and clear EBX so that -* error_exit knows that we will be returning to user mode. +* as if we faulted immediately after IRET. */ mov %rsp, %rdi callfixup_bad_iret mov %rax, %rsp - decl%ebx jmp .Lerror_entry_from_usermode_after_swapgs END(error_entry) - -/* - * On entry, EBX is a "return to kernel mode" flag: - * 1: already in kernel mode, don't need SWAPGS - * 0: user gsbase is loaded, we need SWAPGS and standard preparation for return to usermode - */ ENTRY(error_exit) - movl%ebx, %eax DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF - testl %eax, %eax - jnz retint_kernel + testb $3, CS(%rsp) + jz retint_kernel jmp retint_user END(error_exit) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 00/24] Vixen: A PV-in-HVM shim
vixen-upstream-v2 hangs for us after dumping the e820 map. We're able to build and run vixen-upstream-v1. My company needs serial input. It looks like that wasn't implemented. If so, and nobody else is working on patches to enable serial input, I believe we can come up with something in the next day or so. We'll also disable switching to the xen dom0 console at the same time. --Sarah ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 00/24] Vixen: A PV-in-HVM shim
On 01/09/2018 09:07 PM, Anthony Liguori wrote: > On Tue, Jan 9, 2018 at 8:46 PM, Sarah Newman wrote: >> vixen-upstream-v2 hangs for us after dumping the e820 map. We're able to >> build and run vixen-upstream-v1. > > Can give me more details about your guest config? I'm happy to take a > look and debug. The HVM pertinent items: pae = 1 nx = 0 acpi = 1 viridian = 0 xen_platform_pci = 1 apic = 1 boot = 'dc' sdl = 0 usb = 0 vnc = 0 nographic = 1 vga = "none" serial = 'pty' > >> My company needs serial input. It looks like that wasn't implemented. If so, >> and nobody else is working on patches to enable serial input, I believe >> we can come up with something in the next day or so. We'll also disable >> switching to the xen dom0 console at the same time. > > Yeah, it's not there yet. There's something in liuw's pvshim branch > that you might want to look at. Assuming console input works on that branch, it looks fairly straightforward to integrate. Thanks. --Sarah ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] vixen: port of shadow PV console's page for L2 DomU
The current version of vixen handles console output from the guest but not console input to the guest. This adds guest input as in 0d50a85f x86/pv-shim: shadow PV console's page for L2 DomU, but with read_smb moved up in guest_tx. Signed-off-by: Sarah Newman --- xen/arch/x86/guest/vixen.c| 42 +++ xen/drivers/char/console.c| 6 +- xen/include/asm-x86/guest/vixen.h | 1 + 3 files changed, 48 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c index 08619d1..4491d82 100644 --- a/xen/arch/x86/guest/vixen.c +++ b/xen/arch/x86/guest/vixen.c @@ -245,6 +245,48 @@ static void vixen_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) vixen_upcall(smp_processor_id()); } +static void notify_guest(void) +{ +struct evtchn *chn; +chn = evtchn_from_port(hardware_domain, vixen_xencons_port); +evtchn_port_set_pending(hardware_domain, chn->notify_vcpu_id, chn); +} + +size_t vixen_guest_tx(char c) +{ +size_t sent = 0; +volatile struct xencons_interface *r = vixen_xencons_iface; +XENCONS_RING_IDX cons, prod; + +if (r == NULL) +return 0; + +cons = ACCESS_ONCE(r->in_cons); +prod = r->in_prod; +/* Update pointers before accessing the ring */ +smp_rmb(); + +ASSERT((prod - cons) <= sizeof(r->in)); + +/* Is the ring out of space? */ +if ( sizeof(r->in) - (prod - cons) == 0 ) +goto notify; + + +r->in[MASK_XENCONS_IDX(prod++, r->in)] = c; +sent++; + +/* Write to the ring before updating the pointer */ +smp_wmb(); +ACCESS_ONCE(r->in_prod) = prod; + + notify: +/* Always notify the guest: prevents receive path from getting stuck. */ +notify_guest(); + +return sent; +} + bool vixen_ring_process(uint16_t port) { volatile struct xencons_interface *r = vixen_xencons_iface; diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c index a83aeb2..be5875e 100644 --- a/xen/drivers/char/console.c +++ b/xen/drivers/char/console.c @@ -30,6 +30,7 @@ #include /* for do_console_io */ #include #include +#include /* console: comma-separated list of console outputs. */ static char __initdata opt_console[30] = OPT_CONSOLE_STR; @@ -406,13 +407,16 @@ static void __serial_rx(char c, struct cpu_user_regs *regs) serial_rx_ring[SERIAL_RX_MASK(serial_rx_prod++)] = c; /* Always notify the guest: prevents receive path from getting stuck. */ send_global_virq(VIRQ_CONSOLE); + +if ( is_vixen()) +vixen_guest_tx(c); } static void serial_rx(char c, struct cpu_user_regs *regs) { static int switch_code_count = 0; -if ( switch_code && (c == switch_code) ) +if (!is_vixen() && switch_code && (c == switch_code) ) { /* We eat CTRL- in groups of 3 to switch console input. */ if ( ++switch_code_count == 3 ) diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h index eca263a..2e2666e 100644 --- a/xen/include/asm-x86/guest/vixen.h +++ b/xen/include/asm-x86/guest/vixen.h @@ -86,5 +86,6 @@ vixen_transform(struct domain *dom0, xen_pfn_t *pconsole_mfn, uint32_t *pconsole_evtchn); bool vixen_ring_process(uint16_t port); +size_t vixen_guest_tx(char c); #endif -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2] vixen: port of shadow PV console's page for L2 DomU
The current version of vixen handles console output from the guest but not console input to the guest. This adds guest input as in 0d50a85f x86/pv-shim: shadow PV console's page for L2 DomU, but with read_smb moved up in guest_tx. Signed-off-by: Sergey Dyasli Signed-off-by: Sarah Newman --- xen/arch/x86/guest/vixen.c| 42 +++ xen/drivers/char/console.c| 6 +- xen/include/asm-x86/guest/vixen.h | 1 + 3 files changed, 48 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c index 08619d1..4491d82 100644 --- a/xen/arch/x86/guest/vixen.c +++ b/xen/arch/x86/guest/vixen.c @@ -245,6 +245,48 @@ static void vixen_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) vixen_upcall(smp_processor_id()); } +static void notify_guest(void) +{ +struct evtchn *chn; +chn = evtchn_from_port(hardware_domain, vixen_xencons_port); +evtchn_port_set_pending(hardware_domain, chn->notify_vcpu_id, chn); +} + +size_t vixen_guest_tx(char c) +{ +size_t sent = 0; +volatile struct xencons_interface *r = vixen_xencons_iface; +XENCONS_RING_IDX cons, prod; + +if (r == NULL) +return 0; + +cons = ACCESS_ONCE(r->in_cons); +prod = r->in_prod; +/* Update pointers before accessing the ring */ +smp_rmb(); + +ASSERT((prod - cons) <= sizeof(r->in)); + +/* Is the ring out of space? */ +if ( sizeof(r->in) - (prod - cons) == 0 ) +goto notify; + + +r->in[MASK_XENCONS_IDX(prod++, r->in)] = c; +sent++; + +/* Write to the ring before updating the pointer */ +smp_wmb(); +ACCESS_ONCE(r->in_prod) = prod; + + notify: +/* Always notify the guest: prevents receive path from getting stuck. */ +notify_guest(); + +return sent; +} + bool vixen_ring_process(uint16_t port) { volatile struct xencons_interface *r = vixen_xencons_iface; diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c index a83aeb2..be5875e 100644 --- a/xen/drivers/char/console.c +++ b/xen/drivers/char/console.c @@ -30,6 +30,7 @@ #include /* for do_console_io */ #include #include +#include /* console: comma-separated list of console outputs. */ static char __initdata opt_console[30] = OPT_CONSOLE_STR; @@ -406,13 +407,16 @@ static void __serial_rx(char c, struct cpu_user_regs *regs) serial_rx_ring[SERIAL_RX_MASK(serial_rx_prod++)] = c; /* Always notify the guest: prevents receive path from getting stuck. */ send_global_virq(VIRQ_CONSOLE); + +if ( is_vixen()) +vixen_guest_tx(c); } static void serial_rx(char c, struct cpu_user_regs *regs) { static int switch_code_count = 0; -if ( switch_code && (c == switch_code) ) +if (!is_vixen() && switch_code && (c == switch_code) ) { /* We eat CTRL- in groups of 3 to switch console input. */ if ( ++switch_code_count == 3 ) diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h index eca263a..2e2666e 100644 --- a/xen/include/asm-x86/guest/vixen.h +++ b/xen/include/asm-x86/guest/vixen.h @@ -86,5 +86,6 @@ vixen_transform(struct domain *dom0, xen_pfn_t *pconsole_mfn, uint32_t *pconsole_evtchn); bool vixen_ring_process(uint16_t port); +size_t vixen_guest_tx(char c); #endif -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU
On 01/10/2018 08:56 AM, Sergey Dyasli wrote: > On Tue, 2018-01-09 at 09:28 -0700, Jan Beulich wrote: > On 09.01.18 at 16:43, wrote: >>> >>> On Tue, 2018-01-09 at 02:13 -0700, Jan Beulich wrote: >>> On 04.01.18 at 14:06, wrote: > > +size_t consoled_guest_rx(void) > +{ > +size_t recv = 0, idx = 0; > +XENCONS_RING_IDX cons, prod; > + > +if ( !cons_ring ) > +return 0; > + > +spin_lock(&rx_lock); > + > +cons = cons_ring->out_cons; > +prod = ACCESS_ONCE(cons_ring->out_prod); > +ASSERT((prod - cons) <= sizeof(cons_ring->out)); > + > +/* Is the ring empty? */ > +if ( cons == prod ) > +goto out; > + > +/* Update pointers before accessing the ring */ > +smp_rmb(); I think this need to move up ahead of the if(). In the comment perhaps s/Update/Latch/? >>> >>> The read/write memory barriers here are between read/write accesses to >>> ring->out_prod and ring->out array. So there is no need to move them. >>> (the same goes for the input ring) >> >> And there is no multiple-read issue here? > > As Andrew has kindly explained to me, there is an issue indeed. > So I moved smp_rmb() to be right after cons and prod read, and updated > the comment to say: > > "Latch pointers before accessing the ring. Included compiler barrier also > ensures that pointers are really read only once into local variables." > There is an incompatibility in this (also vixen's) serial output. The NetBSD installer includes NUL characters in its output and does not display properly due to the call to puts. putc demonstrably works, though it would be better to add a length oriented function to serial.c. --Sarah ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] vixen: transmit NUL characters received from guest serial port
Certain programs, such as the NetBSD installer, include NUL characters in their output. Using null-terminated strings for transmitting data from the guest to the L0 hypervisor meant the output was being corrupted. This makes only the required changes for vixen to work properly. Future work could include generally switching away from null-terminated strings within the xen console code. Signed-off-by: Sarah Newman --- xen/arch/x86/guest/vixen.c | 4 ++-- xen/drivers/char/console.c | 14 +++--- xen/drivers/char/serial.c | 35 +++ xen/include/xen/lib.h | 2 +- xen/include/xen/serial.h | 3 +++ 5 files changed, 52 insertions(+), 6 deletions(-) diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c index 9cb1a80..1b1d2e0 100644 --- a/xen/arch/x86/guest/vixen.c +++ b/xen/arch/x86/guest/vixen.c @@ -304,7 +304,7 @@ bool vixen_ring_process(uint16_t port) char ch = r->out[MASK_XENCONS_IDX(r->out_cons, r->out)]; if (n == sizeof(buffer) - 1) { buffer[n] = 0; -guest_puts(hardware_domain, buffer); +guest_putsn(hardware_domain, buffer, n); n = 0; } buffer[n++] = ch; @@ -314,7 +314,7 @@ bool vixen_ring_process(uint16_t port) if (n) { buffer[n] = 0; -guest_puts(hardware_domain, buffer); +guest_putsn(hardware_domain, buffer, n); } spin_unlock(&vixen_xencons_lock); diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c index 38a5d67..ba22cdf 100644 --- a/xen/drivers/char/console.c +++ b/xen/drivers/char/console.c @@ -341,6 +341,14 @@ static void sercon_puts(const char *s) serial_puts(sercon_handle, s); } +static void sercon_putsn(const char *s, unsigned int count) +{ +if ( serial_steal_fn != NULL ) +(*serial_steal_fn)(s); +else +serial_putsn(sercon_handle, s, count); +} + static void dump_console_ring_key(unsigned char key) { uint32_t idx, len, sofar, c; @@ -461,7 +469,7 @@ static long guest_console_write(XEN_GUEST_HANDLE_PARAM(char) buffer, int count) /* Use direct console output as it could be interactive */ spin_lock_irq(&console_lock); -sercon_puts(kbuf); +sercon_putsn(kbuf, count); video_puts(kbuf); if ( opt_console_to_ring ) @@ -761,11 +769,11 @@ void guest_printk(const struct domain *d, const char *fmt, ...) va_end(args); } -void guest_puts(const struct domain *d, const char *kbuf) +void guest_putsn(const struct domain *d, const char *kbuf, unsigned int count) { spin_lock_irq(&console_lock); -sercon_puts(kbuf); +sercon_putsn(kbuf, count); video_puts(kbuf); if ( opt_console_to_ring ) diff --git a/xen/drivers/char/serial.c b/xen/drivers/char/serial.c index 09a20ac..e90378a 100644 --- a/xen/drivers/char/serial.c +++ b/xen/drivers/char/serial.c @@ -223,6 +223,41 @@ void serial_putc(int handle, char c) spin_unlock_irqrestore(&port->tx_lock, flags); } +void serial_putsn(int handle, const char *s, unsigned int count) +{ +struct serial_port *port; +unsigned long flags; +char c; + +if ( handle == -1 ) +return; + +port = &com[handle & SERHND_IDX]; +if ( !port->driver || !port->driver->putc ) +return; + +spin_lock_irqsave(&port->tx_lock, flags); + +for (unsigned int i = 0; i < count; i++) +{ +c = *s++; +if ( (c == '\n') && (handle & SERHND_COOKED) ) +__serial_putc(port, '\r' | ((handle & SERHND_HI) ? 0x80 : 0x00)); + +if ( handle & SERHND_HI ) +c |= 0x80; +else if ( handle & SERHND_LO ) +c &= 0x7f; + +__serial_putc(port, c); +} + +if ( port->driver->flush ) +port->driver->flush(port); + +spin_unlock_irqrestore(&port->tx_lock, flags); +} + void serial_puts(int handle, const char *s) { struct serial_port *port; diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h index 5359fa4..baccae0 100644 --- a/xen/include/xen/lib.h +++ b/xen/include/xen/lib.h @@ -91,7 +91,7 @@ extern void printk(const char *format, ...) __attribute__ ((format (printf, 1, 2))); extern void guest_printk(const struct domain *d, const char *format, ...) __attribute__ ((format (printf, 2, 3))); -extern void guest_puts(const struct domain *d, const char *message); +extern void guest_putsn(const struct domain *d, const char *message, unsigned int count); extern void noreturn panic(const char *format, ...) __attribute__ ((format (printf, 1, 2))); extern long vm_assist(struct domain *, unsigned int cmd, unsigned int type, diff --git a/xen/include/xen/serial.h b/xen/include/xen/serial.h index 1212a12..87dbdcf 100644 --- a/xen/include/xen/serial.h +++ b/xen/include/xen/serial.h @@ -115,6
[Xen-devel] Explicit cpuid may be incompatible with vixen
We had an experience where vixen would crash with (XEN) (XEN) (XEN) Panic on CPU 0: (XEN) GENERAL PROTECTION FAULT (XEN) [error_code=] (XEN) (XEN) (XEN) Reboot in five seconds... when cpuid was set to: cpuid = [ "0x0001:ecx=x0x00xxx", "0x0007,0x00:ebx=xx0000xx0xx0xx000x0x" ] On a Xeon E5 v2 system. Since vixen is incompatible with live migration anyway we just removed the line. If there's more debugging information that would be helpful please let me know. Thanks, Sarah ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Repeated problems with network receive path in Linux
Hi, We have experienced 3 full crashes over the last 3 months in the Linux network receive path, and one additional crash of a vif queue that killed networking for a single guest but did not bring down the entire host. I am not 100% sure if it is Xen related or not, but it seems possible. Because we have only seen it on one specific host, and the problem has persisted across different pieces of hardware, I believe the cause is likely data dependent or maybe can only be triggered in a reasonable amount of time if there is large amounts of traffic exchanged between guests on the same physical server. We've observed some guests that are exchanging a large amount of traffic on-host, which is unusual for our systems. So far the Linux versions we've had problems with are 4.9.32 and 4.9.58. We are using bridged networking. The dom0 has up until today had two vcpus allocated to it. In our latest crash we had gro disabled on our physical devices but not on any vifs. Since then we have disabled gro, gso, tso on our physical devices and all of our vifs. Suggestions for anything else to try would be quite welcome. Here is the latest backtrace (module lines removed to try to avoid erroneous search results:) [5172268.346847] general protection fault: [#2] SMP [5172268.347810] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G D 4.9.58-29.el6.x86_64 #1 [5172268.347904] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015 [5172268.348010] task: 8802fbf780c0 task.stack: c900400ac000 [5172268.348072] RIP: e030:[] [] skb_release_data+0x67/0xf0 [5172268.348171] RSP: e02b:c900400af348 EFLAGS: 00010206 [5172268.348233] RAX: 0030 RBX: 880327f8fc40 RCX: [5172268.348313] RDX: 0010 RSI: 006b RDI: 88021f76b000 [5172268.348392] RBP: c900400af388 R08: R09: 880260b0db80 [5172268.348470] R10: 0011 R11: R12: 88021f76b000 [5172268.348548] R13: c26009c5e4500110 R14: R15: c900400af348 [5172268.348630] FS: () GS:880301c0() knlGS:0017f980 [5172268.348720] CS: e033 DS: ES: CR0: 80050033 [5172268.348784] CR2: 7ff440dd6000 CR3: 0002bd993000 CR4: 00042660 [5172268.348863] Stack: [5172268.348894] 88024723b000 88021f76b000 88021f76b000 88021f76b000 [5172268.348980] 88021f76b000 817eb756 8802cea048c0 [5172268.349077] c900400af3a8 81799b08 c900400af450 88021f76b000 [5172268.349161] Call Trace: [5172268.349197] [] ? nf_hook_slow+0x96/0xb0 [5172268.349259] [] skb_release_all+0x28/0x30 [5172268.349329] [] __kfree_skb+0x16/0x80 [5172268.349384] [] ? ebt_in_hook+0x1f/0x24 [ebtable_filter] [5172268.349460] [] kfree_skb+0x47/0xb0 [5172268.349520] [] ? _raw_spin_unlock_irqrestore+0x16/0x20 [5172268.349596] [] nf_hook_slow+0x96/0xb0 [5172268.349662] [] __br_forward+0xd7/0x1f0 [bridge] [5172268.349765] [] ? br_flood+0x160/0x160 [bridge] [5172268.349835] [] br_forward+0xb3/0xc0 [bridge] [5172268.349904] [] br_handle_frame_finish+0x1aa/0x340 [bridge] [5172268.349985] [] ? br_handle_local_finish+0x50/0x50 [bridge] [5172268.350067] [] br_nf_hook_thresh+0x65/0xd0 [br_netfilter] [5172268.350148] [] ? poison_obj+0x33/0x50 [5172268.350215] [] ? nf_iterate+0x68/0xa0 [5172268.350269] [] br_nf_pre_routing_finish_ipv6+0x18c/0x210 [br_netfilter] [5172268.350362] [] ? br_handle_local_finish+0x50/0x50 [bridge] [5172268.350442] [] ? nf_hook_slow+0x3b/0xb0 [5172268.350506] [] ? br_dev_queue_push_xmit+0xbb/0x180 [bridge] [5172268.350586] [] br_nf_pre_routing_ipv6+0xd4/0x164 [br_netfilter] [5172268.350670] [] ? br_nf_pre_routing_finish+0x320/0x320 [br_netfilter] [5172268.350758] [] br_nf_pre_routing+0x1a7/0x370 [br_netfilter] [5172268.350846] [] ? br_flood+0x160/0x160 [bridge] [5172268.350909] [] nf_iterate+0x68/0xa0 [5172268.350968] [] nf_hook_slow+0x3b/0xb0 [5172268.351032] [] br_handle_frame+0x1bc/0x370 [bridge] [5172268.351104] [] ? br_handle_local_finish+0x50/0x50 [bridge] [5172268.351186] [] ? br_handle_frame_finish+0x340/0x340 [bridge] [5172268.351268] [] __netif_receive_skb_core+0x227/0x9e0 [5172268.351343] [] ? xen_force_evtchn_callback+0xd/0x10 [5172268.351416] [] ? check_events+0x16/0x30 [5172268.351478] [] ? poison_obj+0x33/0x50 [5172268.351539] [] ? cache_alloc_debugcheck_after+0x130/0x190 [5172268.351617] [] ? __build_skb+0x2e/0xe0 [5172268.351680] [] ? br_handle_frame_finish+0x340/0x340 [bridge] [5172268.351761] [] ? kmem_cache_alloc+0xb6/0x240 [5172268.351829] [] ? ktime_get_with_offset+0x75/0x100 [5172268.351900] [] __netif_receive_skb+0x22/0x70 [5172268.351967] [] netif_receive_skb_internal+0x33/0x90 [5172268.352040] [] napi_gro_receive+0xd2/0xf0 [5172268.352108] [] igb_clean_rx_irq+0x1e5/0x370 [igb] [5172268.352180] [] ? scsi_run_queu
Re: [Xen-devel] CPU Lockup bug with the credit2 scheduler
On 1/7/20 6:25 AM, Alastair Browne wrote: CONCLUSION So in conclusion, the tests indicate that credit2 might be unstable. For the time being, we are using credit as the chosen scheduler. We are booting the kernel with a parameter "sched=credit" to ensure that the correct scheduler is used. After the tests, we decided to stick with 4.9.0.9 kernel and 4.12 Xen for production use running credit1 as the default scheduler. One person CC'ed appears to be having the same experience, where the credit2 scheduler leads to lockups (in this case in the domU, not the dom0) under relatively heavy load. It seems possible they may have the same root cause. I don't think there are, but have there been any patches since the 4.13.0 release which might have fixed problems with credit 2 scheduler? If not, what would the next step be to isolating the problem - a debug build of Xen or something else? If there are no merged or proposed fixes soon, it may be worth considering making the credit scheduler the default again until problems with the credit2 scheduler are resolved. Thanks, Sarah ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] docs/misc: xen-command-line: fix parameters in sample serial configuration
The names of the serial parameters use hyphens, not underscores. Signed-off-by: Sarah Newman --- docs/misc/xen-command-line.pandoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index 5eb3a07276..5051583a5d 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -369,8 +369,8 @@ The accepted name keywords for name=value pairs are: The following are examples of correct specifications: com1=115200,8n1,0x3f8,4 -com1=115200,8n1,0x3f8,4,reg_width=4,reg_shift=2 -com1=baud=115200,parity=n,stop_bits=1,io_base=0x3f8,reg_width=4 +com1=115200,8n1,0x3f8,4,reg-width=4,reg-shift=2 +com1=baud=115200,parity=n,stop-bits=1,io-base=0x3f8,reg-width=4 ### conring_size > `= ` -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [PATCH] xen: don't reschedule in preemption off sections
On 7/10/20 5:01 AM, Jürgen Groß wrote: On 10.07.20 13:55, Jan Beulich wrote: On 10.07.2020 12:50, Jürgen Groß wrote: On 10.07.20 11:49, Jan Beulich wrote: On 10.07.2020 09:50, Juergen Gross wrote: For support of long running hypercalls xen_maybe_preempt_hcall() is calling cond_resched() in case a hypercall marked as preemptible has been interrupted. Normally this is no problem, as only hypercalls done via some ioctl()s are marked to be preemptible. In rare cases when during such a preemptible hypercall an interrupt occurs and any softirq action is started from irq_exit(), a further hypercall issued by the softirq handler will be regarded to be preemptible, too. This might lead to rescheduling in spite of the softirq handler potentially having set preempt_disable(), leading to splats like: BUG: sleeping function called from invalid context at drivers/xen/preempt.c:37 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 20775, name: xl INFO: lockdep is turned off. CPU: 1 PID: 20775 Comm: xl Tainted: G D W 5.4.46-1_prgmr_debug.el7.x86_64 #1 Call Trace: dump_stack+0x8f/0xd0 ___might_sleep.cold.76+0xb2/0x103 xen_maybe_preempt_hcall+0x48/0x70 xen_do_hypervisor_callback+0x37/0x40 RIP: e030:xen_hypercall_xen_version+0xa/0x20 Code: ... RSP: e02b:c900400dcc30 EFLAGS: 0246 RAX: 0004000d RBX: 0200 RCX: 8100122a RDX: 88812e788000 RSI: RDI: RBP: 83ee3ad0 R08: 0001 R09: 0001 R10: R11: 0246 R12: 8881824aa0b0 R13: 000865496000 R14: 000865496000 R15: 88815d04 ? xen_hypercall_xen_version+0xa/0x20 ? xen_force_evtchn_callback+0x9/0x10 ? check_events+0x12/0x20 ? xen_restore_fl_direct+0x1f/0x20 ? _raw_spin_unlock_irqrestore+0x53/0x60 ? debug_dma_sync_single_for_cpu+0x91/0xc0 ? _raw_spin_unlock_irqrestore+0x53/0x60 ? xen_swiotlb_sync_single_for_cpu+0x3d/0x140 ? mlx4_en_process_rx_cq+0x6b6/0x1110 [mlx4_en] ? mlx4_en_poll_rx_cq+0x64/0x100 [mlx4_en] ? net_rx_action+0x151/0x4a0 ? __do_softirq+0xed/0x55b ? irq_exit+0xea/0x100 ? xen_evtchn_do_upcall+0x2c/0x40 ? xen_do_hypervisor_callback+0x29/0x40 ? xen_hypercall_domctl+0xa/0x20 ? xen_hypercall_domctl+0x8/0x20 ? privcmd_ioctl+0x221/0x990 [xen_privcmd] ? do_vfs_ioctl+0xa5/0x6f0 ? ksys_ioctl+0x60/0x90 ? trace_hardirqs_off_thunk+0x1a/0x20 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x62/0x250 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe Fix that by testing preempt_count() before calling cond_resched(). In kernel 5.8 this can't happen any more due to the entry code rework. Reported-by: Sarah Newman Fixes: 0fa2f5cb2b0ecd8 ("sched/preempt, xen: Use need_resched() instead of should_resched()") Cc: Sarah Newman Signed-off-by: Juergen Gross --- drivers/xen/preempt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c index 17240c5325a3..6ad87b5c95ed 100644 --- a/drivers/xen/preempt.c +++ b/drivers/xen/preempt.c @@ -27,7 +27,7 @@ EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall); asmlinkage __visible void xen_maybe_preempt_hcall(void) { if (unlikely(__this_cpu_read(xen_in_preemptible_hcall) - && need_resched())) { + && need_resched() && !preempt_count())) { Doesn't this have the at least latent risk of hiding issues in other call trees (by no longer triggering logging like the one that has propmted this change)? Wouldn't it be better to save, clear, and restore the flag in one of xen_evtchn_do_upcall() or xen_do_hypervisor_callback()? First regarding "risk of hiding issues": it seems as if lots of kernels aren't even configured to trigger this logging. It would need CONFIG_DEBUG_ATOMIC_SLEEP to be enabled and at least SUSE kernels don't seem to have it on. I suspect the occasional xen_mc_flush() failures we have seen are related to this problem. And in theory saving, clearing and restoring the flag would be fine, but it can't be done in a single function with the code flow as up to 5.7. What would need to be done is to save and clear the flag in e.g. __xen_evtchn_do_upcall() and to pass it to xen_maybe_preempt_hcall() as a parameter. In xen_maybe_preempt_hcall() the passed flag value would need to be used for the decision whether to call cond_resched() and then the flag could be restored (after the cond_resched() call). I'm afraid I don't follow: If __xen_evtchn_do_upcall() cleared the flag, xen_maybe_preempt_hcall() would amount to a no-op (up and until the flag's prior value would get restored), wouldn't it? No need to pass anything into there. The problem is after __xen_evtchn_do_upcall() restoring the flag. As soon as irq_exit() is being called (either by xen_evtchn_do_upcall() or by the caller of xen_hvm_evtchn_do_upcall()) softirq handling might be executed resulting in another hypercall, which might be preempted after
Linux 5.4.46: BUG: sleeping function called from invalid context at drivers/xen/preempt.c:37
BUG: sleeping function called from invalid context at drivers/xen/preempt.c:37 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 20775, name: xl INFO: lockdep is turned off. CPU: 1 PID: 20775 Comm: xl Tainted: G D W 5.4.46-1_prgmr_debug.el7.x86_64 #1 Call Trace: dump_stack+0x8f/0xd0 ___might_sleep.cold.76+0xb2/0x103 xen_maybe_preempt_hcall+0x48/0x70 xen_do_hypervisor_callback+0x37/0x40 RIP: e030:xen_hypercall_xen_version+0xa/0x20 Code: 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc RSP: e02b:c900400dcc30 EFLAGS: 0246 RAX: 0004000d RBX: 0200 RCX: 8100122a RDX: 88812e788000 RSI: RDI: RBP: 83ee3ad0 R08: 0001 R09: 0001 R10: R11: 0246 R12: 8881824aa0b0 R13: 000865496000 R14: 000865496000 R15: 88815d04 ? xen_hypercall_xen_version+0xa/0x20 ? xen_force_evtchn_callback+0x9/0x10 ? check_events+0x12/0x20 ? xen_restore_fl_direct+0x1f/0x20 ? _raw_spin_unlock_irqrestore+0x53/0x60 ? debug_dma_sync_single_for_cpu+0x91/0xc0 ? _raw_spin_unlock_irqrestore+0x53/0x60 ? xen_swiotlb_sync_single_for_cpu+0x3d/0x140 ? mlx4_en_process_rx_cq+0x6b6/0x1110 [mlx4_en] ? mlx4_en_poll_rx_cq+0x64/0x100 [mlx4_en] ? net_rx_action+0x151/0x4a0 ? __do_softirq+0xed/0x55b ? irq_exit+0xea/0x100 ? xen_evtchn_do_upcall+0x2c/0x40 ? xen_do_hypervisor_callback+0x29/0x40 ? xen_hypercall_domctl+0xa/0x20 ? xen_hypercall_domctl+0x8/0x20 ? privcmd_ioctl+0x221/0x990 [xen_privcmd] ? do_vfs_ioctl+0xa5/0x6f0 ? ksys_ioctl+0x60/0x90 ? trace_hardirqs_off_thunk+0x1a/0x20 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x62/0x250 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
Re: [Xen-devel] Design session report: Live-Updating Xen
On 7/15/19 11:57 AM, Foerster, Leonard wrote: ... A key cornerstone for Live-update is guest transparent live migration ... -> for live migration: domid is a problem in this case -> randomize and pray does not work on smaller fleets -> this is not a problem for live-update -> BUT: as a community we shoudl make this restriction go away Andrew Cooper pointed out to me that manually assigning domain IDs is supported in much of the code already. If guest transparent live migration gets merged, we'll look at passing in a domain ID to xl, which would be good enough for us. I don't know about the other toolstacks. --Sarah ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Design session report: Live-Updating Xen
On 7/15/19 8:48 PM, Juergen Gross wrote: On 15.07.19 21:31, Sarah Newman wrote: On 7/15/19 11:57 AM, Foerster, Leonard wrote: ... A key cornerstone for Live-update is guest transparent live migration ... -> for live migration: domid is a problem in this case -> randomize and pray does not work on smaller fleets -> this is not a problem for live-update -> BUT: as a community we shoudl make this restriction go away Andrew Cooper pointed out to me that manually assigning domain IDs is supported in much of the code already. If guest transparent live migration gets merged, we'll look at passing in a domain ID to xl, which would be good enough for us. I don't know about the other toolstacks. The main problem is the case where on the target host the domid of the migrated domain is already in use by another domain. So you either need a domid allocator spanning all hosts or the change of domid during migration must be hidden from the guest for guest transparent migration. Yes. There are some cluster management systems which use xl rather than xapi. They could be extended to manage domain IDs if it's too difficult to allow the domain ID to change during migration. --Sarah ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[PATCH] livepatch: create-diff-object: Check that the section has a secsym
A STT_SECTION symbol is not needed if if it is not used as a relocation target. Therefore, a section, in this case a debug section, may not have a secsym associated with it. This is a livepatch backport of kpatch upstream commit [1]: create-diff-object: Check that the section has a secsym ba3defa Signed-off-by: Sarah Newman [1] https://github.com/dynup/kpatch/commit/ba3defa06073dcc69917d9df570ca4e56612 --- create-diff-object.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/create-diff-object.c b/create-diff-object.c index a516670..780e6c8 100644 --- a/create-diff-object.c +++ b/create-diff-object.c @@ -1484,7 +1484,7 @@ static void kpatch_include_debug_sections(struct kpatch_elf *kelf) list_for_each_entry(sec, &kelf->sections, list) { if (is_debug_section(sec)) { sec->include = 1; - if (!is_rela_section(sec)) + if (!is_rela_section(sec) && sec->secsym) sec->secsym->include = 1; } } -- 2.17.1
[PATCH v2] livepatch: create-diff-object: Check that the section has a secsym
A STT_SECTION symbol is not needed if if it is not used as a relocation target. Therefore, a section, in this case a debug section, may not have a secsym associated with it. Origin: https://github.com/dynup/kpatch.git ba3defa06073 Signed-off-by: Sarah Newman --- Changes in v2: - commit message changed to use Origin --- create-diff-object.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/create-diff-object.c b/create-diff-object.c index a516670..780e6c8 100644 --- a/create-diff-object.c +++ b/create-diff-object.c @@ -1484,7 +1484,7 @@ static void kpatch_include_debug_sections(struct kpatch_elf *kelf) list_for_each_entry(sec, &kelf->sections, list) { if (is_debug_section(sec)) { sec->include = 1; - if (!is_rela_section(sec)) + if (!is_rela_section(sec) && sec->secsym) sec->secsym->include = 1; } } -- 2.17.1
[PATCH v3] livepatch: create-diff-object: Check that the section has a secsym
A STT_SECTION symbol is not needed if if it is not used as a relocation target. Therefore, a section, in this case a debug section, may not have a secsym associated with it. Signed-off-by: Bill Wendling Origin: https://github.com/dynup/kpatch.git ba3defa06073 Signed-off-by: Sarah Newman Reviewed-by: Ross Lagerwall --- Changes in v3: - add reviewed-by given to v1 of this patch - restored tag from original commit per sending-patches.pandoc Changes in v2: - commit message changed to use Origin --- create-diff-object.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/create-diff-object.c b/create-diff-object.c index a516670..780e6c8 100644 --- a/create-diff-object.c +++ b/create-diff-object.c @@ -1484,7 +1484,7 @@ static void kpatch_include_debug_sections(struct kpatch_elf *kelf) list_for_each_entry(sec, &kelf->sections, list) { if (is_debug_section(sec)) { sec->include = 1; - if (!is_rela_section(sec)) + if (!is_rela_section(sec) && sec->secsym) sec->secsym->include = 1; } } -- 2.17.1
Re: [PATCH v3] livepatch: create-diff-object: Check that the section has a secsym
On 7/25/22 23:25, Jan Beulich wrote: On 25.07.2022 19:13, Sarah Newman wrote: A STT_SECTION symbol is not needed if if it is not used as a relocation target. Therefore, a section, in this case a debug section, may not have a secsym associated with it. Signed-off-by: Bill Wendling Hmm - this wasn't here before. Does this then suggest the patch also wants to be marked From: Bill? I don't know what the etiquette is here since this was a commit originally committed to kpatch, I just added that back because the xen patch submission guidelines said to preserve original tags. Origin: https://github.com/dynup/kpatch.git ba3defa06073 Signed-off-by: Sarah Newman Reviewed-by: Ross Lagerwall Sigh. I had given R-b on v1 as well. Actually I had meant to commit this yesterday (with all adjustments made), but as it turns out committers can't commit to that tree. So it'll be up to Ross or Konrad to actually take care of this. Jan My apologies, I simply missed that. I am doing this in my spare time. This is the first time I've gone through this process in a couple of years and have only done it a few times total. Thanks, Sarah