Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Jan Beulich
>>> On 09.11.17 at 15:16, wrote:
> On Thu, 2017-11-09 at 06:08 -0700, Jan Beulich wrote:
>> Tasklets already take care of this by
>> calling sync_local_execstate() before calling the handler. But
>> for softirqs this isn't really an option; I'm surprised to see that
>> tasklet code does this indep

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Jan Beulich
>>> On 09.11.17 at 15:16, wrote:
> Ah, yes, my bad! What if I take vcpu_migrate() out of the above exec-trace
> (which is what I wanted to do in my email already)?
>
> pCPU1
> =
> current == vCPU1
> context_switch(next == idle)
> !! __context_switch() is skipped
> anything_that_uses_or_touch

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Dario Faggioli
On Thu, 2017-11-09 at 06:08 -0700, Jan Beulich wrote:
> > > > On 09.11.17 at 12:01, wrote:
> >
> > pCPU1
> > =
> > current == vCPU1
> > context_switch(next == idle)
> > !! __context_switch() is skipped
> > vcpu_migrate(vCPU1)
> > anything_that_uses_or_touches_context()
> >
> > So, it must be

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Jan Beulich
>>> On 09.11.17 at 12:01, wrote:
> Anyway, as I was trying to explain replying to Jan, although in this
> situation the issue manifests as a consequence of vCPU migration, I
> think it is indeed more general, as in, without even the need to
> consider a second pCPU:
>
> pCPU1
> =
> current =

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Jan Beulich
>>> On 09.11.17 at 11:36, wrote:
> Well, I'm afraid I only see two solutions:
> 1) we get rid of lazy context switch;
> 2) whatever it is that is happening at point c above, it needs to be
>    aware that we use lazy context switch, and make sure to sync the
>    context before playing with or a

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Dario Faggioli
On Thu, 2017-11-09 at 10:36 +, Sergey Dyasli wrote: > On Thu, 2017-11-09 at 03:17 -0700, Jan Beulich wrote: > > > > > On 09.11.17 at 10:54, wrote: > > > > > > On Tue, 2017-11-07 at 14:24 +, Igor Druzhinin wrote: > > > > Perhaps I should improve my diagram: > > > > > > > > pCPU1: vCPUx of

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Dario Faggioli
On Thu, 2017-11-09 at 03:17 -0700, Jan Beulich wrote: > > > > On 09.11.17 at 10:54, wrote: > > On Tue, 2017-11-07 at 14:24 +, Igor Druzhinin wrote: > > > Perhaps I should improve my diagram: > > > > > > pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle > > > context > > > -> RCU

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Dario Faggioli
On Thu, 2017-11-09 at 03:05 -0700, Jan Beulich wrote:
> > > > On 07.11.17 at 16:52, wrote:
> >
> > There is one thing that I'm worrying about with this approach:
> >
> > At this place we just sync the idle context because we know that we are
> > going to deal with VMCS later. But what about

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Sergey Dyasli
On Thu, 2017-11-09 at 03:17 -0700, Jan Beulich wrote: > > > > On 09.11.17 at 10:54, wrote: > > > > On Tue, 2017-11-07 at 14:24 +, Igor Druzhinin wrote: > > > Perhaps I should improve my diagram: > > > > > > pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle > > > context > > > ->

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Jan Beulich
>>> On 09.11.17 at 10:54, wrote:
> On Tue, 2017-11-07 at 14:24 +, Igor Druzhinin wrote:
>> Perhaps I should improve my diagram:
>>
>> pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle context
>> -> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
>> vmx_vcpu_disable_pml() -

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Jan Beulich
>>> On 07.11.17 at 16:52, wrote:
> There is one thing that I'm worrying about with this approach:
>
> At this place we just sync the idle context because we know that we are
> going to deal with VMCS later. But what about other potential cases
> (perhaps some softirqs) in which we are accessing

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-09 Thread Dario Faggioli
On Tue, 2017-11-07 at 14:24 +, Igor Druzhinin wrote:
> Perhaps I should improve my diagram:
>
> pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle context
> -> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
> vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at thi

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-07 Thread Jan Beulich
>>> On 07.11.17 at 16:52, wrote: > On 07/11/17 14:55, Jan Beulich wrote: > On 07.11.17 at 15:24, wrote: >>> On 07/11/17 08:07, Jan Beulich wrote: --- unstable.orig/xen/arch/x86/domain.c +++ unstable/xen/arch/x86/domain.c @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-07 Thread Igor Druzhinin
On 07/11/17 14:55, Jan Beulich wrote: On 07.11.17 at 15:24, wrote: >> On 07/11/17 08:07, Jan Beulich wrote: >>> --- unstable.orig/xen/arch/x86/domain.c >>> +++ unstable/xen/arch/x86/domain.c >>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v) >>> >>> void vcpu_destroy(struct vcpu

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-07 Thread Jan Beulich
>>> On 07.11.17 at 09:07, wrote: On 02.11.17 at 20:46, wrote: >>> Any ideas about the root cause of the fault and suggestions how to >>> reproduce > it >>> would be welcome. Does this crash really has something to do with PML? I > doubt >>> because the original environment may hardly be c

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-07 Thread Jan Beulich
>>> On 07.11.17 at 15:24, wrote:
> On 07/11/17 08:07, Jan Beulich wrote:
>> --- unstable.orig/xen/arch/x86/domain.c
>> +++ unstable/xen/arch/x86/domain.c
>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>
>> void vcpu_destroy(struct vcpu *v)
>> {
>> +/*
>> + * Flush all sta

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-07 Thread Igor Druzhinin
On 07/11/17 08:07, Jan Beulich wrote: On 02.11.17 at 20:46, wrote: >>> Any ideas about the root cause of the fault and suggestions how to >>> reproduce it >>> would be welcome. Does this crash really has something to do with PML? I >>> doubt >>> because the original environment may hardly b

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-07 Thread Jan Beulich
>>> On 02.11.17 at 20:46, wrote:
>> Any ideas about the root cause of the fault and suggestions how to reproduce it
>> would be welcome. Does this crash really have something to do with PML? I doubt
>> because the original environment may hardly be called PML-heavy.

Well, PML-heaviness doe

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-02 Thread Igor Druzhinin
On 27/10/17 18:42, Igor Druzhinin wrote: > On 16/02/17 11:15, Jan Beulich wrote: >> When __context_switch() is being bypassed during original context >> switch handling, the vCPU "owning" the VMCS partially loses control of >> it: It will appear non-running to remote CPUs, and hence their attempt >

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-10-27 Thread Igor Druzhinin
On 16/02/17 11:15, Jan Beulich wrote: > When __context_switch() is being bypassed during original context > switch handling, the vCPU "owning" the VMCS partially loses control of > it: It will appear non-running to remote CPUs, and hence their attempt > to pause the owning vCPU will have no effect

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-02-17 Thread Jan Beulich
>>> On 17.02.17 at 09:40, wrote: > On Thu, 2017-02-16 at 04:15 -0700, Jan Beulich wrote: >> When __context_switch() is being bypassed during original context >> switch handling, the vCPU "owning" the VMCS partially loses control of >> it: It will appear non-running to remote CPUs, and hence their

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-02-17 Thread Sergey Dyasli
On Thu, 2017-02-16 at 04:15 -0700, Jan Beulich wrote: > When __context_switch() is being bypassed during original context > switch handling, the vCPU "owning" the VMCS partially loses control of > it: It will appear non-running to remote CPUs, and hence their attempt > to pause the owning vCPU will

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-02-16 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com] > Sent: Thursday, February 16, 2017 8:36 PM > > >>> On 16.02.17 at 13:27, wrote: > > On 16/02/17 11:15, Jan Beulich wrote: > >> When __context_switch() is being bypassed during original context > >> switch handling, the vCPU "owning" the VMCS partial

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-02-16 Thread Jan Beulich
>>> On 16.02.17 at 13:27, wrote: > On 16/02/17 11:15, Jan Beulich wrote: >> When __context_switch() is being bypassed during original context >> switch handling, the vCPU "owning" the VMCS partially loses control of >> it: It will appear non-running to remote CPUs, and hence their attempt >> to pa

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-02-16 Thread Andrew Cooper
On 16/02/17 11:15, Jan Beulich wrote: > When __context_switch() is being bypassed during original context > switch handling, the vCPU "owning" the VMCS partially loses control of > it: It will appear non-running to remote CPUs, and hence their attempt > to pause the owning vCPU will have no effect

[Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-02-16 Thread Jan Beulich
When __context_switch() is being bypassed during original context switch handling, the vCPU "owning" the VMCS partially loses control of it: It will appear non-running to remote CPUs, and hence their attempt to pause the owning vCPU will have no effect on it (as it already looks to be paused). At t