On Mon, Nov 18, 2013 at 09:04:51PM +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 05, 2013 at 01:20:10PM +0200, Gleb Natapov wrote:
> > On Tue, Nov 05, 2013 at 12:22:49PM +0200, Gleb Natapov wrote:
> > > On Tue, Nov 05, 2013 at 12:18:57PM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 04, 2013 at 10:44:43PM +0200, Gleb Natapov wrote:
> > > > > On Mon, Nov 04, 2013 at 10:33:57PM +0200, Michael S. Tsirkin wrote:
> > > > > > On Mon, Nov 04, 2013 at 10:13:39PM +0200, Gleb Natapov wrote:
> > > > > > > On Mon, Nov 04, 2013 at 10:11:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Oct 31, 2013 at 09:48:08AM +0200, Gleb Natapov wrote:
> > > > > > > > > On Thu, Oct 31, 2013 at 02:21:46AM +0200, Michael S. Tsirkin wrote:
> > > > > > > > > > commit 8bf00a529967dafbbb210b377c38a15834d1e979
> > > > > > > > > > ("KVM: VMX: add support for switching of PERF_GLOBAL_CTRL")
> > > > > > > > > > was, as far as I can tell, supposed to bring about a
> > > > > > > > > > performance improvement on hardware that supports it?
> > > > > > > > > No, it (and the commits after it) was supposed to fix a bug,
> > > > > > > > > which it did.
> > > > > > > > > 
> > > > > > > > > > Instead it seems to make the typical case (not running the
> > > > > > > > > > guest under perf) a bit slower than it used to be: the cost
> > > > > > > > > > of a VM exit goes up by about 50 cycles on Sandy Bridge,
> > > > > > > > > > where the optimization in question is actually activated.
> > > > > > > > > >
> > > > > > > > > You seem to be confused. 8bf00a529967dafbbb210 adds support
> > > > > > > > > for special PERF_GLOBAL_CTRL switching, but does not add code
> > > > > > > > > to switch anything, so the commit itself is a nop.
> > > > > > > > 
> > > > > > > > It does add code to add_atomic_switch_msr.
> > > > > > > > 
> > > > > > > So what? You did not read what I wrote.
> > > > > > 
> > > > > > 
> > > > > > It's simple: if I revert 8bf00a529967dafbbb210 then exit latency
> > > > > > is reduced.
> > > > > > You seem to tell me it should be a nop, but in practice it isn't.
> > > > > > 
> > > > > 
> > > > > No, if you read below, I am saying that it looks like you are
> > > > > claiming that the generic msr switch mechanism is faster, and I
> > > > > am not buying that. If you believe this to be the case, ask Intel
> > > > > for an explanation. Your claim about "not running guest under
> > > > > perf" is even stranger, since in that case no msr switch should
> > > > > happen regardless of the aforementioned commit (unless the guest
> > > > > or host runs the nmi watchdog, but then the switch will happen
> > > > > whether or not perf is running, so again "not running guest under
> > > > > perf" does not make sense). So, in short, you do not really know
> > > > > where the slowdown is coming from.
> > > > 
> > > > That's true.
> > > > 
> > > Then dig deeper.
> > > 
> > So a quick and dirty patch that avoids needlessly writing into
> > VM_ENTRY_CONTROLS when no PERF_GLOBAL_CTRL switching is needed removes
> > all the overhead. But we probably need more generic code to shadow the
> > entire VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS.
> > 
> 
> Do you plan to complete this? Merge as is?
I have a patch to shadow VM_ENTRY_CONTROLS/VM_EXIT_CONTROLS now and will
post it shortly, but it will have to wait for the end of the merge window
anyway.
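
For reference, the shadowing idea can be sketched in plain C. All names
below are illustrative stand-ins, not the actual KVM internals: the point
is simply to cache the last value written to the VMCS control field and
skip the costly VMWRITE when the new value is identical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the VMCS field and the cost we want to avoid: in KVM,
 * vmcs_write32() issues a VMWRITE instruction on every call. */
static uint32_t vmcs_field;
static int vmwrite_count;

static void vmcs_write32(uint32_t val)
{
	vmcs_field = val;
	vmwrite_count++;
}

/* Shadow of VM_ENTRY_CONTROLS: remembers what was last written so that
 * redundant writes can be skipped entirely. */
struct entry_controls_shadow {
	uint32_t value;
	bool valid;
};

static void vm_entry_controls_set(struct entry_controls_shadow *s, uint32_t val)
{
	if (!s->valid || s->value != val) {
		vmcs_write32(val);	/* only pay for a real change */
		s->value = val;
		s->valid = true;
	}
}

static void vm_entry_controls_setbit(struct entry_controls_shadow *s, uint32_t bit)
{
	vm_entry_controls_set(s, s->value | bit);
}

static void vm_entry_controls_clearbit(struct entry_controls_shadow *s, uint32_t bit)
{
	vm_entry_controls_set(s, s->value & ~bit);
}
```

A real implementation would initialize the shadow from the value actually
programmed into the VMCS (or a VMREAD) before using the bit helpers; that
detail is elided here.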

> You said this saves about 50 cycles per exit for you,
> did you not?
Let's not exaggerate :) Maybe 20 after removing all the noise.

> 
> 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index e293a62..be64221 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -413,6 +413,7 @@ struct vcpu_vmx {
> >     struct shared_msr_entry *guest_msrs;
> >     int                   nmsrs;
> >     int                   save_nmsrs;
> > +   bool                  core_perf_global_ctrl;
> >     unsigned long         host_idt_base;
> >  #ifdef CONFIG_X86_64
> >     u64                   msr_host_kernel_gs_base;
> > @@ -1432,9 +1433,12 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr)
> >             break;
> >     case MSR_CORE_PERF_GLOBAL_CTRL:
> >             if (cpu_has_load_perf_global_ctrl) {
> > +                   if (vmx->core_perf_global_ctrl) {
> >                     clear_atomic_switch_msr_special(
> >                                     VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
> >                                     VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL);
> > +                           vmx->core_perf_global_ctrl = false;
> > +                   }
> >                     return;
> >             }
> >             break;
> > @@ -1488,6 +1492,7 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
> >                                     GUEST_IA32_PERF_GLOBAL_CTRL,
> >                                     HOST_IA32_PERF_GLOBAL_CTRL,
> >                                     guest_val, host_val);
> > +                   vmx->core_perf_global_ctrl = true;
> >                     return;
> >             }
> >             break;
> > --
> >                     Gleb.

--
                        Gleb.