On Thu, Mar 16, 2017 at 04:54:06PM +0100, Radim Krčmář wrote: > 2017-03-16 11:44-0400, Gabriel L. Somlo: > > On Thu, Mar 16, 2017 at 03:08:07PM +0100, Radim Krčmář wrote: > >> 2017-03-16 09:24-0400, Gabriel L. Somlo: > >> > On Thu, Mar 16, 2017 at 01:41:28AM +0200, Michael S. Tsirkin wrote: > >> > > On Wed, Mar 15, 2017 at 07:35:34PM -0400, Gabriel L. Somlo wrote: > >> > > > On Wed, Mar 15, 2017 at 11:22:18PM +0200, Michael S. Tsirkin wrote: > >> > > > > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a > >> > > > > problem: > >> > > > > unless explicitly provided with kernel command line argument > >> > > > > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT > >> > > > > availability, > >> > > > > without checking CPUID. > >> > > > > > >> > > > > We currently emulate that as a NOP but on VMX we can do better: let > >> > > > > guest stop the CPU until timer, IPI or memory change. CPU will be > >> > > > > busy > >> > > > > but that isn't any worse than a NOP emulation. > >> > > > > > >> > > > > Note that mwait within guests is not the same as on real hardware > >> > > > > because halt causes an exit while mwait doesn't. For this reason > >> > > > > it > >> > > > > might not be a good idea to use the regular MWAIT flag in CPUID to > >> > > > > signal this capability. Add a flag in the hypervisor leaf instead. > >> > > > > > >> > > > > Additionally, we add a capability for QEMU - e.g. if it knows > >> > > > > there's an > >> > > > > isolated CPU dedicated for the VCPU it can set the standard MWAIT > >> > > > > flag > >> > > > > to improve guest behaviour. > >> > > > > >> > > > Same behavior (on the mac pro 1,1 running F22 with custom-compiled > >> > > > kernel from kvm git master, plus this patch on top). > >> > > > > >> > > > The OS X 10.7 kernel hangs (or at least progresses extremely slowly) > >> > > > on boot, does not bring up guest graphical interface within the first > >> > > > 10 minutes that I waited for it. That, in contrast with the default > >> > > > nop-based emulation where the guest comes up within 30 seconds. > >> > > > >> > > > >> > > Thanks a lot, meanwhile I'll try to write a unit-test and experiment > >> > > with various behaviours. > >> > > > >> > > > I will run another round of tests on a newer Mac (4-year-old macbook > >> > > > air) and report back tomorrow. > >> > > > > >> > > > Going off on a tangent, why would encouraging otherwise well-behaved > >> > > > guests (like linux ones, for example) to use MWAIT be desirable to > >> > > > begin with ? Is it a matter of minimizing the overhead associated > >> > > > with > >> > > > exiting and re-entering L1 ? Because if so, AFAIR staying inside L1 > >> > > > and > >> > > > running guest-mode MWAIT in a tight loop will actually waste the host > >> > > > CPU without the opportunity to yield to some other L0 thread. Sorry > >> > > > if > >> > > > I fell into the middle of an ongoing conversation on this and missed > >> > > > most of the relevant context, in which case please feel free to > >> > > > ignore > >> > > > me... :) > >> > > > > >> > > > Thanks, > >> > > > --G > >> > > > >> > > It's just some experiments I'm running, I'm not ready to describe it > >> > > yet. I thought this part might be useful to at least some guests, so > >> > > trying to upstream it right now. > >> > > >> > OK, so on a macbook air running F25 and the latest kvm git master plus > >> > your v5 patch (4.11.0-rc2+), things appear to work. > >> > > >> > host-side cpuid output: > >> > eax=0x000040 ebx=0x000040 ecx=0x000003 edx=0x021120 > >> > > >> > guest-side cpuid output: > >> > eax=00000000 ebx=00000000 ecx=0x000003 edx=00000000 > >> > > >> > processor : 3 > >> > vendor_id : GenuineIntel > >> > cpu family : 6 > >> > model : 42 > >> > model name : Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz > >> > stepping : 7 > >> > microcode : 0x29 > >> > cpu MHz : 1157.849 > >> > cache size : 4096 KB > >> > physical id : 0 > >> > siblings : 4 > >> > core id : 1 > >> > cpu cores : 2 > >> > apicid : 3 > >> > initial apicid : 3 > >> > fpu : yes > >> > fpu_exception : yes > >> > cpuid level : 13 > >> > wp : yes > >> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > >> > mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe > >> > syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl > >> > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor > >> > ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic > >> > popcnt tsc_deadline_timer aes xsave avx lahf_lm tpr_shadow vnmi > >> > flexpriority ept vpid xsaveopt dtherm ida arat pln pts > >> > bugs : > >> > bogomips : 3604.68 > >> > clflush size : 64 > >> > cache_alignment : 64 > >> > address sizes : 36 bits physical, 48 bits virtual > >> > power management: > >> > > >> > After studying your patch a bit more carefully (sorry, it's crazy > >> > around here right now :) ) I realized you're simply trying to > >> > (selectively) decide when to exit L1 and emulate as NOP vs. when to > >> > just allow L1 to execute MONITOR & MWAIT natively. > >> > > >> > Is that right ? Because if so, the issues I saw on my MacPro1,1 are > >> > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT > >> > natively was one of the options Alex Graf and Rene Rebe used back in > >> > the very early days of OS X on QEMU, at the time I got involved with > >> > that project. Here's part of an out of tree patch against 3.4 which did > >> > just that, and worked as far as I remember on *any* MWAIT capable > >> > intel chip I had access to back in 2010: > >> > > >> > ############################################################################## > >> > # 99-mwait.patch.kvm-kmod (Rene Rebe <r...@exactcode.de>) 2010-04-27 > >> > ############################################################################## > >> > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c > >> > linux-3.4-mac/arch/x86/kvm/cpuid.c > >> > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 > >> > -0400 > >> > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 > >> > -0400 > >> > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid > >> > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) | > >> > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp | > >> > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW); > >> > /* cpuid 1.ecx */ > >> > const u32 kvm_supported_word4_x86_features = > >> > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | > >> > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ > >> > | > >> > 0 /* DS-CPL, VMX, SMX, EST */ | > >> > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* > >> > Reserved */ | > >> > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | > >> > 0 /* Reserved, DCA */ | F(XMM4_1) | > >> > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | > >> > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c > >> > linux-3.4-mac/arch/x86/kvm/svm.c > >> > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400 > >> > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 > >> > -0400 > >> > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s > >> > set_intercept(svm, INTERCEPT_VMSAVE); > >> > set_intercept(svm, INTERCEPT_STGI); > >> > set_intercept(svm, INTERCEPT_CLGI); > >> > set_intercept(svm, INTERCEPT_SKINIT); > >> > set_intercept(svm, INTERCEPT_WBINVD); > >> > - set_intercept(svm, INTERCEPT_MONITOR); > >> > - set_intercept(svm, INTERCEPT_MWAIT); > >> > set_intercept(svm, INTERCEPT_XSETBV); > >> > > >> > control->iopm_base_pa = iopm_base; > >> > control->msrpm_base_pa = __pa(svm->msrpm); > >> > control->int_ctl = V_INTR_MASKING_MASK; > >> > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c > >> > linux-3.4-mac/arch/x86/kvm/vmx.c > >> > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400 > >> > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 > >> > -0400 > >> > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls > >> > nested_vmx_procbased_ctls_low, > >> > nested_vmx_procbased_ctls_high); > >> > nested_vmx_procbased_ctls_low = 0; > >> > nested_vmx_procbased_ctls_high &= > >> > CPU_BASED_VIRTUAL_INTR_PENDING | > >> > CPU_BASED_USE_TSC_OFFSETING | > >> > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING | > >> > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING | > >> > + CPU_BASED_CR3_LOAD_EXITING | > >> > CPU_BASED_CR3_STORE_EXITING | > >> > #ifdef CONFIG_X86_64 > >> > CPU_BASED_CR8_LOAD_EXITING | > >> > CPU_BASED_CR8_STORE_EXITING | > >> > #endif > >> > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | > >> > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru > >> > CPU_BASED_CR3_LOAD_EXITING | > >> > CPU_BASED_CR3_STORE_EXITING | > >> > CPU_BASED_USE_IO_BITMAPS | > >> > CPU_BASED_MOV_DR_EXITING | > >> > CPU_BASED_USE_TSC_OFFSETING | > >> > - CPU_BASED_MWAIT_EXITING | > >> > - CPU_BASED_MONITOR_EXITING | > >> > CPU_BASED_INVLPG_EXITING | > >> > CPU_BASED_RDPMC_EXITING; > >> > > >> > opt = CPU_BASED_TPR_SHADOW | > >> > CPU_BASED_USE_MSR_BITMAPS | > >> > > >> > If all you're trying to do is (selectively) revert to this behavior, > >> > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly > >> > confused at this point :) > >> > > >> > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP, > >> > didn't power down the physical CPU, just immediately moved on to the > >> > next instruction. As such, there was no power saving and no > >> > opportunity to yield to another L0 thread either, unlike with NOP > >> > emulation at L0. > >> > > >> > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now > >> > doing something smarter than just acting as a guest-mode NOP) ? > >> > >> Probably, MWAIT in new intel chips enters power saving mode normally. > >> > >> If hardware-executed MWAIT acted as a NOP in your old chip, then that > >> shouldn't be a problem either ... Maybe OS X gets confused into doing > >> something really dumb because we do not expose the MONITOR/MWAIT feature > >> bit correctly. > >> > >> Can you try this QEMU patch on the old hardware? > >> > >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c > >> index 7aa762245a54..4b112e12188a 100644 > >> --- a/target/i386/cpu.c > >> +++ b/target/i386/cpu.c > >> @@ -2764,10 +2764,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t > >> index, uint32_t count, > >> break; > >> case 5: > >> /* mwait info: needed for Core compatibility */ > >> - *eax = 0; /* Smallest monitor-line size in bytes */ > >> - *ebx = 0; /* Largest monitor-line size in bytes */ > >> - *ecx = CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; > >> - *edx = 0; > >> + host_cpuid(index, 0, eax, ebx, ecx, edx); > >> break; > >> case 6: > >> /* Thermal and Power Leaf */ > >> diff --git a/target/i386/kvm.c b/target/i386/kvm.c > >> index 55865dbee0aa..1eb78291b093 100644 > >> --- a/target/i386/kvm.c > >> +++ b/target/i386/kvm.c > >> @@ -360,6 +360,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, > >> uint32_t function, > >> if (!kvm_irqchip_in_kernel()) { > >> ret &= ~CPUID_EXT_X2APIC; > >> } > >> + ret |= CPUID_EXT_MONITOR; > >> } else if (function == 6 && reg == R_EAX) { > >> ret |= CPUID_6_EAX_ARAT; /* safe to allow because of emulated > >> APIC */ > >> } else if (function == 7 && index == 0 && reg == R_EBX) { > >> > >> > >> Thanks. > > > > No change, still hangs on boot. > > Hm, also with '-cpu host'? > (I forgot that the CPUID_EXT_MONITOR isn't visible in the guest > otherwise ...)
Yeah, managed to get it started with '-cpu host', but same behavior. Maybe that version of Xeon really was braindamaged in some way, and never would have worked with L1 MWAIT regardless. I only ever used that machine after the emulate-as-nop patch made it into KVM (commit 87c0057), so I honestly can't say whether it ever worked with MWAIT run natively at L1... Thanks, --Gabriel