Kechen Lu <kech...@nvidia.com> writes:

> Hi Vitaly and Paolo,
>
> Sorry for the delay in response, finally got chance to access a machine with
> AVIC, and was able to test out the patch and reconfirm through some
> benchmarks and tests again today:)
>
> In summary, this patch works well and resolves the issues on clocksource
> caused high port I/O vmexits. With AVIC=1 && stimer/synic=1,
>
> 1. CPU intensive workload CPU-z shows SingleThread score 15% improvement
> 382.1=> 441.7,
>
> 2. disk I/O intensive workload Passmark Disk Test gives 4% improvement
> 12706=> 13265,
>
> 3. Vmexits pattern of 30s record while running cpu workload Geekbench in
> guest showing dramatic 90.7% decrease on port IO vmexits, so as the HLT and
> NPF vmexits, when we get stimer benefit plus AVIC. Details as below:
>
> AVIC=1 && stimer/synic=0 && vapic=0:
>
>             VM-EXIT  Samples  Samples%   Time%  Min Time     Max Time   Avg time
>
>                  io   344654    68.29%   1.10%    0.67us    2132.72us     7.01us ( +- 0.19% )
>                 hlt   114046    22.60%  98.85%    0.42us   16666.32us  1903.26us ( +- 0.66% )
> avic_incomplete_ipi    19679     3.90%   0.03%    0.38us      22.67us     3.66us ( +- 0.71% )
>                 npf     8186     1.62%   0.01%    0.37us     235.76us     1.46us ( +- 4.20% )
> ........
>
>
> AVIC=1 && stimer/synic=1 && vapic=0:
>
>             VM-EXIT  Samples  Samples%   Time%  Min Time     Max Time   Avg time
>
>                  io    31995    38.61%   0.10%    2.79us      65.83us     6.70us ( +- 0.35% )
>                 hlt    22915    27.65%  99.88%    0.42us   15959.14us  9535.38us ( +- 0.50% )
> avic_incomplete_ipi     8271     9.98%   0.01%    0.39us      79.03us     3.58us ( +- 1.23% )
>                 npf     1232     1.49%   0.00%    0.36us     100.25us     2.58us ( +- 6.98% )
> ..........
>
>
> While testing, I also found out hv-vapic should be disabled as well to
> make AVIC fully functional, otherwise it shows high vmexits due to MSR
> writes which seems to be due to increased access to HV_X64_MSR_EOI
> and HV_X64_MSR_ICR. This makes sense to me, since AVIC conflicts with
> PV EOI/ICR accesses. So far I think AVIC=1 && hv-vapic=0 &&
> stimer/synic=1 combination gives us the best performance.
> However, AVIC=1 && hv-vapic=0 && stimer/synic=1 is really unstable, and
> sometimes would lead to boot. Wanted to understand if instabilities
> with APICv/AVIC is a known bug/issue in upstream? Attached the
> reproducible kernel warning in the bottom.

Now it's my turn to apologize for the delayed reply :-)

I think it's our fault, BIT(3) in HYPERV_CPUID_ENLIGHTMENT_INFO is
HV_X64_APIC_ACCESS_RECOMMENDED which can be deciphered as "Recommend
using MSRs for accessing APIC registers EOI, ICR and TPR rather than
their memory-mapped counterparts"

And we shouldn't be setting it with AVIC. The following hack is supposed
to help:

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c8f2592ccc99..66ee85a83e9a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -145,6 +145,13 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 					   vcpu->arch.ia32_misc_enable_msr &
 					   MSR_IA32_MISC_ENABLE_MWAIT);
 	}
+
+	/* Dirty hack: force HV_DEPRECATING_AEOI_RECOMMENDED. Not to be merged! */
+	best = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_ENLIGHTMENT_INFO, 0);
+	if (best) {
+		best->eax &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+		best->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
 
(we'll need to find a proper way to set these settings in QEMU).

Could you give it a spin? ("AVIC=1 && hv-vapic=1 && stimer/synic=1"
configuration)

>
> In all, AVIC=1 && hv-vapic=1 && stimer/synic=1 could work stably now and
> still produce great benefits on vmexits optimization. Thanks all you folks
> help so much, hope the patch in kernel and bit expose patch in QEMU could get
> into upstream soon along with fixing the instabilities.
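
For anyone who wants to see what the hack above does to EAX without
building a kernel, here is a minimal userspace sketch. It is only an
illustration under stated assumptions: adjust_enlightenment_eax() is a
made-up helper name (not a KVM function), and the two bit positions are
taken from the Hyper-V TLFS definitions as mirrored in the kernel headers
(BIT(3) for HV_X64_APIC_ACCESS_RECOMMENDED, BIT(9) for
HV_DEPRECATING_AEOI_RECOMMENDED).

```c
#include <stdint.h>

/*
 * Recommendation bits in EAX of CPUID leaf 0x40000004
 * (HYPERV_CPUID_ENLIGHTMENT_INFO). Values assumed from the Hyper-V TLFS.
 */
#define HV_X64_APIC_ACCESS_RECOMMENDED   (UINT32_C(1) << 3)
#define HV_DEPRECATING_AEOI_RECOMMENDED  (UINT32_C(1) << 9)

/*
 * Hypothetical helper mirroring the two statements in the hack:
 * stop recommending MSR-based APIC access (which defeats AVIC) and
 * advertise that AutoEOI is being deprecated. All other bits are kept.
 */
static uint32_t adjust_enlightenment_eax(uint32_t eax)
{
	eax &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
	eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
	return eax;
}
```

With an arbitrary input of 0x2c (bits 2, 3 and 5 set), the helper returns
0x224: bit 3 is cleared, bit 9 is set, and bits 2 and 5 are untouched.
Applying it twice gives the same result, so it is safe to run on every
CPUID update.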
>
> Best Regards,
> Kechen
>
> ---------------------------------------------------------------------------------------
> [ 7962.437584] ------------[ cut here ]------------
> [ 7962.437586] Invalid IPI target: index=2, vcpu=0, icr=0x4000000:0x82f
> [ 7962.437603] WARNING: CPU: 4 PID: 7109 at arch/x86/kvm/svm/avic.c:349
> avic_incomplete_ipi_interception+0x1ff/0x240 [kvm_amd]
> [ 7962.437604] Modules linked in: kvm_amd ccp kvm msr nf_tables nfnetlink
> bridge stp llc amd64_edac_mod edac_mce_amd nls_iso8859_1 amd_energy
> crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd
> glue_helper snd_hda_codec_hdmi rapl snd_hda_intel snd_intel_dspcfg wmi_bmof
> snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep
> snd_seq_midi snd_seq_midi_event snd_rawmidi efi_pstore joydev mc input_leds
> snd_seq snd_pcm snd_seq_device snd_timer snd soundcore k10temp mac_hid
> sch_fq_codel lm92 parport_pc ppdev lp parport ip_tables x_tables autofs4 iavf
> hid_generic usbhid hid nvme crc32_pclmul i40e ahci nvme_core xhci_pci libahci
> xhci_pci_renesas i2c_piix4 atlantic macsec wmi [last unloaded: ccp]
> [ 7962.437630] CPU: 4 PID: 7109 Comm: CPU 0/KVM Tainted: P W OE
> 5.8.0-41-generic #46
> [ 7962.437633] RIP: 0010:avic_incomplete_ipi_interception+0x1ff/0x240
> [kvm_amd]

No, this is not something I'm aware of. Do you know if it reproduces on
the latest upstream?

> [ 7962.437635] Code: 9a 00 00 00 0f 85 2b ff ff ff 41 8b 56 24 8b 4d c8 45 89
> e0 44 89 ee 48 c7 c7 a8 34 50 c0 c6 05 b2 9a 00 00 01 e8 d6 cc 3a fb <0f> 0b
> e9 04 ff ff ff 48 8b 5d c0 8b 55 c8 be 10 03 00 00 48 89 df
> [ 7962.437636] RSP: 0018:ffffa7894f9bfcc0 EFLAGS: 00010282
> [ 7962.437637] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> ffff99347f118cd8
> [ 7962.437637] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI:
> ffff99347f118cd0
> [ 7962.437638] RBP: ffffa7894f9bfd18 R08: 0000000000000004 R09:
> 0000000000000831
> [ 7962.437638] R10: 0000000000000000 R11: 0000000000000001 R12:
> 040000000000082f
> [ 7962.437639] R13: 0000000000000002 R14: ffff993345653448 R15:
> 0000000000000002
> [ 7962.437640] FS: 0000000000000000(0053) GS:ffff99347f100000(002b)
> knlGS:fffff80470728000
> [ 7962.437640] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7962.437641] CR2: ffff8006ace2b000 CR3: 0000000febd88000 CR4:
> 0000000000340ee0
> [ 7962.437641] Call Trace:
> [ 7962.437646] handle_exit+0x134/0x420 [kvm_amd]
> [ 7962.437661] ? kvm_set_cr8+0x22/0x40 [kvm]
> [ 7962.437674] vcpu_enter_guest+0x862/0xd90 [kvm]
> [ 7962.437687] vcpu_run+0x76/0x240 [kvm]
> [ 7962.437699] kvm_arch_vcpu_ioctl_run+0x9f/0x2b0 [kvm]
> [ 7962.437711] kvm_vcpu_ioctl+0x247/0x600 [kvm]
> [ 7962.437714] ksys_ioctl+0x8e/0xc0
> [ 7962.437715] __x64_sys_ioctl+0x1a/0x20
> [ 7962.437717] do_syscall_64+0x49/0xc0
> [ 7962.437719] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 7962.437720] RIP: 0033:0x7f4c09b1131b
> [ 7962.437721] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff
> 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 8b 0d 1d 3b 0d 00 f7 d8 64 89 01 48
> [ 7962.437721] RSP: 002b:00007f4bedffa4a8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 7962.437722] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX:
> 00007f4c09b1131b
> [ 7962.437723] RDX: 0000000000000000 RSI: 000000000000ae80 RDI:
> 0000000000000015
> [ 7962.437723] RBP: 0000563c35a94990 R08: 0000563c33b95a30 R09:
> 0000000000000004
> [ 7962.437724] R10: 0000000000000000 R11: 0000000000000246 R12:
> 0000000000000000
> [ 7962.437724] R13: 0000563c34196d00 R14: 0000000000000000 R15:
> 00007f4bedffb640
> [ 7962.437726] ---[ end trace 7f0f339c3a001d7b ]---
>

-- 
Vitaly