>> >> Hi all,
>> >>
>> >> I met a similar problem to these while performing live migration or
>> >> save-restore tests on the KVM platform (qemu:1.4.0, host:suse11sp2,
>> >> guest:suse11sp2), running a telecommunication software suite in the
>> >> guest:
>> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>> >>
>> >> After live migration or virsh restore [savefile], one process's CPU
>> >> utilization went up by about 30%, which resulted in throughput
>> >> degradation of this process.
>> >>
>> >> With EPT disabled, this problem is gone.
>> >>
>> >> I suspect that the KVM hypervisor is involved in this problem.
>> >> Based on this suspicion, I want to find two adjacent versions of
>> >> kvm-kmod, one that triggers this problem and one that does not
>> >> (e.g. 2.6.39, 3.0-rc1), and either analyze the differences between
>> >> these two versions or apply the patches between them by bisection,
>> >> to finally find the key patches.
>> >>
>> >> Any better ideas?
>> >>
>> >> Thanks,
>> >> Zhang Haoyu
>> >
>> >I've attempted to duplicate this on a number of machines that are as
>> >similar to yours as I am able to get my hands on, and so far have not been
>> >able to see any performance degradation. And from what I've read in the
>> >above links, huge pages do not seem to be part of the problem.
>> >
>> >So, if you are in a position to bisect the kernel changes, that would
>> >probably be the best avenue to pursue in my opinion.
>> >
>> >Bruce
>>
>> I found the first bad commit
>> ([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w
>> information to gup(), allow read-only memory), which triggers this problem,
>> by git-bisecting the kvm kernel changes (downloaded from
>> https://git.kernel.org/pub/scm/virt/kvm/kvm.git).
>>
>> And then:
>> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
>> git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
>>
>> Then I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff and concluded that all of the
>> differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 come from
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 alone, so this commit is the
>> culprit which directly or indirectly causes the degradation.
>>
>> Does the map_writable flag passed to the mmu_set_spte() function affect the
>> PTE's PAT flag, or increase the number of VM exits induced by the guest
>> trying to write read-only memory?
>>
>> Thanks,
>> Zhang Haoyu
>>
>There should be no read-only memory maps backing guest RAM.
>
>Can you confirm map_writable = false is being passed to __direct_map? (this
>should not happen, for guest RAM).
>And if it is false, please capture the associated GFN.
>
I added the check and printk below at the start of __direct_map() at the first
bad commit version:

--- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c 2013-07-26 18:44:05.000000000 +0800
+++ kvm-612819/arch/x86/kvm/mmu.c 2013-07-31 00:05:48.000000000 +0800
@@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
         int pt_write = 0;
         gfn_t pseudo_gfn;
 
+        if (!map_writable)
+                printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
+
         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
                 if (iterator.level == level) {
                         unsigned pte_access = ACC_ALL;
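
To see which GFNs are reported and how often, the kernel log can be summarized
on the host, for example like this (a rough sketch; it assumes the printk
output above lands in dmesg in the format shown):

# count the most frequently reported GFNs from the debug printk above
dmesg | grep '__direct_map: gfn =' | awk '{print $NF}' | sort -n | uniq -c | sort -rn | head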
I virsh-saved the VM and then virsh-restored it; so many GFNs were printed that
you could absolutely describe it as flooding.

>It's probably an issue with an older get_user_pages variant (either in kvm-kmod
>or the older kernel). Is there any indication of a similar issue with the
>upstream kernel?

I will test the upstream kvm host (https://git.kernel.org/pub/scm/virt/kvm/kvm.git)
later; if the problem is still there, I will revert the first bad commit
(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) on upstream and test again.

I also collected VM-exit statistics for the pre-save and post-restore periods at
the first bad commit version.

pre-save:
COTS-F10S03:~ # perf stat -e "kvm:*" -a sleep 30

 Performance counter stats for 'sleep 30':

     1222318 kvm:kvm_entry
           0 kvm:kvm_hypercall
           0 kvm:kvm_hv_hypercall
      351755 kvm:kvm_pio
        6703 kvm:kvm_cpuid
      692502 kvm:kvm_apic
     1234173 kvm:kvm_exit
      223956 kvm:kvm_inj_virq
           0 kvm:kvm_inj_exception
       16028 kvm:kvm_page_fault
       59872 kvm:kvm_msr
           0 kvm:kvm_cr
      169596 kvm:kvm_pic_set_irq
       81455 kvm:kvm_apic_ipi
      245103 kvm:kvm_apic_accept_irq
           0 kvm:kvm_nested_vmrun
           0 kvm:kvm_nested_intercepts
           0 kvm:kvm_nested_vmexit
           0 kvm:kvm_nested_vmexit_inject
           0 kvm:kvm_nested_intr_vmexit
           0 kvm:kvm_invlpga
           0 kvm:kvm_skinit
      853020 kvm:kvm_emulate_insn
      171140 kvm:kvm_set_irq
      171534 kvm:kvm_ioapic_set_irq
           0 kvm:kvm_msi_set_irq
       99276 kvm:kvm_ack_irq
      971166 kvm:kvm_mmio
       33722 kvm:kvm_fpu
           0 kvm:kvm_age_page
           0 kvm:kvm_try_async_get_page
           0 kvm:kvm_async_pf_not_present
           0 kvm:kvm_async_pf_ready
           0 kvm:kvm_async_pf_completed
           0 kvm:kvm_async_pf_doublefault

  30.019069018 seconds time elapsed

post-restore:
COTS-F10S03:~ # perf stat -e "kvm:*" -a sleep 30

 Performance counter stats for 'sleep 30':

     1327880 kvm:kvm_entry
           0 kvm:kvm_hypercall
           0 kvm:kvm_hv_hypercall
      375189 kvm:kvm_pio
        6925 kvm:kvm_cpuid
      804414 kvm:kvm_apic
     1339352 kvm:kvm_exit
      245922 kvm:kvm_inj_virq
           0 kvm:kvm_inj_exception
       15856 kvm:kvm_page_fault
       39500 kvm:kvm_msr
           1 kvm:kvm_cr
      179150 kvm:kvm_pic_set_irq
       98436 kvm:kvm_apic_ipi
      247430 kvm:kvm_apic_accept_irq
           0 kvm:kvm_nested_vmrun
           0 kvm:kvm_nested_intercepts
           0 kvm:kvm_nested_vmexit
           0 kvm:kvm_nested_vmexit_inject
           0 kvm:kvm_nested_intr_vmexit
           0 kvm:kvm_invlpga
           0 kvm:kvm_skinit
      955410 kvm:kvm_emulate_insn
      182240 kvm:kvm_set_irq
      182562 kvm:kvm_ioapic_set_irq
           0 kvm:kvm_msi_set_irq
      105267 kvm:kvm_ack_irq
     1113999 kvm:kvm_mmio
       37789 kvm:kvm_fpu
           0 kvm:kvm_age_page
           0 kvm:kvm_try_async_get_page
           0 kvm:kvm_async_pf_not_present
           0 kvm:kvm_async_pf_ready
           0 kvm:kvm_async_pf_completed
           0 kvm:kvm_async_pf_doublefault

  30.000779718 seconds time elapsed

Thanks,
Zhang Haoyu
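
P.S. perf stat only gives per-tracepoint totals; to see which exit reasons
account for the increase after restore, the kvm_exit events could be recorded
and grouped by reason, along these lines (a rough sketch; it assumes the
kvm_exit tracepoint prints "reason <NAME>" in the perf script output, which may
differ between kernel versions):

# record 30 seconds of kvm_exit events system-wide, then group them by exit reason
perf record -e kvm:kvm_exit -a -o kvm_exit.data sleep 30
perf script -i kvm_exit.data | grep -o 'reason [A-Z_]*' | sort | uniq -c | sort -rn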