On Sat, Jul 27, 2013 at 07:47:49AM +0000, Zhanghaoyu (A) wrote:
> >> hi all,
> >>
> >> I met similar problem to these, while performing live migration or
> >> save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2,
> >> guest:suse11sp2), running tele-communication software suite in guest,
> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
> >>
> >> After live migration or virsh restore [savefile], one process's CPU
> >> utilization went up by about 30%, resulted in throughput degradation
> >> of this process.
> >>
> >> If EPT disabled, this problem gone.
> >>
> >> I suspect that kvm hypervisor has business with this problem.
> >> Based on above suspect, I want to find the two adjacent versions of
> >> kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1),
> >> and analyze the differences between this two versions, or apply the
> >> patches between this two versions by bisection method, finally find the
> >> key patches.
> >>
> >> Any better ideas?
> >>
> >> Thanks,
> >> Zhang Haoyu
> >
> >I've attempted to duplicate this on a number of machines that are as similar
> >to yours as I am able to get my hands on, and so far have not been able to
> >see any performance degradation. And from what I've read in the above links,
> >huge pages do not seem to be part of the problem.
> >
> >So, if you are in a position to bisect the kernel changes, that would
> >probably be the best avenue to pursue in my opinion.
> >
> >Bruce
>
> I found the first bad commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM:
> propagate fault r/w information to gup(), allow read-only memory) which
> triggers this problem
> by git bisecting the kvm kernel (download from
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
>
> And,
> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
> git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
>
> Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and
> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
> came to a conclusion that all of the differences between
> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and
> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
> are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so
> this commit is the peace-breaker which directly or indirectly causes the
> degradation.
>
> Does the map_writable flag passed to mmu_set_spte() function have effect on
> PTE's PAT flag or increase the VMEXITs induced by that guest tried to write
> read-only memory?
>
> Thanks,
> Zhang Haoyu
>
There should be no read-only memory maps backing guest RAM.

Can you confirm whether map_writable = false is being passed to
__direct_map? (This should not happen for guest RAM.) If it is false,
please capture the associated GFN.

It's probably an issue with an older get_user_pages variant (either in
kvm-kmod or the older kernel). Is there any indication of a similar
issue with the upstream kernel?
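
To make the request concrete, here is a rough user-space model of the kind
of check that could be dropped into __direct_map while debugging. It is only
a sketch: the names note_direct_map, struct ro_map_log and RO_LOG_CAPACITY
are made up for illustration, gfn_t is redefined locally as a stand-in for
the kernel type, and a real kernel patch would use printk or trace_printk
rather than fprintf. The actual __direct_map signature differs between
kernel versions, so it is deliberately not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's gfn_t; user-space model only. */
typedef uint64_t gfn_t;

#define RO_LOG_CAPACITY 64

/* Hypothetical log of guest frame numbers that were mapped read-only. */
struct ro_map_log {
    gfn_t gfns[RO_LOG_CAPACITY];
    size_t count;   /* entries stored in gfns[] */
    size_t dropped; /* read-only events seen after the log filled up */
};

/*
 * Model of a debugging check for the direct-map path: when map_writable
 * is false, remember the GFN and report it. Returns true if a read-only
 * mapping was seen, false for the expected (writable) case.
 */
static bool note_direct_map(struct ro_map_log *log, gfn_t gfn, bool map_writable)
{
    assert(log != NULL);

    if (map_writable)
        return false; /* expected for guest RAM */

    if (log->count < RO_LOG_CAPACITY)
        log->gfns[log->count++] = gfn;
    else
        log->dropped++;

    fprintf(stderr, "read-only direct map: gfn=0x%llx\n",
            (unsigned long long)gfn);
    return true;
}
```

Calling note_direct_map(&log, gfn, map_writable) once per fault and dumping
log.gfns afterwards would give the list of GFNs asked for above. Comparing
that list against the guest memory slots should show whether the affected
pages are really guest RAM.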