On 06/10/2017 23:52, Eduardo Habkost wrote:
> Commit f010bc643a (target-i386: add feature kvm_pv_unhalt) introduced the
> kvm_pv_unhalt feature but didn't enable it by default.
>
> Without kvm_pv_unhalt we see a measurable degradation in scheduling
> performance, so enabling it by default does make sense IMHO.  This patch
> just flips it to on by default.
>
> [With kvm_pv_unhalt disabled]
> $ perf bench sched messaging -l 10000
>     Total time: 8.573 [sec]
>
> [With kvm_pv_unhalt enabled]
> $ perf bench sched messaging -l 10000
>     Total time: 4.416 [sec]
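
(Aside: when comparing runs like these, it is worth confirming inside the
guest that the flag actually took effect.  KVM advertises PV unhalt as bit 7
of EAX in CPUID leaf 0x40000001 (KVM_FEATURE_PV_UNHALT in the kernel's
kvm_para.h).  A minimal sketch of such a check follows; the file name and
output format are just illustrative, not anything QEMU or the kernel ships.)

    /* pv-unhalt-check.c: build inside the guest with
     *   gcc -O2 -o pv-unhalt-check pv-unhalt-check.c
     * and run it once per tested configuration. */
    #include <stdio.h>
    #include <string.h>
    #include <cpuid.h>

    #define KVM_CPUID_SIGNATURE   0x40000000u
    #define KVM_CPUID_FEATURES    0x40000001u
    #define KVM_FEATURE_PV_UNHALT 7   /* bit index in EAX */

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;
        char sig[13];

        /* Hypervisor leaves sit outside the basic CPUID range, so query
         * them directly with __cpuid() rather than __get_cpuid(). */
        __cpuid(KVM_CPUID_SIGNATURE, eax, ebx, ecx, edx);
        memcpy(sig + 0, &ebx, 4);
        memcpy(sig + 4, &ecx, 4);
        memcpy(sig + 8, &edx, 4);
        sig[12] = '\0';

        if (memcmp(sig, "KVMKVMKVM", 9) != 0) {
            printf("not running on KVM (signature \"%s\")\n", sig);
            return 1;
        }

        __cpuid(KVM_CPUID_FEATURES, eax, ebx, ecx, edx);
        printf("KVM feature EAX = 0x%08x, PV unhalt: %s\n", eax,
               (eax & (1u << KVM_FEATURE_PV_UNHALT)) ? "yes" : "no");
        return 0;
    }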
I cannot reproduce this:

Host CPU model: Haswell-EP (Xeon E5-2697 v3 @ 2.60 GHz)
Host physical CPUs: 56 (2 sockets, 14 cores/socket, 2 threads/core)
Host Linux kernel: 4.14 (more or less :))
Host memory: 64 GB
Guest Linux kernel: 4.10.13
QEMU command line:

    /usr/libexec/qemu-kvm -cpu host,+kvm_pv_unhalt -M q35 \
        -m XXX -smp YYY \
        /path/to/vm.qcow2

(XXX = MiB of guest memory, YYY = number of guest processors)

"perf bench sched messaging -l 50000" has the following results for me:

Guest vCPUs    Guest memory    without PV unhalt      with PV unhalt
1*96           32 GB           24.6 s                 24.2 s
2*96           24 GB           47.9 s (both VMs)      46.8 s (both VMs)
2*48           16 GB           50.4 s (both VMs)      49.3 s (both VMs)
4*24           12 GB           82.1 - 89.0 s          82.3 - 88.8 s
4*4            12 GB           87.2 - 91.3 s          90.3 - 94.9 s

All but the first line run the benchmark in multiple guests concurrently.
The improvement seems to be about 2-3% for guests larger than one NUMA
node, and zero or negative for smaller guests (especially as the host is
then no longer overcommitted).

The difference for large NUMA guests is small, but I ran the benchmark
multiple times and it is statistically significant; it is just not as
large as what Alex reported.

Paolo