So I finally tested with this:
-cpu host,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,-vmx
The Hyper-V enlightenments used are the ones generally recommended for
Windows VMs.
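For context, the full invocation was roughly along these lines (the
machine, memory and disk values here are just placeholders, not my
exact setup):

  qemu-system-x86_64 \
    -machine q35,accel=kvm \
    -cpu host,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,-vmx \
    -smp 8 -m 16G \
    -drive file=win11.qcow2,if=virtio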
Overall it seemed to really work: the performance was like bare metal,
and the second problem (the BSOD) was also gone (to test this I had to
install another Win11 23H2 VM from scratch and run Windows updates).
Also, unlike before, the Windows "suspend" functions appeared; and,
most surprising, they actually worked.
I tried suspending, and it worked. I even tried enabling the infamous
"fast boot" and shutting down the VM. The result: it took a little
longer to shut down, but when powering the VM on again, it did restore
its state.
Though I only did each test once...
I did these last tests because in many QEMU/KVM guides around the
internet I had read that, at least with Windows VMs, it was very
important to disable fast boot because QEMU/KVM did not support it and
it led to ugly, buggy behavior.
So, did this change over time?
This would apparently imply that the culprit was the "vmx" CPU bit,
which, as already explained, is the one that enables nested
virtualization inside the VM.
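(For reference, whether the host's kvm_intel module has nested
virtualization enabled at all can be checked with:

  cat /sys/module/kvm_intel/parameters/nested

which prints Y or 1 when nested support is on.)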
Overall, what do you think? Could this qualify as some kind of bug? Is
nested virtualization often used in QEMU/KVM VMs?
Could it be that Win11 23H2 has problems with this CPU bit?
Oh, and based on the results, I have a few additional questions:
If I wanted to do live migration, would it just be a matter of
switching "host" for "Skylake" or any other fixed QEMU CPU model, and
then checking that the VM still boots correctly?
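Something along these lines, I mean (Skylake-Client is just an example
model name; I understand both hosts would need to support whichever
model is chosen):

  -cpu Skylake-Client,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time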
When trying "-cpu host,hv-passthrough", I did notice a considerable
improvement in overall performance compared to using
"hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time"; yet it was still
noticeably short of bare metal. Why was this?
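To be explicit, the two configurations I am comparing here were:

  -cpu host,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time
  -cpu host,hv-passthrough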
In another forum, I read that someone had no problems updating from
Win11 22H2 to 23H2 on QEMU/KVM, though he used libvirt.
Among his CPU settings, he did not use CPU passthrough but a named QEMU
CPU model; I cannot remember which one, except that it was a Xeon
server model. Moreover, among the CPU bits he used there was vmx=on.
If the culprit here was apparently this vmx bit, how is it that for
others it had no consequences? The only difference was using a "server"
CPU model instead of a "client" one. Though they did not talk about
performance...
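Purely as an illustration of what I mean (I do not remember his exact
settings, and Cascadelake-Server here is just a stand-in for whatever
Xeon model he actually used):

  -cpu Cascadelake-Server,vmx=on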
Thanks.
On 2024-01-16 11:56, Paolo Bonzini wrote:
One possibility is that you have Hyper-V enabled with -cpu host but
not with other CPU models. That's because "-cpu host" enables nested
virtualization.
Try "-cpu host,-vmx" and it should be clear if that's the case.
Based on the pastie that you prepared, that's the main difference
between -cpu host and -cpu Broadwell-noTSX-IBRS. Nothing else (see
list below) should have any substantial performance impact; even less
so should they make things worse.
Paolo
"avx512-vp2intersect": true,
"avx512-vpopcntdq": true,
"avx512bitalg": true,
"avx512bw": true,
"avx512cd": true,
"avx512dq": true,
"avx512f": true,
"avx512ifma": true,
"avx512vbmi": true,
"avx512vbmi2": true,
"avx512vl": true,
"avx512vnni": true,
"full-width-write": true,
"gfni": true,
"vaes": true,
"vpclmulqdq": true,
"clflushopt": true,
"clwb": true,
"fsrm": true,
"host-cache-info": false,
"host-phys-bits": true,
"amd-ssbd": true,
"amd-stibp": true,
"arch-capabilities": true,
"ibpb": true,
"ibrs": true,
"ibrs-all": true,
"ssbd": true,
"stibp": true,
"kvm-pv-ipi": true,
"kvm-pv-sched-yield": true,
"kvm-pv-tlb-flush": true,
"kvm-pv-unhalt": true,
"lmce": true,
"md-clear": true,
"mds-no": true,
"movdir64b": true,
"movdiri": true,
"pdcm": true,
"pdpe1gb": true,
"pdcm": false,
"pdpe1gb": false,
"pku": true,
"pmu": true,
"pschange-mc-no": true,
"rdctl-no": true,
"rdpid": true,
"sha-ni": true,
"ss": true,
"tsc-adjust": true,
"umip": true,
"vmx": true,
"xgetbv1": true,
"xsavec": true,
"xsaves": true,
(skipped everything vmx-related, since they don't matter with vmx
itself being false)