On 6/9/25 18:39, Sean Christopherson wrote:


On Mon, Jun 09, 2025, Denis V. Lunev wrote:
On 6/9/25 18:12, Paolo Bonzini wrote:
On 6/9/25 15:23, Andrey Zhadchenko wrote:
When hotplugging vCPUs to Windows VMs, we observed a strange instance
crash on Intel(R) Xeon(R) CPU E3-1230 v6:
panic hyper-v: arg1='0x3e', arg2='0x46d359bbdff',
arg3='0x56d359bbdff', arg4='0x0', arg5='0x0'

Presumably, Windows thinks that the hotplugged CPU is not "equivalent
enough" to the previous ones. The problem lies within MSR 0x3a. During
startup, Windows assigns some value to this register. During hotplug it
expects a similar value on the new vCPU in MSR 0x3a, but by default it
is zero.

If I understand correctly, you checked that it's Windows that writes
0x40005 to the MSR on non-hotplugged CPUs.

...

Actually no, it may also be firmware.
We are only sure that it is Windows code that crashes the VM.


Bit #18 probably means that Intel SGX is supported, because disabling
it via CPU arguments results in successful hotplug (and MSR value 0x5).
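
For reference, the two values discussed here (0x40005 and 0x5) can be decoded with a small sketch. It assumes the IA32_FEATURE_CONTROL (MSR 0x3a) bit layout as documented in the Intel SDM; the table and function names are hypothetical and only serve the illustration.

```python
# Bit positions of IA32_FEATURE_CONTROL (MSR 0x3a).
# Assumption: layout taken from the Intel SDM; only the commonly used bits are listed.
FEATURE_CONTROL_BITS = {
    0: "LOCK",
    1: "VMX_INSIDE_SMX",
    2: "VMX_OUTSIDE_SMX",
    17: "SGX_LAUNCH_CONTROL",
    18: "SGX_GLOBAL_ENABLE",
    20: "LMCE",
}


def decode_feature_control(value):
    """Return the names of the known bits set in `value`, lowest bit first."""
    return [
        name
        for bit, name in sorted(FEATURE_CONTROL_BITS.items())
        if value & (1 << bit)
    ]


print(hex(0x40005), decode_feature_control(0x40005))
print(hex(0x5), decode_feature_control(0x5))
```

Under that layout, 0x40005 differs from 0x5 only in bit 18, which is consistent with the guess above that the difference is SGX-related.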

What is the trace like in this case?  Does Windows "accept" 0x0 and
write 0x5?

It 'accepts' 0x0, but does not write anything there.


Does anything in edk2 run during the hotplug process (on real hardware
it does, because the whole hotplug is managed via SMM)? If so maybe that
could be a better place to write the value.

Yeah, I would expect firmware to write and lock IA32_FEATURE_CONTROL.

So many questions, but I'd really prefer to avoid this hack if the only
reason for it is SGX...

Does your setup actually support SGX?  I.e. expose EPC sections to the guest?
If not, can't you simply disable SGX in CPUID?

We do not have any TYPE_MEMORY_BACKEND_EPC objects in our default config, but we do have the following:

sgx=on,sgx1=on,sgx-debug=on,sgx-mode64=on,sgx-provisionkey=on,sgx-tokenkey=on

We found this during testing, and it can indeed be disabled on our test setup without any worries. I have no data on whether anyone actually sets it up properly in the wild, which may still be possible.


Linux by itself handles this well and assigns MSRs properly (we observe
corresponding set_msr on the hotplugged CPU).

I think Linux, at least the old 4.4 kernel, does not write the MSR on hotplug. Anyway, it hotplugs fine and tolerates a different value, unlike Windows.


Linux is much more tolerant of oddities, and quite a bit of effort went into
making sure that IA32_FEATURE_CONTROL was initialized if firmware left it 
unlocked.
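
The "initialize if left unlocked" behaviour can be sketched roughly as below. This is a simplified illustration, not kernel or firmware code: the function and its parameters are hypothetical, and the bit positions are assumed from the Intel SDM description of IA32_FEATURE_CONTROL.

```python
# Assumed bit positions for IA32_FEATURE_CONTROL (MSR 0x3a), per the Intel SDM.
FEAT_CTL_LOCKED = 1 << 0
FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX = 1 << 2
FEAT_CTL_SGX_ENABLED = 1 << 18


def init_feature_control(current, want_vmx, want_sgx):
    """Return the value a CPU should end up with in the MSR.

    Hypothetical helper: `current` is the value read from the CPU,
    `want_vmx` / `want_sgx` say which features should be enabled.
    """
    if current & FEAT_CTL_LOCKED:
        # Already locked (e.g. by firmware): the value can no longer be
        # changed, so it is kept as-is.
        return current

    # Unlocked: enable the requested features and set the lock bit.
    value = FEAT_CTL_LOCKED
    if want_vmx:
        value |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX
    if want_sgx:
        value |= FEAT_CTL_SGX_ENABLED
    return value


# A hotplugged vCPU that comes up with 0x0 would be brought in line with
# the boot CPUs if something ran this step on it.
print(hex(init_feature_control(0x0, want_vmx=True, want_sgx=True)))
print(hex(init_feature_control(0x0, want_vmx=True, want_sgx=False)))
```

With these assumptions, the two calls yield the same two values seen in this thread, 0x40005 with SGX and 0x5 without.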

Thanks everyone for the ideas. I focused on Windows too much and did not look into the firmware, so perhaps this is rather a firmware problem? I think by default we are using SeaBIOS, not OVMF/edk2. I will update after some testing with different configurations.
