Oh cool, thanks a lot for the explanation.

I added the "vzeroupper" and Xen crashes, so it looks like the CPUID emulation is buggy. I was also able to try the same setup in a VM (same Debian testing) running on virt-manager+KVM, and it works fine there (Xen in debug mode).

I will print the xstate when running on virt-manager+KVM and also run the xen-cpuid command to see the difference, just out of curiosity, since your test has already pinpointed the issue.

Thanks again for the insight. I will continue testing later today; if you need me to test anything else, just ask and I will do my best.

Guillaume

On Sun, Feb 2, 2025 at 6:32 PM Andrew Cooper <andrew.coop...@citrix.com> wrote:

> On 02/02/2025 4:58 pm, Guillaume wrote:
> > I attached the output of the `xl dmesg`. This is the 4.19.1 kernel I
> > rebuilt but I have the same issue with master (just for info).
>
> Thanks. This is a TigerLake CPU, and:
>
> > (XEN) Mitigating GDS by disabling AVX while virtualised - protections
> > are best-effort
>
> is why Xen is ignoring AVX.
>
> Now, as to the bug. From the panic line, you're seeing:
>
> > XSTATE 0x0000000000000003, uncompressed hw size 0x340 != xen size 0x240
>
> xstate is XCR0_SSE | XCR0_X87, and the correct size for this
> configuration is 0x240.
>
> The reason this matters is that this is the amount of data the
> processor will write out/read in for the XSAVE/XRSTOR instructions,
> which are used for context switching. These instructions are also
> available in userspace.
>
> Here, VirtualBox is claiming that with AVX disabled, it will still write
> out the AVX registers. This is buggy, but we're going to have to narrow
> it down further.
>
> Can you try building Xen with this additional line
>
> diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> index af9e345a7ace..5a5011ba8b10 100644
> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -789,6 +789,8 @@ static void __init noinline xstate_check_sizes(void)
>       */
>      check_new_xstate(&s, X86_XCR0_SSE | X86_XCR0_X87);
>  
> +    asm volatile ("vzeroupper");
> +
>      if ( cpu_has_avx )
>          check_new_xstate(&s, X86_XCR0_YMM);
>  
>
> and see if the result crashes or boots?
>
> One possible bug is that VirtualBox is shadowing XCR0 and the real
> setting in hardware is 0x7 (including XCR0_AVX) rather than 0x3. In
> this case, the reported size is correct, and VirtualBox is failing to
> honour the XSETBV setting.
>
> Alternatively, another bug is that XCR0 is really 0x3, but the CPUID
> emulation for max size is wrong, in which case the XSAVE/etc
> instructions won't actually access beyond 0x240, and "all" that's wrong
> is that we'll allocate a larger buffer than necessary.
>
> The VZEROUPPER (an AVX instruction) should distinguish these two cases.
> If Xen crashes with it in place, then the XCR0 register is correct and
> it's CPUID which is buggy. If Xen boots with that in place, then
> VirtualBox is shadowing XCR0 with a different value behind Xen's back.
>
> ~Andrew
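
For reference, the two sizes in the panic line (0x240 and 0x340) can be reproduced with a little arithmetic. The following is only a minimal sketch, not Xen's code: the helper names (xstate_uncompressed_size, xstate_size_matches) are made up for illustration, and the AVX component's offset and size are hard-coded to the values CPUID leaf 0xD, sub-leaf 2 reports, where real code would query CPUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural XCR0 feature bits. */
#define XCR0_X87 (1ULL << 0)
#define XCR0_SSE (1ULL << 1)
#define XCR0_AVX (1ULL << 2)

#define XSAVE_LEGACY_SIZE 512u /* FXSAVE-format area: x87 + SSE state */
#define XSAVE_HEADER_SIZE  64u /* XSAVE header following the legacy area */

/* One extended state component in the uncompressed XSAVE layout. */
struct xstate_component {
    uint64_t bit;    /* XCR0 bit enabling the component */
    uint32_t offset; /* byte offset from the start of the XSAVE area */
    uint32_t size;   /* byte size of the component */
};

/* Assumption: hard-coded here; normally read from CPUID.0xD[2] (EBX/EAX). */
static const struct xstate_component components[] = {
    { XCR0_AVX, 576, 256 }, /* upper halves of YMM0-15 */
};

/* Hypothetical helper: expected uncompressed XSAVE size for a given XCR0. */
static uint32_t xstate_uncompressed_size(uint64_t xcr0)
{
    uint32_t size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

    assert(xcr0 & XCR0_X87); /* XCR0 bit 0 must always be set */

    for (size_t i = 0; i < sizeof(components) / sizeof(components[0]); i++) {
        if (xcr0 & components[i].bit) {
            uint32_t end = components[i].offset + components[i].size;
            if (end > size)
                size = end;
        }
    }
    return size;
}

/* Hypothetical helper: the kind of comparison behind the panic line. */
static bool xstate_size_matches(uint64_t xcr0, uint32_t hw_size)
{
    return hw_size == xstate_uncompressed_size(xcr0);
}
```

With XCR0 = 0x3 the helper gives 512 + 64 = 0x240, and with XCR0 = 0x7 it gives 576 + 256 = 0x340, so xstate_size_matches(0x3, 0x340) is false. That is the mismatch reported above: the size reported under VirtualBox corresponds to an XCR0 with AVX enabled.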