Hello,

Even when SRSO is fully mitigated on the host, we still see it reported as not completely mitigated inside VMs running on Zen 3 and Zen 4 hosts (e.g., AMD EPYC 7713).
We can see an example of that here:
https://bugzilla.suse.com/show_bug.cgi?id=1228079

This specific bug is about SLE15SP5, where we have old versions of QEMU and the kernel (and the kernel was missing a backport), but we also see it on openSUSE Tumbleweed, where we have kernel 6.9.9 and QEMU 9.0.2, e.g.:

### Host

virt136:~ # uname -a
Linux virt136 6.9.9-1-default #1 SMP PREEMPT_DYNAMIC Thu Jul 11 11:31:54 UTC 2024 (8c0f797) x86_64 x86_64 x86_64 GNU/Linux

virt136:~ # qemu-system-x86_64 --version
QEMU emulator version 9.0.2 (openSUSE Tumbleweed)

virt136:~ # cat /proc/cpuinfo | grep -e "vendor\|family\|model\|stepping\|microcode" | tail -5
cpu family : 25
model : 1
model name : AMD EPYC 7713 64-Core Processor
stepping : 1
microcode : 0xa0011d5

virt136:~ # lscpu | grep rstack
Vulnerability Spec rstack overflow: Mitigation; Safe RET

### Guest

virt136:~ # virsh console opensusetumbleweed
Connected to domain 'opensusetumbleweed'
Escape character is ^] (Ctrl + ])

localhost:~ # uname -a
Linux localhost.localdomain 6.9.9-1-default #1 SMP PREEMPT_DYNAMIC Thu Jul 11 11:31:54 UTC 2024 (8c0f797) x86_64 x86_64 x86_64 GNU/Linux

localhost:~ # lscpu | grep rstack
Vulnerability Spec rstack overflow: Vulnerable: Safe RET, no microcode

Fabian and Nikolay (Cc-ed) can provide more details, if necessary.

AFAIUI, this is due to how and when the relevant CPUID bit is set on this specific CPU model. In fact (and I'm quoting Nikolay):

"the problem with the IBPB_BRTYPE flag is that on CPUs which require the microcode fix, the flag is not shown by CPUID despite it actually being available. In this case the kernel checks whether the feature is valid by doing a specific wrmsr and, if it is, it sets the flag internally in the kernel and in KVM's cpuid representation".
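For anyone who wants to check what the guest actually sees, here is a minimal sketch that decodes the SRSO-related bits from a raw EAX value of CPUID leaf 0x80000021. The bit positions (27 for SBPB, 28 for IBPB_BRTYPE, 29 for SRSO_NO) are my reading of the Linux kernel's cpufeatures definitions and should be double-checked against the AMD documentation. The helper name `decode_srso_bits` and the example value are hypothetical, and the raw EAX value has to be obtained separately (for instance from a raw CPUID dump taken on the host and in the guest).

```python
# Bit positions in CPUID leaf 0x80000021, EAX.
# Assumption: taken from the Linux kernel's cpufeatures definitions;
# verify against the AMD documentation before relying on them.
SRSO_RELATED_BITS = {
    "SBPB": 27,
    "IBPB_BRTYPE": 28,
    "SRSO_NO": 29,
}


def decode_srso_bits(eax):
    """Return a dict telling which SRSO-related flags are set in a raw EAX value."""
    return {
        name: bool((eax >> bit) & 1)
        for name, bit in SRSO_RELATED_BITS.items()
    }


if __name__ == "__main__":
    # Hypothetical EAX value with only IBPB_BRTYPE set, for illustration only;
    # replace it with the value read on the host or in the guest.
    example_eax = 1 << 28
    for name, is_set in decode_srso_bits(example_eax).items():
        print(f"{name}: {'set' if is_set else 'clear'}")
```

Running the same decoding on the host's and the guest's values should make it easy to spot whether IBPB_BRTYPE is present in one and missing in the other.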
Yet, we don't see this in the VM, because QEMU seems to be masking it (as visible, e.g., here: https://bugzilla.suse.com/show_bug.cgi?id=1228079#c2).

In one of the threads on LKML where this mitigation was discussed, we found mentions (from Josh, Cc-ed) of a QEMU patch being necessary, and also being ready and about to be submitted, e.g.:
https://lore.kernel.org/lkml/20230821170520.dcovzudamnoqp7jc@treble/

But I'm not able to find such a patch, neither on the mailing list nor as a commit... Is it just me not seeing it?

In case it's really missing, what's the best course of action? Do we need that patch? Is it (in the form in which it appears in that email) still correct?

Thanks and Regards,
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)