Hi Paul (& all),

I strongly believe that this is a bug in QEMU. While searching for existing bug reports, I found one that looks related to what we are seeing, namely Ubuntu bug #1887490:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490

In the link above, there was the following comment:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490/comments/53

    It seems one of the patches also introduced a regression:
    * lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch adds
      various SVM-related flags. Specifically npt and nrip-save are now
      expected to be present by default as shown in the updated testdata.
    This however breaks migration from instances using EPYC or EPYC-IBPB
    CPU models started with libvirt versions prior to this one because
    the instance on the target host has these extra flags

More about #1887490 can be found in this mail:
https://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg5842376.html

We can see that the specific bug was addressed in "linux (5.4.0-49.53) focal":

    linux (5.4.0-49.53) focal; urgency=medium
      * Add/Backport EPYC-v3 and EPYC-Rome CPU model (LP: #1887490)
        - kvm: svm: Update svm_xsaves_supported

Regards,
Gabriel.

On Fri, Dec 3, 2021 at 10:59 AM Paul Angus <paul.an...@ticketmaster.com> wrote:

> Which version(s) of QEMU are you using Wido?
>
> We've just been upgrading CentOS 7.6 to 7.9.
> Most 7.6 hosts had qemu-ev 2.10 on them (the buggy one). 2.12 was on
> the new hosts.
> We were getting errors complaining that the ibpb CPU feature wasn't
> available when migrating to the updated OS hosts (even though identical
> hardware).
>
> Upgrading qemu-ev to 2.12 on the originating host, then stopping and
> starting the VMs, then allowed us to migrate. We couldn't find any
> solution that didn't involve stopping and starting the VMs.
>
> Paul.
>
> -----Original Message-----
> From: Wido den Hollander <w...@widodh.nl>
> Sent: Monday, November 29, 2021 7:57 AM
> To: dev@cloudstack.apache.org; Wei ZHOU <ustcweiz...@gmail.com>
> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
>
> On 11/24/21 10:36 PM, Wei ZHOU wrote:
> > Hi Wido,
> >
> > I think it is not good to run an environment with two ubuntu/qemu
> > versions.
> > It always happens that some cpu features are supported in the higher
> > version but not supported in the older version.
> > From my experience, the migration from older version to higher version
> > works like a charm, but there were many issues in migration from
> > higher version to older version.
> >
>
> I understand. But with a large number of hosts and working your way
> through upgrades you sometimes run into these situations. Therefore it
> would be welcome if it works.
>
> > I do not have a solution for you. I have tried to hack
> > /etc/libvirt/hooks/qemu but it didn't work.
> > Have you tried with other cpu models like x86_Opteron_G5? You can
> > find the cpu features of each cpu model in /usr/share/libvirt/cpu_map/
> >
>
> I have not tried that yet, but I can see if that works.
>
> The EPYC-IBPB CPU model is identical on 18.04 and 20.04, but even using
> that model we can't seem to migrate as it complains about the 'npt'
> feature.
>
> Wido
>
> > Anyway, even if the vm migration succeeds, you do not know if vm works
> > fine. I believe the best solution is upgrading all hosts to the same
> > OS version.
> >
> > -Wei
> >
> > On Tue, 23 Nov 2021 at 16:31, Wido den Hollander <w...@widodh.nl> wrote:
> >
> >> Hi,
> >>
> >> I'm trying to debug an issue with live migrations between Ubuntu
> >> 18.04 and 20.04 machines each with different CPUs:
> >>
> >> - Ubuntu 18.04 with AMD Epyc 7552 (Rome)
> >> - Ubuntu 20.04 with AMD Epyc 7662 (Milan)
> >>
> >> We are currently using this setting:
> >>
> >> guest.cpu.mode=custom
> >> guest.cpu.model=EPYC
> >>
> >> This does not allow for live migrations:
> >>
> >> Ubuntu 20.04 with Epyc 7662 to Ubuntu 18.04 with Epyc 7552 fails
> >>
> >> "ExecutionException : org.libvirt.LibvirtException: unsupported
> >> configuration: unknown CPU feature: npt"
> >>
> >> So we tried to define a set of features manually:
> >>
> >> guest.cpu.features=3dnowprefetch abm adx aes apic arat avx avx2 bmi1
> >> bmi2 clflush clflushopt cmov cr8legacy cx16 cx8 de f16c fma fpu
> >> fsgsbase fxsr fxsr_opt lahf_lm lm mca mce misalignsse mmx mmxext
> >> monitor movbe msr mtrr nx osvw pae pat pclmuldq pdpe1gb pge pni
> >> popcnt pse pse36 rdrand rdseed rdtscp sep sha-ni smap smep sse sse2
> >> sse4.1 sse4.2 sse4a ssse3 svm syscall tsc vme xgetbv1 xsave xsavec
> >> xsaveopt -npt -x2apic -hypervisor -topoext -nrip-save
> >>
> >> This results in this going into the XML:
> >>
> >> <feature policy='disable' name='npt'/>
> >>
> >> You would say that works, but then the target host (18.04 with the
> >> 7552) says it doesn't support the feature 'npt' and the migration
> >> still fails.
> >>
> >> Now we could of course use the kvm64 CPU from Qemu, but that's lacking
> >> so many features that for example TLS offloading isn't available.
> >>
> >> I also tried to set 'EPYC-Rome' on the Ubuntu 20.04 hypervisor, but
> >> it then complains on the Ubuntu 18.04 hypervisor that the CPU
> >> 'EPYC-Rome' is unknown as the 18.04 hypervisor doesn't have that
> >> profile.
> >>
> >> Any ideas on how to get this working?
> >>
> >> Wido
> >>
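
One way to see which flags differ between two hosts is to compare what each host's libvirt CPU map lists for the EPYC model (Wei points at /usr/share/libvirt/cpu_map/ in the thread). Below is a minimal Python sketch of that comparison, not a tested tool. The function names and the two XML snippets are hypothetical: the snippets are heavily shortened stand-ins and are not copied from any real package. The sketch assumes the map uses <model name='...'> elements containing <feature name='...'/> children, and it ignores features a model may inherit from another model.

```python
import xml.etree.ElementTree as ET


def model_features(cpu_map_xml, model):
    """Return the set of feature names listed directly under the given CPU model."""
    root = ET.fromstring(cpu_map_xml)
    for node in root.iter("model"):
        if node.get("name") == model:
            return {feat.get("name") for feat in node.findall("feature")}
    raise KeyError("CPU model %r not found" % model)


def feature_diff(source_xml, target_xml, model):
    """Compare one model's feature list between a source and a target host."""
    src = model_features(source_xml, model)
    dst = model_features(target_xml, model)
    return {
        "only_on_source": sorted(src - dst),
        "only_on_target": sorted(dst - src),
    }


# Hypothetical, shortened stand-ins for the EPYC definition on each host.
OLD_MAP = """
<cpus>
  <model name='EPYC'>
    <feature name='aes'/>
    <feature name='svm'/>
  </model>
</cpus>
"""

NEW_MAP = """
<cpus>
  <model name='EPYC'>
    <feature name='aes'/>
    <feature name='svm'/>
    <feature name='npt'/>
    <feature name='nrip-save'/>
  </model>
</cpus>
"""

if __name__ == "__main__":
    # Source is the newer host, target is the older one, as in the failing migration.
    print(feature_diff(NEW_MAP, OLD_MAP, "EPYC"))
```

To use it on real hosts, read the file that defines the EPYC model on each host and pass its contents in place of the sample strings. Any name reported under "only_on_source" is a candidate for the "unknown CPU feature" error on the target.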