Sorry, just piecing this together and looking at things that have probably already been looked at!
Looking at the libvirt CPU XML files, it's interesting that both x86_EPYC-Milan.xml
<https://github.com/libvirt/libvirt/blob/master/src/cpu_map/x86_EPYC-Milan.xml>
and x86_EPYC-Rome.xml
<https://github.com/libvirt/libvirt/blob/master/src/cpu_map/x86_EPYC-Rome.xml>
have 'npt'. My guess is that the Ubuntu kernel on 18.04 doesn't support npt;
you'd see the difference under the host XML in the output of the
'virsh capabilities' command.

This would be similar to the 'vmx' flag for nested virtualization. You won't
find the 'vmx' capability in any of the CPU XML files; however, if you enable
it via the kvm module parameter, the VM gets it, and then you can't migrate to
non-vmx hosts even with the same CPU.

If something like this were happening, though, I'd still expect to see 'npt'
in the source VM XML and on its qemu command line, unless it's a similar but
not quite identical issue.

On Mon, Dec 13, 2021 at 10:32 AM Marcus <shadow...@gmail.com> wrote:

> That does sound like some sort of libvirt, then. I don't know why it would
> fail to transfer with "unknown CPU feature" when the source VM XML is
> not calling for it or a model that would include it.
>
> On Sat, Dec 11, 2021 at 3:32 AM Wido den Hollander <w...@widodh.nl> wrote:
>
>> On 11-12-2021 at 00:52, Marcus wrote:
>> > Just for clarity - Wido, you mention that you tried using a common CPU
>> > model across the platforms (which presumably doesn't contain npt) but
>> > migration still fails on npt missing. That does seem like a bug of some
>> > sort; I would expect that the following should work:
>> >
>>
>> Indeed, that failed.
>>
>> > * Update cloudstack agent configs to use 'EPYC-IBPB' common identical
>> >   model, restart agent
>> > * Stop VM on source host (ubuntu 20.04)
>> > * Start VM on source host (ubuntu 20.04) - at this point you should not
>> >   have a feature 'npt' in the XML of the running VM.
>> >   If you do then there's something wrong with the EPYC-IBPB or
>> >   libvirt's interpretation
>> > * Attempt to migrate to destination host (ubuntu 18.04)
>> >
>> > Is this process failing? Just want to ensure the source VM was restarted
>> > and does not contain npt in the XML (and also on the resulting qemu
>> > command line), but still the migration complains about missing that
>> > feature.
>> >
>>
>> I tried with EPYC-IBPB as well and restarted the VM prior to the migration.
>>
>> 20.04 -> 18.04 fails even though the IBPB model in libvirt is exactly
>> the same between 18 and 20.
>>
>> It complains about the npt feature lacking and thus the migration fails.
>>
>> > I'm also making an assumption here that /proc/cpuinfo on an Epyc 7552
>> > does not have npt, but an Epyc 7662 does. Is that correct?
>> >
>>
>> Correct.
>>
>> > On Tue, Dec 7, 2021 at 6:46 AM Gabriel Bräscher <gabrasc...@gmail.com>
>> > wrote:
>> >
>> >> Paul, I confused the issues then.
>> >>
>> >> The one I mentioned fits only with what Wido reported in this thread.
>> >> The CPU flag matches with the ones raised on that bug. Flags like
>> >> *npt* & *nrip-save* which are present when SVM is enabled.
>> >> Therefore, affected by kernel commit -- 52297436199d ("kvm: svm: Update
>> >> svm_xsaves_supported").
>> >> Additionally, the OS/Qemu versions also do fit with what is reported on
>> >> Ubuntu's qemu package "bug #1887490".
>> >>
>> >> Regards
>> >>
>> >> On Tue, Dec 7, 2021 at 12:10 PM Paul Angus <p...@angus.uk.com.invalid>
>> >> wrote:
>> >>
>> >>> The qemu-ev 2.10 bug was first reported a year or two ago in the
>> >>> mailing lists.
>> >>>
>> >>> -----Original Message-----
>> >>> From: Gabriel Bräscher <gabrasc...@gmail.com>
>> >>> Sent: Tuesday, December 7, 2021 9:41 AM
>> >>> To: dev <dev@cloudstack.apache.org>
>> >>> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
>> >>>
>> >>> Just adding to the "qemu-ev 2.10" & "qemu-ev 2.12" point.
>> >>>
>> >>>> migration fails from qemu-ev 2.10 to qemu-ev 2.12, this is definitely
>> >>>> a bug in my point of view.
>> >>>>
>> >>>
>> >>> On comment 53 (at "bug #1887490"):
>> >>>
>> >>>> It seems *one of the patches also introduced a regression*:
>> >>>> * lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch
>> >>>> adds various SVM-related flags. Specifically *npt and nrip-save are
>> >>>> now expected to be present by default* as shown in the updated
>> >>>> testdata.
>> >>>> This however breaks migration from instances using EPYC or EPYC-IBPB
>> >>>> CPU models started with libvirt versions prior to this one because
>> >>>> the instance on the target host has these extra flags
>> >>>
>> >>> From the tests reported there, it fails in both ways.
>> >>> 1. From *older* qemu package to *newer*:
>> >>>    the *source* host does not map the CPU flag; however, the *target*
>> >>>    host expects the flag to be there, by default.
>> >>> 2. From *newer* qemu package to *older*:
>> >>>    the instance "domain.xml" in the *source* host has a CPU flag that
>> >>>    is not mapped by qemu in the *target* host.
>> >>>
>> >>> On Tue, Dec 7, 2021 at 10:22 AM Sven Vogel <s.vo...@ewerk.com> wrote:
>> >>>
>> >>>> Let me check. We had the same problem on RHEL/CentOS but I am not
>> >>>> sure if this is a bug. What I know is that there was a change in the
>> >>>> XML. Let me ask one of my colleagues in my team.
>> >>>>
>> >>>> 😉
>> >>>>
>> >>>> Sven Vogel
>> >>>>
>> >>>> From: Gabriel Bräscher <gabrasc...@gmail.com>
>> >>>> Date: Tuesday, 7 December 2021 at 09:57
>> >>>> To: dev <dev@cloudstack.apache.org>
>> >>>> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
>> >>>>
>> >>>> Wei, I agree.
>> >>>> This is not necessarily a bug per se.
>> >>>>
>> >>>> The main point here is: the issue we are seeing is the "bug #1887490"
>> >>>> raised in Ubuntu's qemu package.
>> >>>> CPU features were added on the newer releases, which caused the
>> >>>> compatibility issue when (live) migrating VMs between compatible
>> >>>> hardware but different qemu packages.
>> >>>>
>> >>>> On Tue, Dec 7, 2021 at 9:26 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
>> >>>>
>> >>>>> Hi Gabriel,
>> >>>>>
>> >>>>> In my opinion, migration should work from lower version to higher
>> >>>>> version, but no guarantee from higher version to lower version, like
>> >>>>> we upgrade cloudstack.
>> >>>>> Therefore, migrate should work from ubuntu 18.04 to ubuntu 20.04.
>> >>>>> But it is not a bug if migration fails from ubuntu 20.04 to ubuntu
>> >>>>> 18.04.
>> >>>>>
>> >>>>> As Paul said, migration fails from qemu-ev 2.10 to qemu-ev 2.12,
>> >>>>> this is definitely a bug in my point of view.
>> >>>>>
>> >>>>> -Wei
>> >>>>>
>> >>>>> On Mon, 6 Dec 2021 at 16:05, Gabriel Bräscher <gabrasc...@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hi Paul (& all),
>> >>>>>>
>> >>>>>> I strongly believe that this is a bug in QEMU.
>> >>>>>> I was looking for bugs and found something that looks related to
>> >>>>>> what we are seeing. Precisely at Ubuntu's bug #1887490:
>> >>>>>> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490
>> >>>>>>
>> >>>>>> In the link above, there was the following comment:
>> >>>>>> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490/comments/53
>> >>>>>>
>> >>>>>> It seems one of the patches also introduced a regression:
>> >>>>>> * lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch adds various
>> >>>>>> SVM-related flags.
>> >>>>>> Specifically npt and nrip-save are now expected to be
>> >>>>>> present by default as shown in the updated testdata. This however
>> >>>>>> breaks migration from instances using *EPYC* or *EPYC-IBPB* CPU
>> >>>>>> models started with libvirt versions prior to this one because the
>> >>>>>> instance on the target host has these extra flags
>> >>>>>>
>> >>>>>> More about #1887490 can be found in this mail:
>> >>>>>> https://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg5842376.html
>> >>>>>> We can see that the specific bug was addressed in "linux
>> >>>>>> (5.4.0-49.53) focal".
>> >>>>>>
>> >>>>>> linux (5.4.0-49.53) focal; urgency=medium
>> >>>>>>
>> >>>>>>   * Add/Backport EPYC-v3 and EPYC-Rome CPU model (LP: #1887490)
>> >>>>>>     - kvm: svm: Update svm_xsaves_supported
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Gabriel.
>> >>>>>>
>> >>>>>> On Fri, Dec 3, 2021 at 10:59 AM Paul Angus
>> >>>>>> <paul.an...@ticketmaster.com> wrote:
>> >>>>>>
>> >>>>>>> Which version(s) of QEMU are you using, Wido?
>> >>>>>>>
>> >>>>>>> We've just been upgrading CentOS 7.6 to 7.9. Most 7.6 hosts had
>> >>>>>>> qemu-ev 2.10 on them (the buggy one); 2.12 was on the new hosts.
>> >>>>>>> We were getting errors complaining that the ibpb CPU feature
>> >>>>>>> wasn't available when migrating to the updated OS hosts (even
>> >>>>>>> though the hardware was identical).
>> >>>>>>>
>> >>>>>>> Upgrading qemu-ev to 2.12 on the originating host, then stopping
>> >>>>>>> and starting the VMs, then allowed us to migrate. We couldn't
>> >>>>>>> find any solution that didn't involve stopping and starting the
>> >>>>>>> VMs.
>> >>>>>>>
>> >>>>>>> Paul.
>> >>>>>>>
>> >>>>>>> -----Original Message-----
>> >>>>>>> From: Wido den Hollander <w...@widodh.nl>
>> >>>>>>> Sent: Monday, November 29, 2021 7:57 AM
>> >>>>>>> To: dev@cloudstack.apache.org; Wei ZHOU <ustcweiz...@gmail.com>
>> >>>>>>> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and
>> >>>>>>> 20.04
>> >>>>>>>
>> >>>>>>> On 11/24/21 10:36 PM, Wei ZHOU wrote:
>> >>>>>>>> Hi Wido,
>> >>>>>>>>
>> >>>>>>>> I think it is not good to run an environment with two
>> >>>>>>>> ubuntu/qemu versions.
>> >>>>>>>> It always happens that some cpu features are supported in the
>> >>>>>>>> higher version but not supported in the older version.
>> >>>>>>>> From my experience, the migration from older version to higher
>> >>>>>>>> version works like a charm, but there were many issues in
>> >>>>>>>> migration from higher version to older version.
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> I understand. But with a large number of hosts and working your
>> >>>>>>> way through upgrades you sometimes run into these situations.
>> >>>>>>> Therefore it would be welcome if it works.
>> >>>>>>>
>> >>>>>>>> I do not have a solution for you. I have tried to hack
>> >>>>>>>> /etc/libvirt/hooks/qemu but it didn't work.
>> >>>>>>>> Have you tried with other cpu models like x86_Opteron_G5? You
>> >>>>>>>> can find the cpu features of each cpu model in
>> >>>>>>>> /usr/share/libvirt/cpu_map/
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> I have not tried that yet, but I can see if that works.
>> >>>>>>>
>> >>>>>>> The EPYC-IBPB CPU model is identical on 18.04 and 20.04, but
>> >>>>>>> even using that model we can't seem to migrate as it complains
>> >>>>>>> about the 'npt' feature.
>> >>>>>>>
>> >>>>>>> Wido
>> >>>>>>>
>> >>>>>>>> Anyway, even if the vm migration succeeds, you do not know if
>> >>>>>>>> vm works fine.
>> >>>>>>>> I believe the best solution is upgrading all hosts to the same
>> >>>>>>>> OS version.
>> >>>>>>>>
>> >>>>>>>> -Wei
>> >>>>>>>>
>> >>>>>>>> On Tue, 23 Nov 2021 at 16:31, Wido den Hollander <w...@widodh.nl>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi,
>> >>>>>>>>>
>> >>>>>>>>> I'm trying to debug an issue with live migrations between
>> >>>>>>>>> Ubuntu 18.04 and 20.04 machines each with different CPUs:
>> >>>>>>>>>
>> >>>>>>>>> - Ubuntu 18.04 with AMD Epyc 7552 (Rome)
>> >>>>>>>>> - Ubuntu 20.04 with AMD Epyc 7662 (Milan)
>> >>>>>>>>>
>> >>>>>>>>> We are currently using this setting:
>> >>>>>>>>>
>> >>>>>>>>> guest.cpu.mode=custom
>> >>>>>>>>> guest.cpu.model=EPYC
>> >>>>>>>>>
>> >>>>>>>>> This does not allow for live migrations:
>> >>>>>>>>>
>> >>>>>>>>> Ubuntu 20.04 with Epyc 7662 to Ubuntu 18.04 with Epyc 7552
>> >>>>>>>>> fails
>> >>>>>>>>>
>> >>>>>>>>> "ExecutionException : org.libvirt.LibvirtException:
>> >>>>>>>>> unsupported configuration: unknown CPU feature: npt"
>> >>>>>>>>>
>> >>>>>>>>> So we tried to define a set of features manually:
>> >>>>>>>>>
>> >>>>>>>>> guest.cpu.features=3dnowprefetch abm adx aes apic arat avx
>> >>>>>>>>> avx2 bmi1 bmi2 clflush clflushopt cmov cr8legacy cx16 cx8 de
>> >>>>>>>>> f16c fma fpu fsgsbase fxsr fxsr_opt lahf_lm lm mca mce
>> >>>>>>>>> misalignsse mmx mmxext monitor movbe msr mtrr nx osvw pae pat
>> >>>>>>>>> pclmuldq pdpe1gb pge pni popcnt pse pse36 rdrand rdseed rdtscp
>> >>>>>>>>> sep sha-ni smap smep sse sse2 sse4.1 sse4.2 sse4a ssse3 svm
>> >>>>>>>>> syscall tsc vme xgetbv1 xsave xsavec xsaveopt -npt -x2apic
>> >>>>>>>>> -hypervisor -topoext -nrip-save
>> >>>>>>>>>
>> >>>>>>>>> This results in this going into the XML:
>> >>>>>>>>>
>> >>>>>>>>> <feature policy='disable' name='npt'/>
>> >>>>>>>>>
>> >>>>>>>>> You would say that works, but then the target host (18.04
>> >>>>>>>>> with the 7552)
>> >>>>>>>>> says it doesn't support the feature 'npt' and the migration
>> >>>>>>>>> still fails.
>> >>>>>>>>>
>> >>>>>>>>> Now we could of course use the kvm64 CPU from Qemu, but that's
>> >>>>>>>>> lacking so many features that for example TLS offloading isn't
>> >>>>>>>>> available.
>> >>>>>>>>>
>> >>>>>>>>> I also tried to set 'EPYC-Rome' on the Ubuntu 20.04
>> >>>>>>>>> hypervisor, but it then complains on the Ubuntu 18.04
>> >>>>>>>>> hypervisor that the CPU 'EPYC-Rome' is unknown as the 18.04
>> >>>>>>>>> hypervisor doesn't have that profile.
>> >>>>>>>>>
>> >>>>>>>>> Any ideas on how to get this working?
>> >>>>>>>>>
>> >>>>>>>>> Wido
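
For anyone who wants to check a domain XML against a target host before
attempting a migration, here is a minimal sketch in Python of the comparison
discussed in this thread. It lists every feature named under <cpu> in a domain
XML, whatever its policy (Wido's policy='disable' entry for npt was still
rejected), that the target's feature definitions do not know. The function
names and both XML snippets are made up for illustration, and the real cpu_map
layout differs between libvirt versions, so treat the parsing as an assumption
rather than libvirt's own logic.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def feature_names(cpu_map_xml: str) -> Set[str]:
    """Collect the name of every <feature> element in a cpu_map-style XML string."""
    root = ET.fromstring(cpu_map_xml)
    return {f.get("name") for f in root.iter("feature") if f.get("name")}


def unknown_features(domain_xml: str, known: Set[str]) -> List[str]:
    """Return features named under <cpu> in a domain XML that are not in `known`.

    The policy attribute is deliberately ignored: in the thread, even a
    feature with policy='disable' was rejected as unknown by the target.
    """
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        return []
    missing = []
    for feature in cpu.findall("feature"):
        name = feature.get("name")
        if name and name not in known:
            missing.append(name)
    return sorted(missing)


# Illustrative data only: a cut-down domain XML as the source host might hold it.
SOURCE_DOMAIN_XML = """
<domain type='kvm'>
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>EPYC</model>
    <feature policy='disable' name='npt'/>
    <feature policy='require' name='svm'/>
  </cpu>
</domain>
"""

# Illustrative data only: feature definitions of a target host that predates 'npt'.
TARGET_CPU_MAP_XML = """
<cpus>
  <arch name='x86'>
    <feature name='svm'/>
    <feature name='ibpb'/>
  </arch>
</cpus>
"""

if __name__ == "__main__":
    known = feature_names(TARGET_CPU_MAP_XML)
    print(unknown_features(SOURCE_DOMAIN_XML, known))
```

In practice the domain XML would come from 'virsh dumpxml' on the source and
the feature definitions from the files under /usr/share/libvirt/ on the target;
an empty result only says the names are known, not that the migration will
succeed.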