Hi Paul (& all),

I strongly believe that this is a bug in QEMU.
While looking through existing bug reports I found one that looks related to
what we are seeing, namely Ubuntu bug #1887490:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490

In the link above, there was the following comment:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490/comments/53

"It seems one of the patches also introduced a regression:
lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch adds various
SVM-related flags. Specifically npt and nrip-save are now expected to be
present by default as shown in the updated testdata. This however breaks
migration from instances using EPYC or EPYC-IBPB CPU models started
with libvirt versions prior to this one because the instance on the target
host has these extra flags"


More about bug #1887490 can be found in this mail:
https://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg5842376.html
There we can see that the bug was addressed in "linux (5.4.0-49.53)
focal":

linux (5.4.0-49.53) focal; urgency=medium

  * Add/Backport EPYC-v3 and EPYC-Rome CPU model (LP: #1887490)
    - kvm: svm: Update svm_xsaves_supported


Regards,
Gabriel.

On Fri, Dec 3, 2021 at 10:59 AM Paul Angus <paul.an...@ticketmaster.com>
wrote:

> Which version(s) of QEMU are you using Wido?
>
> We've just been upgrading CentOS 7.6 to 7.9.
> Most 7.6 hosts had qemu-ev 2.10 on them (the buggy one). 2.12 was on the
> new hosts.
> We were getting errors complaining that the ibpb CPU feature wasn't
> available when migrating to the updated OS hosts (even though identical
> hardware).
>
> Upgrading qemu-ev to 2.12 on the originating host and then stopping and
> starting the VMs allowed us to migrate. We couldn't find any
> solution that didn't involve stopping and starting the VMs.
>
> Paul.
>
> -----Original Message-----
> From: Wido den Hollander <w...@widodh.nl>
> Sent: Monday, November 29, 2021 7:57 AM
> To: dev@cloudstack.apache.org; Wei ZHOU <ustcweiz...@gmail.com>
> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
>
>
>
> On 11/24/21 10:36 PM, Wei ZHOU wrote:
> > Hi Wido,
> >
> > I think it is not good to run an environment with two ubuntu/qemu
> > versions.
> > It always happens that some cpu features are supported in the higher
> > version but not supported in the older version.
> > From my experience, the migration from older version to higher version
> > works like a charm, but there were many issues in migration from
> > higher version to older version.
> >
>
> I understand. But with a large number of hosts and working your way
> through upgrades you sometimes run into these situations. Therefore it would
> be welcome if it works.
>
> > I do not have a solution for you. I have tried to hack
> > /etc/libvirt/hooks/qemu but it didn't work.
> > Have you tried with other cpu models like x86_Opteron_G5? You can
> > find the cpu features of each cpu model in /usr/share/libvirt/cpu_map/
> >
>
> I have not tried that yet, but I can see if that works.
>
> The EPYC-IBPB CPU model is identical on 18.04 and 20.04, but even using
> that model we can't seem to migrate as it complains about the 'npt' feature.
>
> Wido
>
> > Anyway, even if the vm migration succeeds, you do not know if vm works
> > fine. I believe the best solution is upgrading all hosts to the same
> > OS version.
> >
> > -Wei
> >
> > On Tue, 23 Nov 2021 at 16:31, Wido den Hollander <w...@widodh.nl> wrote:
> >
> >> Hi,
> >>
> >> I'm trying to debug an issue with live migrations between Ubuntu
> >> 18.04 and 20.04 machines each with different CPUs:
> >>
> >> - Ubuntu 18.04 with AMD Epyc 7552 (Rome)
> >> - Ubuntu 20.04 with AMD Epyc 7662 (Rome)
> >>
> >> We are currently using this setting:
> >>
> >> guest.cpu.mode=custom
> >> guest.cpu.model=EPYC
> >>
> >> This does not allow for live migrations:
> >>
> >> Ubuntu 20.04 with Epyc 7662 to Ubuntu 18.04 with Epyc 7552 fails
> >>
> >> "ExecutionException : org.libvirt.LibvirtException: unsupported
> >> configuration: unknown CPU feature: npt"
> >>
> >> So we tried to define a set of features manually:
> >>
> >> guest.cpu.features=3dnowprefetch abm adx aes apic arat avx avx2 bmi1
> >> bmi2 clflush clflushopt cmov cr8legacy cx16 cx8 de f16c fma fpu
> >> fsgsbase fxsr fxsr_opt lahf_lm lm mca mce misalignsse mmx mmxext
> >> monitor movbe msr mtrr nx osvw pae pat pclmuldq pdpe1gb pge pni
> >> popcnt pse pse36 rdrand rdseed rdtscp sep sha-ni smap smep sse sse2
> >> sse4.1 sse4.2 sse4a
> >> ssse3 svm syscall tsc vme xgetbv1 xsave xsavec xsaveopt -npt -x2apic
> >> -hypervisor -topoext -nrip-save
> >>
> >> This results in this going into the XML:
> >>
> >> <feature policy='disable' name='npt'/>
> >>
> >> You would say that works, but then the target host (18.04 with the
> >> 7552) says it doesn't support the feature 'npt' and the migration still
> >> fails.
> >>
> >> Now we could of course use the kvm64 CPU from Qemu, but that's lacking
> >> so many features that for example TLS offloading isn't available.
> >>
> >> I also tried to set 'EPYC-Rome' on the Ubuntu 20.04 hypervisor, but
> >> it then complains on the Ubuntu 18.04 hypervisor that the CPU
> >> 'EPYC-Rome' is unknown as the 18.04 hypervisor doesn't have that profile.
> >>
> >> Any ideas on how to get this working?
> >>
> >> Wido
> >>
> >
>
