On 11-12-2021 at 00:52, Marcus wrote:
Just for clarity - Wido you mention that you tried using a common CPU model
across the platforms (which presumably doesn't contain npt) but migration
still fails on npt missing. That does seem like a bug of some sort, I would
expect that the following should work:
Indeed, that failed.
* Update cloudstack agent configs to use 'EPYC-IBPB' common identical
model, restart agent
* Stop VM on source host (ubuntu 20.04)
* Start VM on source host (ubuntu 20.04) - at this point you should not
have a feature 'npt' in the XML of the running VM. If you do then there's
something wrong with the EPYC-IBPB or libvirt's interpretation
* Attempt to migrate to destination host (ubuntu 18.04)
Is this process failing? Just want to ensure the source VM was restarted
and does not contain npt in the XML (and also on the resulting qemu command
line), but still the migration complains about missing that feature.
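The check described above (does the running VM's XML mention 'npt' at all?) can be sketched as follows. This is a minimal illustration, assuming the domain XML has already been captured as a string, for example from `virsh dumpxml`; the helper name `cpu_feature_policies` and the sample XML are made up for this example and are not taken from the thread.

```python
import xml.etree.ElementTree as ET


def cpu_feature_policies(domain_xml: str) -> dict:
    """Map each <cpu><feature> name in a libvirt domain XML to its policy."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        return {}
    # Assumption: a <feature> without an explicit policy is treated as 'require'.
    return {
        feature.get("name"): feature.get("policy", "require")
        for feature in cpu.findall("feature")
    }


# Hypothetical, heavily shortened domain XML used only for illustration.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example-vm</name>
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <feature policy='disable' name='npt'/>
  </cpu>
</domain>
"""

policies = cpu_feature_policies(SAMPLE_DOMAIN_XML)
print(policies.get("npt", "not mentioned"))
```

If 'npt' shows up here after the VM was restarted with the common model, that would point at the model definition or libvirt's interpretation of it, as suggested above.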
I tried with EPYC-IBPB as well and restarted the VM prior to the migration.
20.04 -> 18.04 fails even though the IBPB model in libvirt is exactly
the same between 18 and 20.
It complains about the npt feature lacking and thus the migration fails.
I'm also making an assumption here that /proc/cpuinfo on an Epyc 7552 does
not have npt, but an Epyc 7662 does. Is that correct?
Correct.
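To compare what each host reports, a small sketch along these lines could be run on both machines. It assumes the usual /proc/cpuinfo layout with a "flags" line per processor; the function names and the sample text are hypothetical, and the sample flags are not real output from either Epyc model.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flags of the first processor entry in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_flag(cpuinfo_text: str, flag: str) -> bool:
    """True if the given flag is listed for the first processor."""
    return flag in cpu_flags(cpuinfo_text)


# Made-up, abbreviated sample; on a real host read the file instead:
#   with open("/proc/cpuinfo") as f:
#       text = f.read()
SAMPLE_CPUINFO = """processor\t: 0
vendor_id\t: AuthenticAMD
flags\t\t: fpu vme de pse svm npt nrip_save
"""

print(has_flag(SAMPLE_CPUINFO, "npt"))
```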
On Tue, Dec 7, 2021 at 6:46 AM Gabriel Bräscher <gabrasc...@gmail.com>
wrote:
Paul, I confused the issues then.
The one I mentioned fits only with what Wido reported in this thread.
The CPU flags match the ones raised in that bug: flags like *npt* &
*nrip-save*, which are present when SVM is enabled.
They are therefore affected by kernel commit 52297436199d ("kvm: svm: Update
svm_xsaves_supported").
Additionally, the OS/Qemu versions also fit with what is reported in
Ubuntu's qemu package "bug #1887490".
Regards
On Tue, Dec 7, 2021 at 12:10 PM Paul Angus <p...@angus.uk.com.invalid>
wrote:
The qemu-ev 2.10 bug was first reported a year or two ago in the mailing
lists.
-----Original Message-----
From: Gabriel Bräscher <gabrasc...@gmail.com>
Sent: Tuesday, December 7, 2021 9:41 AM
To: dev <dev@cloudstack.apache.org>
Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
Just adding to the "qemu-ev 2.10" & "qemu-ev 2.12" point.
migration fails from qemu-ev 2.10 to qemu-ev 2.12, this is definitely
a bug in my point of view.
On the comment 53 (at "bug #1887490"):
It seems *one of the patches also introduced a regression*:
* lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch
adds various SVM-related flags. Specifically *npt and nrip-save are
now expected to be present by default* as shown in the updated
testdata.
This however breaks migration from instances using EPYC or EPYC-IBPB
CPU models started with libvirt versions prior to this one because the
instance on the target host has these extra flags
From the tests reported there, it fails in both ways.
1. From *older* qemu package to *newer*:
*source* host does not map the CPU flag; however, *target* host
expects the flag to be there, by default.
2. From *newer* qemu package to *older*:
the instance "domain.xml" in the *source* host has a CPU flag that is
not mapped by qemu in the *target* host.
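The two failure directions above can be pictured with a toy model based on set differences. This is only a sketch of the reasoning, not how libvirt or qemu actually validate a migration; the function name, the three inputs and the feature sets are assumptions made for illustration.

```python
def migration_blockers(source_features, target_known, target_defaults):
    """Toy model of the two failure modes described above.

    source_features: flags present in the source instance's CPU definition
    target_known:    flags the target's qemu/libvirt can map at all
    target_defaults: flags the target expects the CPU model to carry by default
    """
    source = set(source_features)
    return {
        # Case 2 (newer -> older): the source carries a flag the target cannot map.
        "unknown_on_target": sorted(source - set(target_known)),
        # Case 1 (older -> newer): the target expects a flag the source never had.
        "expected_by_target": sorted(set(target_defaults) - source),
    }


# Illustrative feature sets only.
older = {"svm"}
newer = {"svm", "npt", "nrip-save"}

older_to_newer = migration_blockers(older, target_known=newer, target_defaults=newer)
newer_to_older = migration_blockers(newer, target_known=older, target_defaults=older)
print(older_to_newer)
print(newer_to_older)
```

In this model both directions report npt and nrip-save, just under different headings, which matches the observation that the migration fails both ways.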
On Tue, Dec 7, 2021 at 10:22 AM Sven Vogel <s.vo...@ewerk.com> wrote:
Let me check. We had the same problem on RHEL/CentOS but I am not sure
if this is a bug. What I know is that there was a change in the XML. Let me
ask one of my colleagues in my team.
😉
__
Sven Vogel
From: Gabriel Bräscher <gabrasc...@gmail.com>
Date: Tuesday, 7 December 2021 at 09:57
To: dev <dev@cloudstack.apache.org>
Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
Wei, I agree.
This is not necessarily a bug per se.
The main point here is: the issue we are seeing is the "bug #1887490"
raised in Ubuntu's qemu package.
CPU features were added on the newer releases, which caused the
compatibility issue when (live) migrating VMs between compatible
hardware but different qemu packages.
On Tue, Dec 7, 2021 at 9:26 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
Hi Gabriel,
In my opinion, migration should work from a lower version to a higher
version, but there is no guarantee from a higher version to a lower
version, just as when we upgrade cloudstack.
Therefore, migration should work from ubuntu 18.04 to ubuntu 20.04. But it
is not a bug if migration fails from ubuntu 20.04 to ubuntu 18.04.
As Paul said, migration fails from qemu-ev 2.10 to qemu-ev 2.12,
this is definitely a bug in my point of view.
-Wei
On Mon, 6 Dec 2021 at 16:05, Gabriel Bräscher <gabrasc...@gmail.com>
wrote:
Hi Paul (& all),
I strongly believe that this is a bug in QEMU.
I was looking for bugs and found something that looks related to what we
are seeing, specifically Ubuntu's bug #1887490:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490
In the link above, there was the following comment:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490/comments/53
It seems one of the patches also introduced a regression:
* lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch adds various
SVM-related flags. Specifically npt and nrip-save are now expected to be
present by default as shown in the updated testdata.
This however breaks migration from instances using *EPYC* or *EPYC-IBPB* CPU
models started with libvirt versions prior to this one because the
instance on the target host has these extra flags
More about #1887490 can be found in the mail
https://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg5842376.html.
We can see that the specific bug was addressed in "linux
(5.4.0-49.53) focal".
linux (5.4.0-49.53) focal; urgency=medium
* Add/Backport EPYC-v3 and EPYC-Rome CPU model (LP: #1887490)
- kvm: svm: Update svm_xsaves_supported
Regards,
Gabriel.
On Fri, Dec 3, 2021 at 10:59 AM Paul Angus <paul.an...@ticketmaster.com>
wrote:
Which version(s) of QEMU are you using, Wido?
We've just been upgrading CentOS 7.6 to 7.9. Most 7.6 hosts had
qemu-ev 2.10 on them (the buggy one); 2.12 was on the new hosts.
We were getting errors complaining that the ibpb CPU feature
wasn't available when migrating to the updated OS hosts (even though
the hardware was identical).
Upgrading qemu-ev to 2.12 on the originating host, then stopping
and starting the VMs, then allowed us to migrate. We couldn't
find any solution that didn't involve stopping and starting the
VMs.
Paul.
-----Original Message-----
From: Wido den Hollander <w...@widodh.nl>
Sent: Monday, November 29, 2021 7:57 AM
To: dev@cloudstack.apache.org; Wei ZHOU <ustcweiz...@gmail.com>
Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
On 11/24/21 10:36 PM, Wei ZHOU wrote:
Hi Wido,
I think it is not good to run an environment with two ubuntu/qemu
versions.
It always happens that some cpu features are supported in the higher
version but not supported in the older version.
From my experience, the migration from older version to higher version
works like a charm, but there were many issues in migration
from higher version to older version.
I understand. But with a large number of hosts and working your
way through upgrades you sometimes run into these situations.
Therefore it would be welcome if it works.
I do not have a solution for you. I have tried to hack
/etc/libvirt/hooks/qemu but it didn't work.
Have you tried other cpu models like x86_Opteron_G5? You
can find the cpu features of each cpu model in
/usr/share/libvirt/cpu_map/
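Listing the features of a model definition can be sketched like this. The layout assumed here (a `<model name=...>` element containing `<feature name=.../>` children) is an assumption about the files under that directory and may differ between libvirt versions; the sample XML, the model name EXAMPLE-MODEL and the helper name are invented for the example, and model inheritance is not followed.

```python
import xml.etree.ElementTree as ET


def model_features(cpu_map_xml: str, model_name: str) -> set:
    """Return the feature names listed directly under the named CPU model."""
    root = ET.fromstring(cpu_map_xml)
    for model in root.iter("model"):
        # Skip bare references to other models; only use entries listing features.
        if model.get("name") == model_name and model.find("feature") is not None:
            return {feature.get("name") for feature in model.findall("feature")}
    raise KeyError(model_name)


# Made-up model definition, not copied from any libvirt release.
SAMPLE_CPU_MAP = """
<cpus>
  <model name='EXAMPLE-MODEL'>
    <vendor name='AMD'/>
    <feature name='svm'/>
    <feature name='sse4a'/>
  </model>
</cpus>
"""

features = model_features(SAMPLE_CPU_MAP, "EXAMPLE-MODEL")
print(sorted(features))
```

Running the same helper against the definitions from two hosts and comparing the resulting sets is one way to see whether a model really is identical on both.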
I have not tried that yet, but I can see if that works.
The EPYC-IBPB CPU model is identical on 18.04 and 20.04, but even using
that model we can't seem to migrate as it complains about the 'npt'
feature.
Wido
Anyway, even if the vm migration succeeds, you do not know whether the
vm works fine. I believe the best solution is upgrading all hosts to the
same OS version.
-Wei
On Tue, 23 Nov 2021 at 16:31, Wido den Hollander <w...@widodh.nl>
wrote:
Hi,
I'm trying to debug an issue with live migrations between Ubuntu
18.04 and 20.04 machines each with different CPUs:
- Ubuntu 18.04 with AMD Epyc 7552 (Rome)
- Ubuntu 20.04 with AMD Epyc 7662 (Rome)
We are currently using this setting:
guest.cpu.mode=custom
guest.cpu.model=EPYC
This does not allow for live migrations:
Ubuntu 20.04 with Epyc 7662 to Ubuntu 18.04 with Epyc 7552 fails:
"ExecutionException : org.libvirt.LibvirtException: unsupported
configuration: unknown CPU feature: npt"
So we tried to define a set of features manually:
guest.cpu.features=3dnowprefetch abm adx aes apic arat avx avx2 bmi1
bmi2 clflush clflushopt cmov cr8legacy cx16 cx8 de f16c fma fpu
fsgsbase fxsr fxsr_opt lahf_lm lm mca mce misalignsse mmx mmxext
monitor movbe msr mtrr nx osvw pae pat pclmuldq pdpe1gb pge pni popcnt
pse pse36 rdrand rdseed rdtscp sep sha-ni smap smep sse sse2 sse4.1
sse4.2 sse4a ssse3 svm syscall tsc vme xgetbv1 xsave xsavec xsaveopt
-npt -x2apic -hypervisor -topoext -nrip-save
This results in this going into the XML:
<feature policy='disable' name='npt'/>
You would say that works, but then the target host (18.04 with the
7552) says it doesn't support the feature 'npt' and the migration still
fails.
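The mapping from the feature list above to XML can be sketched roughly as follows. The thread only confirms that "-npt" ends up as policy='disable'; treating unprefixed names as policy='require' is an assumption made here, and the function name is hypothetical rather than the agent's actual code.

```python
def features_to_xml(features: str) -> list:
    """Turn a space-separated guest.cpu.features value into <feature> lines."""
    lines = []
    for token in features.split():
        if token.startswith("-"):
            # Confirmed by the thread: a leading '-' yields policy='disable'.
            policy, name = "disable", token[1:]
        else:
            # Assumption: names without a prefix are emitted as 'require'.
            policy, name = "require", token
        lines.append(f"<feature policy='{policy}' name='{name}'/>")
    return lines


xml_lines = features_to_xml("svm xsave -npt -nrip-save")
for line in xml_lines:
    print(line)
```

Note that a disabled feature is still named in the XML, so a target host whose libvirt does not know the name 'npt' can still reject the definition, which is consistent with the failure described above.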
Now we could of course use the kvm64 CPU from Qemu, but that's lacking
so many features that for example TLS offloading isn't available.
I also tried to set 'EPYC-Rome' on the Ubuntu 20.04 hypervisor, but
it then complains on the Ubuntu 18.04 hypervisor that the CPU
'EPYC-Rome' is unknown as the 18.04 hypervisor doesn't have that profile.
Any ideas on how to get this working?
Wido