> On 31. Jul 2025, at 15:37, Peter Maydell <peter.mayd...@linaro.org> wrote:
> 
> On Wed, 30 Jul 2025 at 21:52, Fabiano Rosas <faro...@suse.de> wrote:
>> 
>> Currently our aarch64 tests are only being run using identical QEMU
>> versions. When running the tests with different QEMU versions, which
>> is a common use-case for migration, the tests are broken due to the
>> current choice of the 'max' cpu, which is not stable and is prone to
>> breaking migration.
>> 
>> This means aarch64 tests are currently only testing about the same
>> situations as any other arch, i.e. no arm-specific testing is being
>> done.
>> 
>> To make the aarch64 tests more useful, -cpu max will be changed to
>> -cpu neoverse-n1 in the next patch. Before doing that, make sure
>> aarch64 tests only run with TCG, since KVM testing depends on usage of
>> the -cpu host and we currently don't have code to switch between cpus
>> according to test runtime environment.
>> 
>> Also, TCG alone should allow us to catch most issues with migration,
>> since there is no guarantee of a uniform environment as there is with
>> KVM.
> 
> The difficulty with only testing TCG migration is that now
> we're testing the setup that most cross-version migration users
> don't care about. At least my assumption is that it's KVM
> cross-version migration that is the real use case here.
> 
> For instance, this migration bug with the DBGDTR register
> isn't a problem for KVM, because with KVM we use the kernel
> to tell us what system registers are present, and whether
> a register is defined with a cpreg in QEMU or not doesn't
> affect what we put on the wire for migration. Conversely
> there might be migration compat issues that show up only
> with KVM and not TCG (though the most obvious source of those
> would be host kernel changes, which is kind of out of scope
> for us).
> 
> Though of course with our CI jobs we're probably not
> doing AArch64 KVM cross-version testing anyway...
> 
On the cloud provider side*, we do rely on having rollbacks work.

We use staged deployments and roll back if things go wrong as we
observe the rollout progressing.

Note that the set of system registers KVM exposes (at least on AArch64)
does sometimes vary between kernel releases, so for rolling back you’ll
need to ignore some (new) sysregs in the VMM. This takes careful
planning: first deploy a VMM release with a point-fix to ignore the new
registers, and only then deploy the kernel update.

So not dealing with that would make the cloud use case unusable without
downstream patches.
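
As a rough illustration of the point-fix idea above (a hypothetical
sketch, not QEMU or KVM code; the function name and the register IDs
are invented for the example), the destination could filter the
incoming register list like this:

```python
# Hypothetical sketch: none of these names or IDs come from QEMU or KVM.

def filter_incoming_sysregs(incoming, supported, ignorable):
    """Split incoming sysregs into ones to load and ones to drop.

    incoming:  dict of register ID -> value, as sent by the source
    supported: set of register IDs the destination kernel exposes
    ignorable: set of register IDs the point-fix release may drop
    Raises ValueError for an ID that is neither supported nor ignorable.
    """
    to_load = {}
    dropped = []
    for reg_id, value in incoming.items():
        if reg_id in supported:
            to_load[reg_id] = value
        elif reg_id in ignorable:
            # Register added by the newer kernel; safe to skip on rollback.
            dropped.append(reg_id)
        else:
            raise ValueError(f"unknown sysreg {reg_id:#x} in stream")
    return to_load, dropped


# Rolling back: the source ran the newer kernel and sends one extra
# register (made-up ID 0x2001) that the older destination lacks.
incoming = {0x1001: 7, 0x1002: 0, 0x2001: 1}
to_load, dropped = filter_incoming_sysregs(
    incoming, supported={0x1001, 0x1002}, ignorable={0x2001}
)
```

Failing hard on anything outside the ignore list keeps the point-fix
narrow: only the registers you planned for are dropped, and the list
of dropped IDs can be logged.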

*although we don’t rely on QEMU for Nitro System VMs

Thank you,
> thanks
> -- PMM
> 

