On Mon, Nov 04 2024, Daniel P. Berrangé <berra...@redhat.com> wrote:

> On Mon, Nov 04, 2024 at 04:10:12PM +0100, Cornelia Huck wrote:
>> On Mon, Nov 04 2024, Daniel P. Berrangé <berra...@redhat.com> wrote:
>> 
>> >
>> > FYI, in x86 target the -cpu command has had a "migratable=bool" property
>> > for a long time, which defaults to 'true' for the 'host' model. This causes
>> > QEMU to explicitly drop features which would otherwise prevent migration
>> > between two hosts with identical physical CPUs.
>> >
>> > IOW, if there are some bits present in 'host' that cause migration
>> > problems on Ampere hosts, ideally either QEMU (or KVM kmod) would
>> > detect them and turn them off automatically if migratable=true is
>> > set. See the commit messages of 84f1b92f & 120eee7d1fd for some background
>> > info
>> 
>> How does this work for version-sensitive features -- are they always
>> defaulting to off? How many features are left with that in the end?
>
> Do you mean QEMU versions here? The non-migratable feature list is
> just hardcoded in QEMU right now, and there's only 1 of them.
> e.g. grep for 'unmigratable_flags'
>
> Note, that "migratable" property is not defining a general purpose
> migration mask between different hw generations. It was specifically
> blocking just stuff that is known to make migration impossible, even
> if HW is identical on both sides.

I was more thinking of dependencies on the KVM version -- QEMU versions
are easier to control for, but you don't really know what kernel version
you are running with. In the end, we'd probably need to mark a lot of
things as unmigratable.
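
A minimal sketch of the filtering that migratable=true implies for '-cpu host', written in Python for brevity. Assumption: the feature word names, bit positions and the UNMIGRATABLE_FLAGS table are hypothetical and do not mirror QEMU's actual 'unmigratable_flags' data. The point is only that each host feature word is masked with the complement of the blocked bits, and every other bit passes through unchanged.

```python
# Hypothetical feature words and bit positions, for illustration only;
# this is NOT QEMU's real table. Bits listed here stand for features
# known to break migration even between hosts with identical CPUs.
UNMIGRATABLE_FLAGS = {
    "word_a": 0,
    "word_b": 1 << 8,
}


def filter_host_features(host_words, migratable=True):
    """Return the feature words exposed to the guest for '-cpu host'.

    With migratable=True, every blocked bit is cleared and all other
    bits are kept. With migratable=False, host features are exposed as-is.
    """
    if not migratable:
        return dict(host_words)
    return {
        word: bits & ~UNMIGRATABLE_FLAGS.get(word, 0)
        for word, bits in host_words.items()
    }


host = {"word_a": 0b1011, "word_b": (1 << 8) | 0b1}
print(filter_host_features(host))                    # {'word_a': 11, 'word_b': 1}
print(filter_host_features(host, migratable=False))  # {'word_a': 11, 'word_b': 257}
```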

>
>> > NB "migratable" is defined in i386 target code today, but conceptually
>> > we should expand/move that to apply to all targets for consistency,
>> > even if it is effectively a no-op for some targets (e.g. if they are
>> > guaranteed migratable out of the box already with '-cpu host').
>> 
>> How does this compare to s390x, which defines some migration-safe cpu
>> models, based upon the different hw generations? If I look at the QEMU
>> code for x86 and s390x, the s390x approach seems cleaner to me (probably
>> because it came later, and therefore could start afresh without having
>> to care for legacy things.) Given that we'll cook up a new model for Arm
>> migration as well, we might as well start with a clean implementation :)
>
> The impression I get (as a distant observer) is that CPUs on s390x in
> general have less complexity to worry about. A combination of not having
> a vendor who creates loads of different SKUs for the same CPU model
> family with slight variations between each, and also not seeming to have
> a situation where CPU flags are known to disappear (or appear) arbitrarily
> in microcode updates.
>
> The s390x idea of a "migratable" and "non migratable" model for each
> HW generation is a nice simplification, but I can't see how it could
> be made to work for x86 when you can't predict ahead of time what
> features are going to be removed from existing HW definition by the
> next microcode update, or by the next CPU SKU that removes a feature
> you had assumed would always be present in a given HW generation.
>
> I don't know much about how ARM world works, but having lots of vendors
> competing with their own custom impls makes me worry complexity will
> be closer to x86 than to s390.

My concern was more about code complexity, not hw complexity. We'll
probably end up with a zoo of weird creatures for Arm, but I don't see a
reason why the code would need to have strange things tacked
on. I.e. have a set of arch extensions that you can baseline to, and
have individual cpus on top, so you can deal with both well-known cpus
and more boutique ones.

>
> If the ARM specifications define a minimum required feature set for each
> HW generation, maybe you can define a model based on that ? You might
> still want to have vendor specific models though, if there are compelling
> features they expose which are optional, or non-standardized. 

We have a list of features that are optional for a given arch extension,
and a list of features that are mandatory, so I think we'd be able to
generate a model with the mandatory features only. Models for individual
cpus could base off these. (There are currently 13 vendors defined in
MIDR, but I'm not sure how often new vendors might be added, and vendors
may also be more or less active.) If you have a baseline of Arm v9.2 or
so, that might already go a long way.
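
A rough sketch of the layering described above: each arch extension contributes its mandatory features to a baseline, and a vendor CPU model is that baseline plus a selection of the optional features. Assumptions: the level names ("v1", "v2") and feature names are placeholders, not real Arm FEAT_* identifiers, and whether a model fits a host is reduced to a plain subset test. A real implementation would likely also have to handle ID register fields that encode feature versions rather than simple presence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchLevel:
    """One architecture extension level (placeholder names throughout)."""
    name: str
    mandatory: frozenset
    optional: frozenset
    parent: Optional["ArchLevel"] = None

    def baseline(self) -> frozenset:
        """Mandatory features of this level and every level it builds on."""
        inherited = self.parent.baseline() if self.parent else frozenset()
        return inherited | self.mandatory

    def all_optional(self) -> frozenset:
        """Optional features of this level and every level it builds on."""
        inherited = self.parent.all_optional() if self.parent else frozenset()
        return inherited | self.optional


def cpu_model(level: ArchLevel, extras: frozenset = frozenset()) -> frozenset:
    """A CPU model: the baseline of `level` plus selected optional features."""
    unknown = extras - level.all_optional()
    if unknown:
        raise ValueError(f"not optional at {level.name}: {sorted(unknown)}")
    return level.baseline() | extras


def can_run_on(model: frozenset, host: frozenset) -> bool:
    """A model fits a host if the host offers every feature in the model."""
    return model <= host


# Hypothetical levels and features.
v1 = ArchLevel("v1", frozenset({"m_a", "m_b"}), frozenset({"o_x"}))
v2 = ArchLevel("v2", frozenset({"m_c"}), frozenset({"o_y"}), parent=v1)

generic = cpu_model(v2)                     # baseline only
vendor = cpu_model(v2, frozenset({"o_y"}))  # baseline plus one optional feature

host_1 = frozenset({"m_a", "m_b", "m_c", "o_y"})
host_2 = frozenset({"m_a", "m_b", "m_c"})
print(can_run_on(generic, host_2), can_run_on(vendor, host_2))  # True False
```

The mandatory-only baseline stays portable across vendors, while the vendor model only runs where its extra optional features are present.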

[But I obviously have no idea how well that will work when it meets
reality :)]

