On 12/17/24 02:38, Peter Maydell wrote:
On Tue, 17 Dec 2024 at 07:40, Alex Bennée <alex.ben...@linaro.org> wrote:
Pierrick Bouvier <pierrick.bouv...@linaro.org> writes:
On 12/16/24 11:50, Richard Henderson wrote:
On 12/16/24 13:26, Pierrick Bouvier wrote:
On 12/16/24 11:10, Richard Henderson wrote:
On 12/4/24 15:12, Pierrick Bouvier wrote:
The default pointer authentication algorithm in qemu-system-aarch64 (QARMA5) is
expensive: we spend up to 50% of the emulation time running it (when using TCG).
Switching to pauth-impdef=on is often given as a solution to speed up execution.
Thus we talked about making it the new default.
The first patch introduces a new property (pauth-qarma5) that allows selecting
the current default algorithm.
The second one changes the default.
Pierrick Bouvier (2):
target/arm: add new property to select pauth-qarma5
target/arm: change default pauth algorithm to impdef
docs/system/arm/cpu-features.rst | 7 +++++--
docs/system/introduction.rst | 2 +-
target/arm/cpu.h | 1 +
target/arm/arm-qmp-cmds.c | 2 +-
target/arm/cpu64.c | 30 +++++++++++++++++++-----------
tests/qtest/arm-cpu-features.c | 15 +++++++++++----
6 files changed, 38 insertions(+), 19 deletions(-)
I understand the motivation, but as-is this will break migration.
I think this will need to be versioned somehow, but the only thing that really
gets versioned are the boards, and I'm not sure how to link that to the
instantiated cpu.
From what I understood, and I may be wrong, migrating a (TCG) VM with -cpu max
between QEMU versions is *not* supported, as we can't guarantee which features
are present or not.
This doesn't affect only -cpu max, but anything using
aarch64_add_pauth_properties():
neoverse-n1, neoverse-n2, cortex-a710.
I think this is still a change worth making, because people can get a
100% speedup with this simple change, and it's a better default than
the previous value.
Moreover, in this migration scenario, QEMU will immediately
abort upon accessing memory through a pointer.
I'm not sure about what would be the best way to make this change as
smooth as possible for QEMU users.
Surely we can only honour and apply the new default to -cpu max?
With all due respect, I think the current default is wrong, and it would
be sad to keep it when people don't specify -cpu max, or for other CPUs
that enable pointer authentication.
In all our conversations, there seems to be a focus on choosing the
"fastest" emulation solution that satisfies the guest (behaviour wise).
And, for a reason I don't know, pointer authentication escaped this rule.
I understand the concern regarding backward compatibility, but it would be
better to politely ask people (with an error message) to restart
their virtual machines when they try to migrate, instead of being stuck
with a slow default forever.
Moreover, we are talking about a TCG scenario, for which I'm not sure people
use the migration feature (save/restore) heavily, but I may be wrong on this.
Between the risk of breaking migration (with a polite error message),
and having a default that is 100% faster, I think it would be better to
favor the second one. If it were a 5% speedup, I would not argue,
but slowing down execution by a factor of 2 is really a lot.
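As a quick sanity check of the numbers quoted in this thread, and assuming
(this is an assumption, not a measurement) that the impdef algorithm costs
close to nothing compared to QARMA5: if a fraction p of the emulation time is
spent in pointer authentication, removing that cost gives

```latex
\[
  \text{speedup} = \frac{1}{1 - p},
  \qquad p = 0.5 \;\Rightarrow\; \text{speedup} = 2
\]
```

so the "100% speedup" and the "factor of 2" both correspond to the "up to 50%"
figure, which is an upper bound. A workload that spends less time in pointer
authentication gains proportionally less.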
That was what I thought we were aiming for, yes. We *could* have
a property on the CPU to say "use the old back-compatible default,
not the new one", which we then list in the appropriate hw_compat
array. (Grep for the "backcompat-cntfrq" property for an example of
this.) But I'm not sure if that is worth the effort compared to
just changing 'max'.
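To make the back-compat idea above concrete, here is a minimal, self-contained
sketch of the mechanism. It is a toy model only, not QEMU code: ToyCPU,
ToyCompatProp, toy_compat_old and the property name
"backcompat-pauth-default-qarma5" are all hypothetical names made up for
illustration. The real mechanism uses QEMU's property system and the hw_compat
arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CPU object; this is NOT QEMU's ARMCPU. */
typedef struct {
    /* Hypothetical flag: true means "keep the old QARMA5 default". */
    bool backcompat_pauth_default_qarma5;
} ToyCPU;

/* One compat entry: "force this property to this value". */
typedef struct {
    const char *property;
    bool value;
} ToyCompatProp;

/* Entries an older, versioned machine type would apply (hypothetical). */
const ToyCompatProp toy_compat_old[] = {
    { "backcompat-pauth-default-qarma5", true },
};

/* Apply every compat entry that this toy CPU knows about. */
void toy_apply_compat(ToyCPU *cpu, const ToyCompatProp *props, size_t n)
{
    assert(cpu != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(props[i].property,
                   "backcompat-pauth-default-qarma5") == 0) {
            cpu->backcompat_pauth_default_qarma5 = props[i].value;
        }
    }
}

/* Pick the default algorithm: old machines keep qarma5, new ones get impdef. */
const char *toy_default_pauth(const ToyCPU *cpu)
{
    return cpu->backcompat_pauth_default_qarma5 ? "qarma5" : "impdef";
}
```

In this model, a CPU created for a new machine type never sees the compat
entry and defaults to "impdef", while one created for an old machine type has
toy_compat_old applied first and keeps "qarma5". That is what would keep
migration of existing versioned machines working.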
When we define hw_compat_10_0 and hw_compat_11_0, do we have to
carry this on forever? (Same question for "backcompat-cntfrq").
(It's not that much extra code to add the property, so I could
easily be persuaded the other way. Possible arguments include
preferring consistency across all CPUs. If we already make the
default be not "what the real CPU of this type uses" then that's
also an argument that we can set it to whatever is convenient;
if we do honour the CPU ID register values for the implementation
default then that's an argument that we should continue to do
so and not change the default to our impdef one.)
For the TCG use case, is there any guest-visible side effect of
using a specific pointer authentication algorithm?
In other words, is there a scenario where pointer authentication would
work with impdef, but not with qarma{3,5}?
If not, I don't see any reason for a CPU to favor an expensive emulation.
In the accelerator case, we read the values from the host cpu, so there
is no problem.
-- PMM