On 12/3/24 06:56, Alex Bennée wrote:
Pierrick Bouvier <pierrick.bouv...@linaro.org> writes:
This boots an OP-TEE environment and launches a nested guest VM inside it
using the Realms feature. We do this for the virt and sbsa-ref platforms.
Signed-off-by: Pierrick Bouvier <pierrick.bouv...@linaro.org>
<snip>
+
+ self.vm.add_args('-accel', 'tcg')
+ self.vm.add_args('-cpu', 'max,x-rme=on')
With debug on, the PAC functions are certainly very high in the perf
report. So pauth-impdef=on seems worthwhile here.
+ self.vm.add_args('-m', '2G')
+ self.vm.add_args('-M', 'sbsa-ref')
+ self.vm.add_args('-drive', f'file={pflash0},format=raw,if=pflash')
+ self.vm.add_args('-drive', f'file={pflash1},format=raw,if=pflash')
+ self.vm.add_args('-drive', f'file=fat:rw:{virtual},format=raw')
+ self.vm.add_args('-drive', f'format=raw,if=none,file={drive},id=hd0')
+ self.vm.add_args('-device', 'virtio-blk-pci,drive=hd0')
+ self.vm.add_args('-device', 'virtio-9p-pci,fsdev=shr0,mount_tag=shr0')
+ self.vm.add_args('-fsdev',
f'local,security_model=none,path={rme_stack},id=shr0')
+ self.vm.add_args('-device', 'virtio-net-pci,netdev=net0')
+ self.vm.add_args('-netdev', 'user,id=net0')
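Concretely, the pauth-impdef suggestion above amounts to extending the
'-cpu' value quoted earlier. A small illustrative helper (the helper
function itself is hypothetical, but the property names match what the
patch and review use):

```python
# Sketch of the reviewer's suggestion: add pauth-impdef=on to the -cpu
# option so pointer authentication uses the cheap impdef algorithm
# under TCG instead of the slow architected QARMA one.

def cpu_args(rme=True, pauth_impdef=True):
    """Build the '-cpu' argument value for the test (illustrative helper)."""
    props = ['max']
    if rme:
        props.append('x-rme=on')          # enable Realm Management Extension
    if pauth_impdef:
        props.append('pauth-impdef=on')   # fast impdef pointer-auth algorithm
    return ','.join(props)

print(cpu_args())  # -> max,x-rme=on,pauth-impdef=on
```

The test would then pass this as `self.vm.add_args('-cpu', cpu_args())`
in place of the hard-coded string.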
<snip>
+
+ self.vm.add_args('-accel', 'tcg')
+ self.vm.add_args('-cpu', 'max,x-rme=on')
And here.
<snip>
With that, the tests both pass with --enable-debug (312s, 352s) and the
profile looks like:
  6.33%  qemu-system-aar  qemu-system-aarch64  [.] arm_feature
  5.66%  qemu-system-aar  qemu-system-aarch64  [.] tcg_flush_jmp_cache
  3.44%  qemu-system-aar  qemu-system-aarch64  [.] rebuild_hflags_a64
This I suspect is triggered by assert_hflags_rebuild_correctly() which
is validating we've not skipped rebuilding the flags when we need to.
It's a lot easier than debugging why your execution trace looks weird.
  2.95%  qemu-system-aar  qemu-system-aarch64  [.] extract64
  2.52%  qemu-system-aar  qemu-system-aarch64  [.] extract64
This is usually triggered by translation code which uses extract64
heavily during instruction decode.
It might be useful to see if we can get functional tests run under TCG
to dump "info jit" at the end and ensure we are not over-generating code
and exhausting the translation cache.
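A sketch of what such a check could do with the dumped text. The sample
below only mimics the 'gen code size used/total' line that "info jit"
prints; treat the exact output format, and the helper itself, as
assumptions rather than an existing test API:

```python
import re

# Hypothetical helper: estimate translation-cache pressure from an
# "info jit" dump. SAMPLE imitates the monitor output; the real format
# may differ between QEMU versions.

SAMPLE = """\
Translation buffer state:
gen code size       268435456/1073741824
TB count            417519
"""

def code_cache_usage(info_jit_text):
    """Return the used/total fraction of the generated-code buffer."""
    m = re.search(r'gen code size\s+(\d+)/(\d+)', info_jit_text)
    if not m:
        raise ValueError('no "gen code size" line found')
    used, total = int(m.group(1)), int(m.group(2))
    return used / total

print(code_cache_usage(SAMPLE))  # -> 0.25
```

A functional test could fetch the text via the monitor (e.g. a
human-monitor-command QMP call) and fail if the fraction stays pegged
near 1.0, which would indicate constant cache flushes.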
  2.12%  qemu-system-aar  qemu-system-aarch64  [.] arm_el_is_aa64
  2.11%  qemu-system-aar  qemu-system-aarch64  [.] arm_security_space_below_el3
  2.11%  qemu-system-aar  qemu-system-aarch64  [.] deposit64
  1.49%  qemu-system-aar  qemu-system-aarch64  [.] arm_hcr_el2_eff_secstate
  1.46%  qemu-system-aar  qemu-system-aarch64  [.] arm_is_el2_enabled_secstate
  1.38%  qemu-system-aar  qemu-system-aarch64  [.] extract32
  1.34%  qemu-system-aar  qemu-system-aarch64  [.] extract64
  1.30%  qemu-system-aar  qemu-system-aarch64  [.] get_phys_addr_lpae
  1.23%  qemu-system-aar  qemu-system-aarch64  [.] aa64_va_parameters
  1.09%  qemu-system-aar  qemu-system-aarch64  [.] rol32
  1.07%  qemu-system-aar  qemu-system-aarch64  [.] probe_access_internal
  1.02%  qemu-system-aar  qemu-system-aarch64  [.] deposit32
Thanks Alex.
I did the same investigation, and switching to pauth-impdef brings the
time down from 1500s to a more "acceptable" 450s on my machine.
In my profile (using call graphs, which I'm not sure you used), I
observe that 26% of the time is spent in
assert_hflags_rebuild_correctly, which is enabled by --enable-debug-tcg.
I'll send a v3 switching to impdef and increasing the timeout; that
should be enough for now.
Pierrick