On Tue, 29 Apr 2025, Alex Bennée wrote:
> Richard Henderson <richard.hender...@linaro.org> writes:
>> On 4/29/25 08:27, Alex Bennée wrote:
>>>    - 45.16% rr_cpu_thread_fn
>>
>> Hmm you seem to be running in icount mode here for some reason.
>
> For some reason ppc32 does not enable mttcg. I'm not sure what's
> missing to enable it properly. I seem to recall it may have been
> reverted due to instability but I can't find the commit.
Or maybe it was never enabled? We've recently tried mttcg with the G4 mac99
machine and it seems to work, but the patches needed for it have not been
cleaned up for upstream yet, so they are using a fork for that for now. But
that's a digression.
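
(Continuing the digression: as far as I can tell the 32-bit target simply
never advertises MTTCG support, so qemu-system-ppc falls back to the
single-threaded round-robin loop. From memory it is gated roughly like this
in target/ppc/cpu.h, but take the exact location as an assumption on my part:

#if defined(TARGET_PPC64)
/* MTTCG is only advertised for the 64-bit target; without this define
 * the round-robin (rr_cpu_thread_fn) single thread is used instead. */
#define TARGET_SUPPORTS_MTTCG
#endif

It can still be forced with -accel tcg,thread=multi for a quick test, and I
think QEMU only warns that the guest is not converted, but as noted above it
needed extra patches to be stable for us.)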
I tried rerunning the benchmark with qemu-system-ppc64 instead of
qemu-system-ppc (no other change in the command) and it did not seem to help
much; it's still slow. Here's the profile:
Children Self Command Shared Object Symbol
- 99.42% 0.78% qemu-system-ppc qemu-system-ppc64 [.] cpu_exec_loop
- 99.32% cpu_exec_loop
- 99.32% cpu_tb_exec
- 91.29% 0x7f25d079f8b4
helper_ldub_mmu
do_ld_mmio_beN
- cpu_io_recompile
- 49.05% mttcg_cpu_thread_fn
- 49.05% tcg_cpu_exec
- 49.05% cpu_exec
- 49.04% cpu_exec_setjmp
- cpu_exec_loop
- 49.03% cpu_tb_exec
38.92% 0x7f25cf3f0000
- 0.63% 0x7f25fe78bd93
helper_VPERM
- 0.61% 0x7f25fe78bed8
helper_VPERM
- 42.24% cpu_loop_exit_noexc
cpu_loop_exit
__longjmp_chk
cpu_exec_setjmp
- cpu_exec_loop
- 42.23% cpu_tb_exec
38.67% 0x7f25cf3f0000
- 0.62% 0x7f25fe78bd93
helper_VPERM
- 0.60% 0x7f25fe78bed8
helper_VPERM
- 5.78% 0x7f25d0625055
helper_raise_exception
mttcg_cpu_thread_fn
tcg_cpu_exec
cpu_exec
cpu_exec_setjmp
cpu_exec_loop
cpu_tb_exec
0x7f25d0625055
helper_raise_exception
mttcg_cpu_thread_fn
tcg_cpu_exec
cpu_exec
cpu_exec_setjmp
cpu_exec_loop
- cpu_tb_exec
- 5.78% 0x7f25d0625055
- helper_raise_exception
- 5.49% mttcg_cpu_thread_fn
- 5.16% tcg_cpu_exec
- 5.11% cpu_exec
- 5.03% cpu_exec_setjmp
- 5.01% cpu_exec_loop
- 4.27% cpu_tb_exec
1.60% 0x7f25cf3f0000
+ 99.41% 0.25% qemu-system-ppc qemu-system-ppc64 [.] cpu_tb_exec
+ 99.41% 0.01% qemu-system-ppc qemu-system-ppc64 [.] cpu_exec_setjmp
+ 98.02% 0.17% qemu-system-ppc qemu-system-ppc64 [.] cpu_exec
+ 97.99% 0.02% qemu-system-ppc qemu-system-ppc64 [.] tcg_cpu_exec
+ 97.98% 0.05% qemu-system-ppc qemu-system-ppc64 [.] mttcg_cpu_thread_fn
+ 92.38% 0.00% qemu-system-ppc qemu-system-ppc64 [.] cpu_io_recompile
+ 91.54% 0.00% qemu-system-ppc qemu-system-ppc64 [.] do_ld_mmio_beN
+ 91.51% 0.00% qemu-system-ppc qemu-system-ppc64 [.] helper_ldub_mmu
+ 91.49% 0.00% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25d079f8b4
+ 81.15% 0.00% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25cf3f0000
+ 44.70% 0.00% qemu-system-ppc qemu-system-ppc64 [.] cpu_loop_exit
+ 44.50% 0.01% qemu-system-ppc libc.so.6 [.] __longjmp_chk
+ 43.16% 0.00% qemu-system-ppc qemu-system-ppc64 [.] cpu_loop_exit_noexc
+ 9.57% 0.00% qemu-system-ppc qemu-system-ppc64 [.] helper_raise_exception
+ 8.02% 0.08% qemu-system-ppc qemu-system-ppc64 [.] notdirty_write.isra.0
+ 7.60% 0.05% qemu-system-ppc qemu-system-ppc64 [.] mmu_lookup
+ 7.50% 0.03% qemu-system-ppc qemu-system-ppc64 [.] tb_invalidate_phys_range_fast
+ 7.34% 0.05% qemu-system-ppc qemu-system-ppc64 [.] do_st4_mmu
+ 7.18% 0.02% qemu-system-ppc qemu-system-ppc64 [.] mmu_watch_or_dirty
+ 6.99% 6.99% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe7bba4b
+ 6.82% 6.82% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe7c6545
+ 6.01% 6.01% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe7bbac9
+ 5.94% 5.94% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe7bbb47
+ 5.90% 5.90% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe7bb968
+ 5.85% 0.00% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25d0625055
+ 5.45% 1.17% qemu-system-ppc qemu-system-ppc64 [.] page_collection_lock
+ 5.13% 5.13% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe7c654b
+ 5.08% 5.08% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe71f74b
+ 5.07% 5.07% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe7c624f
+ 5.05% 5.05% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe7c6249
+ 4.93% 4.93% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe71f740
+ 4.64% 4.64% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe71f890
+ 4.49% 4.49% qemu-system-ppc [JIT] tid 12410 [.] 0x00007f25fe71f885
+ 4.05% 1.51% qemu-system-ppc qemu-system-ppc64 [.] page_trylock_add
+ 3.64% 3.62% qemu-system-ppc qemu-system-ppc64 [.] helper_VPERM
+ 2.43% 1.40% qemu-system-ppc qemu-system-ppc64 [.] probe_access
+ 2.16% 0.51% qemu-system-ppc libglib-2.0.so.0.7600.3 [.] g_tree_lookup
+ 2.09% 0.00% qemu-system-ppc qemu-system-ppc64 [.] cpu_loop_exit_restore
+ 1.66% 0.06% qemu-system-ppc qemu-system-ppc64 [.] helper_store_msr
+ 1.61% 0.12% qemu-system-ppc qemu-system-ppc64 [.] hreg_store_msr
+ 1.52% 1.52% qemu-system-ppc qemu-system-ppc64 [.] tb_invalidate_phys_page_range__locked.constprop.0
+ 1.49% 0.05% qemu-system-ppc qemu-system-ppc64 [.] dcbz_common
The times with 100 iterations were:
mapping 0x80800000
src 0xb773a008 dst 0xb7638000
byte loop: 6.49 sec
memset: 0.44 sec
memcpy: 1.6 sec
copyToVRAMNoAltivec: 0.8 sec
copyToVRAMAltivec: 0.88 sec
copyFromVRAMNoAltivec: 8.15 sec
copyFromVRAMAltivec: 8.41 sec
(The previous results were with 10000 iterations but I did not rerun that
now; I assume we can roughly multiply these results by 100 to compare with
those. If anything, this may be even slower with qemu-system-ppc64, which
could be because some code that is compiled out without TARGET_PPC64 defined
is now included.)
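
To make it concrete what these numbers measure: the test maps the VRAM (the
"mapping 0x80800000" line above) and times plain copy loops against that
mapping. The real test source is more involved, but a minimal sketch of the
byte-loop case would look something like this (hypothetical, not the actual
test code, assuming a Linux guest mapping the BAR through /dev/mem):

/* Hypothetical, simplified sketch of the "byte loop" case above; not the
 * real test source. Error checking omitted for brevity. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define VRAM_PHYS  0x80800000UL     /* "mapping 0x80800000" above */
#define LEN        (1024 * 1024)    /* assumed copy size, not from the test */
#define ITERATIONS 100

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    volatile uint8_t *vram = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, VRAM_PHYS);
    uint8_t *buf = malloc(LEN);
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int it = 0; it < ITERATIONS; it++) {
        for (size_t i = 0; i < LEN; i++) {
            /* byte-wise reads from the mapped VRAM; presumably these are
             * what ends up in helper_ldub_mmu/do_ld_mmio_beN above */
            buf[i] = vram[i];
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("byte loop: %.2f sec\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

    free(buf);
    munmap((void *)vram, LEN);
    close(fd);
    return 0;
}

The copyTo/copyFromVRAM cases are the same idea with the copy direction
reversed and with or without AltiVec.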
I'll try to investigate more, but I'm still quite lost.
Regards,
BALATON Zoltan