tcg: Inline cpu_mmu_lookup function

Max Chou Sat, 17 Feb 2024 09:28:50 -0800

Hi Richard,

After I tried to reduce the overhead of unnecessary segment flow andmemory plugin callbacks, I observed that there several functions consumemost of runtime from perf report.So these three inline patches and previous patch were created to reducethe overhead without modifying the functions.


The following are the experiment results.

The benchmark target is the bench-memcpy executable generated from theglibc repository (release 2.38 with RVV support patch [1]).


- Execution command

`qemu-riscv64 -E TIMEOUTFACTOR=10 -R 1G -L {glibc build folder}/rootfs-cpurv64,zba=true,zbb=true,v=true,vlen=256,vext_spec=v1.0,rvv_ta_all_1s=true,rvv_ma_all_1s=truebench-memcpy`



- Total  runtime

0. Original riscv-to-apply.next branch (commit ID: deb0ff0)
    - Total execution time: ~383 sec.
1. Cherry pick PATCH 4 to riscv-to-apply.next branch (commit ID: deb0ff0)
    - Total execution time: ~375 sec.

2. Cherry pick PATCH 4+5+6 to riscv-to-apply.next branch (commit ID:deb0ff0)

    - Total execution time: ~342 sec.


- Perf report (cycles)

0. Original riscv-to-apply.next branch (commit ID: deb0ff0)

# Samples: 385K of event 'cycles:u'
# Event count (approx.): 1411574690539
#
# Children      Self  Command       Shared Object Symbol

# ........ ........ ............ ...................................................................

47.54% 31.35% qemu-riscv64 qemu-riscv64 [.]vext_ldst_us 25.60% 0.03% qemu-riscv64 qemu-riscv64 [.]helper_vse8_v 18.87% 13.29% qemu-riscv64 qemu-riscv64 [.]cpu_ldb_mmu 17.40% 0.04% qemu-riscv64 qemu-riscv64 [.]helper_vle8_v 17.39% 15.71% qemu-riscv64 qemu-riscv64 [.]cpu_stb_mmu 17.17% 17.17% qemu-riscv64 qemu-riscv64 [.]qemu_plugin_vcpu_mem_cb 12.25% 0.00% qemu-riscv64 qemu-riscv64 [.]helper_stb_mmu (inlined)

     8.18%     4.08%  qemu-riscv64  qemu-riscv64             [.] lde_b

8.17% 8.17% qemu-riscv64 qemu-riscv64 [.]cpu_mmu_lookup 7.45% 0.00% qemu-riscv64 qemu-riscv64 [.]plugin_load_cb (inlined) 7.32% 0.00% qemu-riscv64 qemu-riscv64 [.]do_st1_mmu (inlined) 6.79% 0.00% qemu-riscv64 [unknown] [.]0x00000000000000ff 5.91% 0.00% qemu-riscv64 qemu-riscv64 [.]adjust_addr (inlined) 5.70% 0.00% qemu-riscv64 qemu-riscv64 [.]do_ld1_mmu (inlined) 5.58% 0.00% qemu-riscv64 [unknown] [.]0x00000000000001fe 5.23% 4.25% qemu-riscv64 qemu-riscv64 [.]cpu_ldsb_data_ra 4.93% 0.00% qemu-riscv64 qemu-riscv64 [.]get_memop (inlined) 4.11% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_ldub_data_ra (inlined) 4.11% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_ldub_mmuidx_ra (inlined) 2.88% 2.88% qemu-riscv64 qemu-riscv64 [.]cpu_stb_data_ra 2.88% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_stb_mmuidx_ra (inlined) 2.75% 0.00% qemu-riscv64 qemu-riscv64 [.] stb_p(inlined) 2.66% 0.00% qemu-riscv64 qemu-riscv64 [.]get_alignment_bits (inlined) 1.79% 0.00% qemu-riscv64 [unknown] [.]0x00000000e40203bf 1.68% 0.00% qemu-riscv64 qemu-riscv64 [.]plugin_store_cb (inlined) 1.60% 0.00% qemu-riscv64 [unknown] [.]0x00000000e403733f 1.13% 0.00% qemu-riscv64 qemu-riscv64 [.] ldub_p(inlined)

     0.73%     0.73%  qemu-riscv64  qemu-riscv64             [.] ste_b

0.53% 0.21% qemu-riscv64 qemu-riscv64 [.]helper_lookup_tb_ptr


1. Cherry pick PATCH 4 to riscv-to-apply.next branch (commit ID: deb0ff0)

# Samples: 378K of event 'cycles:u'
# Event count (approx.): 1381912775966
#
# Children      Self  Command       Shared Object Symbol

# ........ ........ ............ ..............................................................

63.30% 29.62% qemu-riscv64 qemu-riscv64 [.]vext_ldst_us 30.77% 0.04% qemu-riscv64 qemu-riscv64 [.]helper_vle8_v 28.59% 0.02% qemu-riscv64 qemu-riscv64 [.]helper_vse8_v 22.78% 22.78% qemu-riscv64 qemu-riscv64 [.]qemu_plugin_vcpu_mem_cb

    21.40%     5.26%  qemu-riscv64  qemu-riscv64             [.] lde_b

20.69% 10.40% qemu-riscv64 qemu-riscv64 [.]cpu_ldb_mmu 20.06% 3.91% qemu-riscv64 qemu-riscv64 [.]cpu_ldsb_data_ra 19.16% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_ldub_data_ra (inlined) 19.16% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_ldub_mmuidx_ra (inlined) 12.65% 9.36% qemu-riscv64 qemu-riscv64 [.]cpu_stb_mmu 8.60% 0.00% qemu-riscv64 qemu-riscv64 [.]plugin_load_cb (inlined) 6.73% 6.73% qemu-riscv64 qemu-riscv64 [.]do_ld1_mmu.constprop.23 6.73% 0.00% qemu-riscv64 qemu-riscv64 [.]do_ld1_mmu (inlined) 6.31% 0.00% qemu-riscv64 qemu-riscv64 [.]helper_stb_mmu (inlined)

     6.20%     6.20%  qemu-riscv64  qemu-riscv64             [.] do_st1_mmu

6.20% 0.00% qemu-riscv64 qemu-riscv64 [.]adjust_addr (inlined) 5.49% 0.00% qemu-riscv64 [unknown] [.]0x00000000000001fe 3.82% 0.00% qemu-riscv64 qemu-riscv64 [.]plugin_store_cb (inlined) 3.01% 0.00% qemu-riscv64 qemu-riscv64 [.] ldub_p(inlined) 2.94% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_mmu_lookup (inlined) 2.94% 0.00% qemu-riscv64 qemu-riscv64 [.]get_alignment_bits (inlined) 2.91% 0.00% qemu-riscv64 qemu-riscv64 [.]clear_helper_retaddr (inlined)

     2.91%     2.91%  qemu-riscv64  qemu-riscv64             [.] ste_b

2.90% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_mmu_lookup (inlined) 2.90% 0.00% qemu-riscv64 qemu-riscv64 [.]get_alignment_bits (inlined) 0.59% 0.24% qemu-riscv64 qemu-riscv64 [.]helper_lookup_tb_ptr

2. Cherry pick PATCH 4+5+6 to riscv-to-apply.next branch (commit ID:deb0ff0)


# Samples: 343K of event 'cycles:u'
# Event count (approx.): 1259748868940
#
# Children      Self  Command       Shared Object Symbol

# ........ ........ ............ ..............................................................

64.16% 35.81% qemu-riscv64 qemu-riscv64 [.]vext_ldst_us 30.96% 0.02% qemu-riscv64 qemu-riscv64 [.]helper_vse8_v 27.44% 0.05% qemu-riscv64 qemu-riscv64 [.]helper_vle8_v 18.37% 18.37% qemu-riscv64 qemu-riscv64 [.]qemu_plugin_vcpu_mem_cb 17.33% 7.45% qemu-riscv64 qemu-riscv64 [.]cpu_ldsb_data_ra

    14.90%     5.02%  qemu-riscv64  qemu-riscv64             [.] lde_b

14.83% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_ldub_data_ra (inlined) 14.83% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_ldub_mmuidx_ra (inlined) 14.15% 10.33% qemu-riscv64 qemu-riscv64 [.]cpu_stb_mmu 11.25% 9.62% qemu-riscv64 qemu-riscv64 [.]cpu_ldb_mmu 7.22% 0.00% qemu-riscv64 qemu-riscv64 [.]adjust_addr (inlined) 6.99% 6.99% qemu-riscv64 qemu-riscv64 [.]helper_stb_mmu 6.99% 0.00% qemu-riscv64 qemu-riscv64 [.]do_st1_mmu (inlined) 5.96% 0.00% qemu-riscv64 qemu-riscv64 [.]do_ld1_mmu (inlined) 5.95% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_mmu_lookup (inlined) 5.30% 0.00% qemu-riscv64 [unknown] [.]0x00000000000001fe 4.18% 0.00% qemu-riscv64 qemu-riscv64 [.]plugin_store_cb (inlined) 3.22% 0.00% qemu-riscv64 qemu-riscv64 [.]cpu_mmu_lookup (inlined) 3.22% 0.00% qemu-riscv64 qemu-riscv64 [.]get_alignment_bits (inlined)

     3.22%     3.22%  qemu-riscv64  qemu-riscv64             [.] ste_b

3.19% 0.00% qemu-riscv64 qemu-riscv64 [.]get_alignment_bits (inlined) 3.16% 0.00% qemu-riscv64 qemu-riscv64 [.]clear_helper_retaddr (inlined) 2.76% 0.00% qemu-riscv64 qemu-riscv64 [.] g2h(inlined) 2.76% 0.00% qemu-riscv64 qemu-riscv64 [.]g2h_untagged (inlined) 2.30% 0.00% qemu-riscv64 qemu-riscv64 [.]env_cpu (inlined) 2.21% 0.00% qemu-riscv64 qemu-riscv64 [.]plugin_load_cb (inlined) 0.99% 0.00% qemu-riscv64 qemu-riscv64 [.]env_cpu (inlined)

I agree that these functions are large functions to inline and I didn'ttest these patches on all combinations (different guest architecture +different host architecture).So I think that we can drop these three patches until we can make surethat these patches can get benefit on all combinations without side effect.I'll focus on avoiding over-use of the full out-of-line load/storeroutines for the next version.



Thanks for the suggestion and question,

Max

[1]https://inbox.sourceware.org/libc-alpha/20230504074851.38763-1-hau....@sifive.com


On 2024/2/16 4:10 AM, Richard Henderson wrote:

On 2/15/24 09:28, Max Chou wrote:

Signed-off-by: Max Chou <max.c...@sifive.com>
---
  accel/tcg/user-exec.c | 7 +++++--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 68b252cb8e8..c5453810eee 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c

@@ -942,8 +942,11 @@ void page_reset_target_data(target_ulong start,target_ulong last) { }

    /* The system-mode versions of these helpers are in cputlb.c.  */
  -static void *cpu_mmu_lookup(CPUState *cpu, vaddr addr,

- MemOp mop, uintptr_t ra, MMUAccessTypetype)

+static inline QEMU_ALWAYS_INLINE void *cpu_mmu_lookup(CPUState *cpu,
+                                                      vaddr addr,
+                                                      MemOp mop,
+                                                      uintptr_t ra,
+ MMUAccessType type)
  {
      int a_bits = get_alignment_bits(mop);
      void *ret;


This is a large function.  Why does it need to be inlined?

For this and the next two patches I require evidence, because I don'tbelieve you are attacking the problem correctly.

r~

Re: [RFC PATCH 4/6] accel/tcg: Inline cpu_mmu_lookup function

Reply via email to