Info about the slowest avocado tcg tests (arm, s390x, ppc, replay)

Alex Bennée Fri, 12 Aug 2022 23:16:56 -0700

[RESEND because the ML ate my email again]

Hi All,


In my investigations into our current Avocado tests I identified all
that ran for more than 30s for an --enable-debug build. They are:

  tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2: PASS( 
763.81 s)
  tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3: PASS( 
742.75 s)

The main slowness here is due to inefficient TB invalidation of code
generated during the firmware phase when the rest of the system boots.

  tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:  PASS( 
433.48 s)

This looks like a very fat distro. A lot of slow delays waiting for
systemd units to come up. That said the positively svelte Alpine ISO
still takes about 4min30s to churn through it's userspace boot.

Looking at the callchains though it might be helped by the same fixes as
the ARM tcg GIC tests:

-   28.15%     0.16%  qemu-system-s39  qemu-system-s390x             [.] 
store_helper
   - 27.99% store_helper
      - 23.08% notdirty_write
         - 17.51% page_collection_lock
              2.76% page_trylock_add
              1.22% g_tree_lookup
              0.62% g_tree_find_node
         - 4.53% tb_invalidate_phys_page_fast
            - 4.37% tb_invalidate_phys_page_range__locked
               - 3.44% tb_phys_invalidate__locked
                  - 3.44% do_tb_phys_invalidate
                       3.40% tb_page_remove
           0.63% page_trylock_add
      - 4.25% tlb_fill
         - 4.21% s390_cpu_tlb_fill
            - 2.35% mmu_translate
               - 1.77% mmu_translate_asce
                  - 1.58% read_table_entry
                     - 1.39% address_space_read_full
                        - 1.11% flatview_read
                           - 0.62% flatview_translate
                                0.55% flatview_do_translate
            - 1.70% tlb_set_page
               - 1.64% tlb_set_page_with_attrs
                    0.52% cpu_physical_memory_is_clean
-   25.10%     0.02%  qemu-system-s39  qemu-system-s390x             [.] 
notdirty_write
   - 25.08% notdirty_write
      - 18.14% page_collection_lock
           2.84% page_trylock_add
           1.25% g_tree_lookup
           0.64% g_tree_find_node
      - 5.86% tb_invalidate_phys_page_fast
         - 5.69% tb_invalidate_phys_page_range__locked
            - 4.62% tb_phys_invalidate__locked
               - 4.62% do_tb_phys_invalidate
                    4.57% tb_page_remove
        0.65% page_trylock_add

There might be some millage looking at the tb_jmp_cache utilisation
though because it looks like pretty much all tb_lookup falls through to
the full QHT lookup:

-   19.82%     1.65%  qemu-system-s39  qemu-system-s390x             [.] 
tb_lookup
   - 18.17% tb_lookup
      - 17.84% tb_htable_lookup
         - 9.94% get_page_addr_code
            - 9.66% get_page_addr_code_hostp
               - 5.55% tlb_fill
                  - 5.48% s390_cpu_tlb_fill
                     - 3.55% mmu_translate
                        - 2.73% mmu_translate_asce
                           - 2.44% read_table_entry
                              - 2.17% address_space_read_full
                                 - 1.72% flatview_read
                                    - 0.93% flatview_translate
                                       - 0.83% flatview_do_translate
                                            0.58% 
address_space_translate_internal
                        - 0.54% mmu_absolute_addr_valid
                             0.51% address_space_access_valid
                     - 1.73% tlb_set_page
                        - 1.64% tlb_set_page_with_attrs
                             0.54% tlb_flush_vtlb_page_locked
               - 1.56% qemu_ram_addr_from_host_nofail
                  - 1.40% qemu_ram_addr_from_host
                       1.17% qemu_ram_block_from_host
                 0.73% tlb_entry
         - 4.30% qht_lookup_custom
            - 1.90% qht_do_lookup
                 1.21% tb_lookup_cmp
              1.59% seqlock_read_begin
         - 2.82% tb_hash_func
            - 2.66% qemu_xxhash6
               - 2.35% qemu_xxhash7
                    rol32

  tests/avocado/boot_linux.py:BootLinuxPPC64.test_pseries_tcg: PASS( 341.90 s)
  tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_i440fx_tcg: PASS( 240.76 s)
  tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg: PASS( 221.70 s)

Nothing especially jumps out of perf (in fact it looks very similar to
s390x):

   6.40%  qemu-system-ppc  qemu-system-ppc64        [.] tb_lookup
   5.63%  qemu-system-ppc  qemu-system-ppc64        [.] check_regs
   4.87%  qemu-system-ppc  qemu-system-ppc64        [.] 
hreg_compute_hflags_value
   4.21%  qemu-system-ppc  qemu-system-ppc64        [.] pmu_increment_insns
   2.42%  qemu-system-ppc  qemu-system-ppc64        [.] helper_insns_inc
   2.26%  qemu-system-ppc  qemu-system-ppc64        [.] cpu_get_tb_cpu_state
   2.20%  qemu-system-ppc  qemu-system-ppc64        [.] type_get_parent
   2.07%  qemu-system-ppc  qemu-system-ppc64        [.] helper_lookup_tb_ptr
   1.96%  qemu-system-ppc  [kernel.kallsyms]        [k] list_lru_count_one
   1.63%  qemu-system-ppc  qemu-system-ppc64        [.] type_is_ancestor
   1.62%  qemu-system-ppc  qemu-system-ppc64        [.] helper_vsldoi
   1.49%  qemu-system-ppc  qemu-system-ppc64        [.] la_reset_pref
   1.14%  qemu-system-ppc  qemu-system-ppc64        [.] seqlock_read_begin
   1.02%  qemu-system-ppc  qemu-system-ppc64        [.] liveness_pass_1
   0.97%  qemu-system-ppc  [kernel.kallsyms]        [k] _raw_spin_lock
   0.81%  qemu-system-ppc  qemu-system-ppc64        [.] curr_cflags
   0.76%  qemu-system-ppc  libglib-2.0.so.0.5600.4  [.] g_hash_table_lookup
   0.75%  qemu-system-ppc  qemu-system-ppc64        [.] check_for_breakpoints
   0.75%  qemu-system-ppc  qemu-system-ppc64        [.] tb_jmp_cache_hash_func  
                     

which all looks fairly typical for --debug code with heavy computed
jumps (triggering a lot of tb_lookups).

  tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: WARN( 91.80 s)
  tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: WARN( 91.47 
s)
  tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: WARN( 91.36 s)
  tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: WARN( 
88.56 s)

These all end with WARN so could be broken anyway and just timing out.
Didn't dig in too deeply.

  tests/avocado/replay_kernel.py:ReplayKernelNormal.test_arm_cubieboard_initrd: 
PASS( 63.24 s)
  tests/avocado/replay_kernel.py:ReplayKernelNormal.test_sparc_ss20: PASS( 
59.36 s)

This could just be replay being inherently slow given it runs several
times under icount.

  tests/avocado/boot_xen.py:BootXen.test_arm64_xen_411_and_dom0: PASS( 57.15 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_sd: 
PASS( 55.00 s)
  
tests/avocado/machine_aspeed.py:AST2x00Machine.test_arm_ast2500_romulus_openbmc_v2_9_0:
 PASS( 52.73 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_x86_64_pc: PASS( 
51.08 s)
  
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_aarch64_xlnx_versal_virt:
 PASS( 47.98 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_ast2600_debian: 
PASS( 47.48 s)
  
tests/avocado/linux_initrd.py:LinuxInitrd.test_with_2gib_file_should_work_with_linux_v4_16:
 PASS( 45.29 s)
  
tests/avocado/machine_aspeed.py:AST2x00Machine.test_arm_ast2400_palmetto_openbmc_v2_9_0:
 PASS( 44.54 s)
  
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_initrd: 
PASS( 40.96 s)
  
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_exynos4210_initrd:
 PASS( 37.70 s)
  
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_initrd:
 PASS( 34.29 s)
  
tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_sata: 
PASS( 33.19 s)

-- 
Alex Bennée

Info about the slowest avocado tcg tests (arm, s390x, ppc, replay)

Reply via email to