[RESEND because the ML ate my email again] Hi All,
In my investigations into our current Avocado tests I identified all that ran for more than 30s for an --enable-debug build. They are: tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2: PASS( 763.81 s) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3: PASS( 742.75 s) The main slowness here is due to inefficient TB invalidation of code generated during the firmware phase when the rest of the system boots. tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: PASS( 433.48 s) This looks like a very fat distro. A lot of slow delays waiting for systemd units to come up. That said the positively svelte Alpine ISO still takes about 4min30s to churn through it's userspace boot. Looking at the callchains though it might be helped by the same fixes as the ARM tcg GIC tests: - 28.15% 0.16% qemu-system-s39 qemu-system-s390x [.] store_helper - 27.99% store_helper - 23.08% notdirty_write - 17.51% page_collection_lock 2.76% page_trylock_add 1.22% g_tree_lookup 0.62% g_tree_find_node - 4.53% tb_invalidate_phys_page_fast - 4.37% tb_invalidate_phys_page_range__locked - 3.44% tb_phys_invalidate__locked - 3.44% do_tb_phys_invalidate 3.40% tb_page_remove 0.63% page_trylock_add - 4.25% tlb_fill - 4.21% s390_cpu_tlb_fill - 2.35% mmu_translate - 1.77% mmu_translate_asce - 1.58% read_table_entry - 1.39% address_space_read_full - 1.11% flatview_read - 0.62% flatview_translate 0.55% flatview_do_translate - 1.70% tlb_set_page - 1.64% tlb_set_page_with_attrs 0.52% cpu_physical_memory_is_clean - 25.10% 0.02% qemu-system-s39 qemu-system-s390x [.] notdirty_write - 25.08% notdirty_write - 18.14% page_collection_lock 2.84% page_trylock_add 1.25% g_tree_lookup 0.64% g_tree_find_node - 5.86% tb_invalidate_phys_page_fast - 5.69% tb_invalidate_phys_page_range__locked - 4.62% tb_phys_invalidate__locked - 4.62% do_tb_phys_invalidate 4.57% tb_page_remove 0.65% page_trylock_add There might be some millage looking at the tb_jmp_cache utilisation though because it looks like pretty much all tb_lookup falls through to the full QHT lookup: - 19.82% 1.65% qemu-system-s39 qemu-system-s390x [.] tb_lookup - 18.17% tb_lookup - 17.84% tb_htable_lookup - 9.94% get_page_addr_code - 9.66% get_page_addr_code_hostp - 5.55% tlb_fill - 5.48% s390_cpu_tlb_fill - 3.55% mmu_translate - 2.73% mmu_translate_asce - 2.44% read_table_entry - 2.17% address_space_read_full - 1.72% flatview_read - 0.93% flatview_translate - 0.83% flatview_do_translate 0.58% address_space_translate_internal - 0.54% mmu_absolute_addr_valid 0.51% address_space_access_valid - 1.73% tlb_set_page - 1.64% tlb_set_page_with_attrs 0.54% tlb_flush_vtlb_page_locked - 1.56% qemu_ram_addr_from_host_nofail - 1.40% qemu_ram_addr_from_host 1.17% qemu_ram_block_from_host 0.73% tlb_entry - 4.30% qht_lookup_custom - 1.90% qht_do_lookup 1.21% tb_lookup_cmp 1.59% seqlock_read_begin - 2.82% tb_hash_func - 2.66% qemu_xxhash6 - 2.35% qemu_xxhash7 rol32 tests/avocado/boot_linux.py:BootLinuxPPC64.test_pseries_tcg: PASS( 341.90 s) tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_i440fx_tcg: PASS( 240.76 s) tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg: PASS( 221.70 s) Nothing especially jumps out of perf (in fact it looks very similar to s390x): 6.40% qemu-system-ppc qemu-system-ppc64 [.] tb_lookup 5.63% qemu-system-ppc qemu-system-ppc64 [.] check_regs 4.87% qemu-system-ppc qemu-system-ppc64 [.] hreg_compute_hflags_value 4.21% qemu-system-ppc qemu-system-ppc64 [.] pmu_increment_insns 2.42% qemu-system-ppc qemu-system-ppc64 [.] helper_insns_inc 2.26% qemu-system-ppc qemu-system-ppc64 [.] cpu_get_tb_cpu_state 2.20% qemu-system-ppc qemu-system-ppc64 [.] type_get_parent 2.07% qemu-system-ppc qemu-system-ppc64 [.] helper_lookup_tb_ptr 1.96% qemu-system-ppc [kernel.kallsyms] [k] list_lru_count_one 1.63% qemu-system-ppc qemu-system-ppc64 [.] type_is_ancestor 1.62% qemu-system-ppc qemu-system-ppc64 [.] helper_vsldoi 1.49% qemu-system-ppc qemu-system-ppc64 [.] la_reset_pref 1.14% qemu-system-ppc qemu-system-ppc64 [.] seqlock_read_begin 1.02% qemu-system-ppc qemu-system-ppc64 [.] liveness_pass_1 0.97% qemu-system-ppc [kernel.kallsyms] [k] _raw_spin_lock 0.81% qemu-system-ppc qemu-system-ppc64 [.] curr_cflags 0.76% qemu-system-ppc libglib-2.0.so.0.5600.4 [.] g_hash_table_lookup 0.75% qemu-system-ppc qemu-system-ppc64 [.] check_for_breakpoints 0.75% qemu-system-ppc qemu-system-ppc64 [.] tb_jmp_cache_hash_func which all looks fairly typical for --debug code with heavy computed jumps (triggering a lot of tb_lookups). tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: WARN( 91.80 s) tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: WARN( 91.47 s) tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: WARN( 91.36 s) tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: WARN( 88.56 s) These all end with WARN so could be broken anyway and just timing out. Didn't dig in too deeply. tests/avocado/replay_kernel.py:ReplayKernelNormal.test_arm_cubieboard_initrd: PASS( 63.24 s) tests/avocado/replay_kernel.py:ReplayKernelNormal.test_sparc_ss20: PASS( 59.36 s) This could just be replay being inherently slow given it runs several times under icount. tests/avocado/boot_xen.py:BootXen.test_arm64_xen_411_and_dom0: PASS( 57.15 s) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_sd: PASS( 55.00 s) tests/avocado/machine_aspeed.py:AST2x00Machine.test_arm_ast2500_romulus_openbmc_v2_9_0: PASS( 52.73 s) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_x86_64_pc: PASS( 51.08 s) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_aarch64_xlnx_versal_virt: PASS( 47.98 s) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_ast2600_debian: PASS( 47.48 s) tests/avocado/linux_initrd.py:LinuxInitrd.test_with_2gib_file_should_work_with_linux_v4_16: PASS( 45.29 s) tests/avocado/machine_aspeed.py:AST2x00Machine.test_arm_ast2400_palmetto_openbmc_v2_9_0: PASS( 44.54 s) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_initrd: PASS( 40.96 s) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_exynos4210_initrd: PASS( 37.70 s) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_initrd: PASS( 34.29 s) tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_sata: PASS( 33.19 s) -- Alex Bennée