https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111241
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Samples: 121K of event 'cycles:u', Event count (approx.): 159788164341
Overhead       Samples  Command  Shared Object  Symbol
  21.24%         25791  cc1      cc1            [.] get_ref_base_and_exten#
   7.75%          9154  cc1      libc-2.31.so   [.] __memset_avx512_erms   #
   5.61%          7072  cc1      cc1            [.] dominated_by_p         #
   2.43%          2936  cc1      cc1            [.] bitmap_set_bit         #
   2.38%          3000  cc1      cc1            [.] dominated_by_p_w_unex  #
   1.76%          2154  cc1      cc1            [.] find_base_term         #
   1.75%          2148  cc1      cc1            [.] ix86_find_base_term    #
   1.41%          1656  cc1      cc1            [.] df_reorganize_refs_by_#

all usual suspects are present ... :/

The memset and df_reorganize_refs_by_defs are the known bug where RTL
invariant motion does O(function-size) * O(number-of-loops) work through
df_analyze_loop, because reorganize-refs processes all function refs, not
only loop refs (difficult to fix).

For get_ref_base_and_extent we have ~2 array-refs per call, and array-ref
processing is expensive (array_ref_element_size, but also wi::lshift_large).
The most expensive calls are from vn_reference_lookup done during
elimination when looking for redundant stores (that's odd), possibly because
it enables VN_WALKREWRITE unconditionally; for -O2 that's also the default,
though.
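
To make the O(function-size) * O(number-of-loops) remark concrete, here is a toy cost model in Python. It is only a sketch and not GCC code: the function names, the loop counts and the ref counts are all made up for illustration, and it assumes non-nested loops so that each ref belongs to at most one loop. It compares the number of refs visited when every per-loop analysis rescans all refs in the function against the number visited if each analysis touched only the refs inside its own loop.

```python
# Toy cost model, NOT GCC code. All names and numbers are hypothetical.

def refs_scanned_whole_function(loop_ref_counts, total_refs):
    """Every per-loop analysis reorganizes all refs in the function."""
    return len(loop_ref_counts) * total_refs


def refs_scanned_loop_local(loop_ref_counts):
    """Every per-loop analysis touches only the refs inside that loop."""
    return sum(loop_ref_counts)


# Assumed shape: 1000 small, non-nested loops with 10 refs each,
# plus 5000 refs that sit outside any loop.
loop_ref_counts = [10] * 1000
total_refs = sum(loop_ref_counts) + 5000

whole = refs_scanned_whole_function(loop_ref_counts, total_refs)
local = refs_scanned_loop_local(loop_ref_counts)

print("whole-function rescans:", whole)
print("loop-local scans:      ", local)
```

With these assumed numbers the whole-function variant visits 15,000,000 refs while the loop-local variant visits 10,000, which is the kind of gap that would show up as memset and ref-reorganization time on functions with many loops.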