https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119482
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
r15-9175 on x86_64 now shows (with release checking, built with GCC 7, not
bootstrapped):

Samples: 112K of event 'cycles:Pu', Event count (approx.): 139897585915
Overhead       Samples  Command  Shared Object  Symbol
   6.42%          7028  cc1plus  cc1plus        [.] bitmap_and_into(bitma
   6.19%          6851  cc1plus  cc1plus        [.] bitmap_list_insert_el
   3.96%          4408  cc1plus  cc1plus        [.] bitmap_ior_into(bitma
   3.28%          3695  cc1plus  cc1plus        [.] bitmap_set_bit(bitmap
   2.75%          3099  cc1plus  cc1plus        [.] get_ref_base_and_exte
   2.31%          2532  cc1plus  cc1plus        [.] bitmap_and(bitmap_hea
   1.85%          2052  cc1plus  cc1plus        [.] bitmap_ior_and_compl(
   1.83%          2027  cc1plus  cc1plus        [.] bitmap_copy(bitmap_he
   1.32%          1482  cc1plus  cc1plus        [.] bitmap_bit_p(bitmap_h
   1.24%          1439  cc1plus  cc1plus        [.] bitmap_set_range(bitm
   1.18%          1329  cc1plus  cc1plus        [.] lra_create_live_range

I'll note the -ftime-report is quite flat, DF is sticking out (every RTL
pass hits on this) as well as LRA (for the same reason, plus
lra_create_live_ranges).

The bitmap_list_insert_element_after hit is actually mispredictions on
bitmap_element_allocate wrt the en-block queued free elements from
bitmap_elt_clear_from:

      element = bit_obstack->elements;

      if (element)
        /* Use up the inner list first before looking at the next
           element of the outer list.  */
        if (element->next)
          {
            bit_obstack->elements = element->next;

possibly having two free element lists, one for singletons and one for
lists would be friendlier, thus

      if (bit_obstack->elements)
        ...
      else if (bit_obstack->multi_elements)
        ...
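
To make the two-list idea concrete, here is a minimal standalone sketch. It is not GCC code: `struct elt`, `struct pool`, `pool_free_one`, `pool_free_chain` and `pool_alloc` are hypothetical stand-ins for bitmap_element, the free lists in bitmap_obstack and the allocate/free paths. It assumes, as the quoted snippet suggests, that `next` links the inner list (a chain freed en bloc) and `prev` links the outer list. The names `elements` and `multi_elements` follow the suggestion above. Whether this actually reduces mispredictions would need measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for bitmap_element.  */
struct elt
{
  struct elt *next;   /* Inner list: rest of a chain freed en bloc.  */
  struct elt *prev;   /* Outer list: next free entry.  */
  unsigned indx;
};

/* Hypothetical stand-in for the free lists kept in bitmap_obstack.  */
struct pool
{
  struct elt *elements;        /* Free singletons, linked via prev.  */
  struct elt *multi_elements;  /* Free chains (length >= 2), heads
                                  linked via prev.  */
};

/* Queue a single element on the singleton list.  */
static void
pool_free_one (struct pool *p, struct elt *e)
{
  e->next = NULL;
  e->prev = p->elements;
  p->elements = e;
}

/* Queue a NULL-terminated chain (linked via next) in one step.
   Chains of length one go to the singleton list so that every head
   on multi_elements has a non-NULL next.  */
static void
pool_free_chain (struct pool *p, struct elt *first)
{
  assert (first != NULL);
  if (!first->next)
    {
      pool_free_one (p, first);
      return;
    }
  first->prev = p->multi_elements;
  p->multi_elements = first;
}

/* Allocate an element, preferring free singletons.  */
static struct elt *
pool_alloc (struct pool *p)
{
  struct elt *e = p->elements;

  if (e)
    /* Singleton: plain pop, no need to look at e->next.  */
    p->elements = e->prev;
  else if ((e = p->multi_elements) != NULL)
    {
      struct elt *rest = e->next;
      assert (rest != NULL);
      if (rest->next)
        {
          /* At least two elements remain; REST becomes the head.  */
          rest->prev = e->prev;
          p->multi_elements = rest;
        }
      else
        {
          /* Only one element remains; move it to the singletons.  */
          p->multi_elements = e->prev;
          pool_free_one (p, rest);
        }
    }
  else
    e = malloc (sizeof *e);

  if (e)
    e->next = e->prev = NULL;
  return e;
}
```

With this layout the singleton path never tests `next`; that test is confined to the multi_elements path, which is only reached once the singleton list is empty.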