http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44563
--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> 2010-12-13 01:07:33 UTC ---
My profile was at -O2. Concerning Jakub's callgrind, the -O0 compilation finishes in about 44s for me. Profile is:

4349  3.8607  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
3150  2.7963  cc1             cc1             record_reg_classes.constprop.9
2881  2.5575  cc1             cc1             htab_find_slot_with_hash
2104  1.8678  cc1             cc1             ggc_set_mark
2039  1.8101  libc-2.11.1.so  libc-2.11.1.so  msort_with_tmp
2005  1.7799  cc1             cc1             bitmap_set_bit
1836  1.6299  cc1             cc1             df_ref_create_structure
1775  1.5757  cc1             cc1             find_reloads
1738  1.5429  cc1             cc1             ggc_internal_alloc_stat
1538  1.3653  libc-2.11.1.so  libc-2.11.1.so  memset
1430  1.2694  cc1             cc1             eq_node
1375  1.2206  cc1             cc1             preprocess_constraints
1317  1.1691  libc-2.11.1.so  libc-2.11.1.so  _int_free
1309  1.1620  cc1             cc1             df_insn_refs_collect
1289  1.1443  cc1             cc1             ix86_function_arg_regno_p
1277  1.1336  cc1             cc1             df_ref_record
1249  1.1088  cc1             cc1             ix86_save_reg
1215  1.0786  cc1             cc1             ix86_compute_frame_layout
1171  1.0395  libc-2.11.1.so  libc-2.11.1.so  malloc_consolidate
1134  1.0067  cc1             cc1             extract_insn

So I don't see much time spent in RA by itself. Tracking down that malloc inefficiency might be low-hanging fruit.
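
For a rough sense of scale, the percentages in the profile above can be summed by group. This is only a sketch: the grouping of symbols into "malloc family" and "RA-related" is my own assumption for illustration, not something the profiler reports, and only the entries needed for the two sums are copied in.

```python
# Percentages copied from the profile above (subset of entries only).
profile = {
    "_int_malloc": 3.8607,
    "_int_free": 1.1691,
    "malloc_consolidate": 1.0395,
    "record_reg_classes.constprop.9": 2.7963,
    "find_reloads": 1.5757,
    "preprocess_constraints": 1.2206,
}

# Assumed grouping: libc allocator internals.
MALLOC_FAMILY = ("_int_malloc", "_int_free", "malloc_consolidate")

# Assumed grouping: symbols that look register-allocation/reload related.
RA_GUESS = (
    "record_reg_classes.constprop.9",
    "find_reloads",
    "preprocess_constraints",
)


def share(symbols):
    """Sum the sample percentages of the given symbols."""
    return sum(profile[s] for s in symbols)


print(f"malloc family: {share(MALLOC_FAMILY):.2f}%")  # malloc family: 6.07%
print(f"RA-related (assumed): {share(RA_GUESS):.2f}%")  # RA-related (assumed): 5.59%
```

Under that grouping the libc allocator entries alone add up to slightly more than the RA-looking entries, which is why malloc seems worth a look first.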