https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076
--- Comment #5 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
Perf shows:

  Overhead  Command  Shared Object    Symbol
     2.45%  cc1plus  libc-2.21.90.so  [.] _int_malloc
     1.88%  cc1plus  cc1plus          [.] bitmap_find_bit
     1.72%  cc1plus  cc1plus          [.] gt_ggc_mx_lang_tree_node
     1.36%  cc1plus  libc-2.21.90.so  [.] _int_free
     1.05%  cc1plus  cc1plus          [.] ggc_set_mark
     0.97%  cc1plus  cc1plus          [.] record_reg_classes
     0.96%  cc1plus  cc1plus          [.] df_worklist_dataflow
     0.91%  cc1plus  cc1plus          [.] build_qualified_type
     0.88%  cc1plus  cc1plus          [.] df_note_compute
     0.84%  cc1plus  libc-2.21.90.so  [.] malloc_consolidate

(Using a faster malloc implementation speeds up compile time by ~5%:

markus@x4 ~ % time g++ -w -Ofast tramp3d-v4.cpp
g++ -w -Ofast tramp3d-v4.cpp  26.00s user 0.32s system 99% cpu 26.341 total
markus@x4 ~ % time LD_PRELOAD=/usr/lib/libllalloc.so.1.3 g++ -w -Ofast tramp3d-v4.cpp
LD_PRELOAD=/usr/lib/libllalloc.so.1.3 g++ -w -Ofast tramp3d-v4.cpp  24.60s user 0.37s system 99% cpu 24.997 total)
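
For reference, the ~5% figure can be checked against the numbers quoted in this comment. The following is only a rough sketch: the helper name percent_saved is made up for illustration, and grouping _int_malloc, _int_free and malloc_consolidate together as "allocator overhead" is an assumption about how to read the profile, not something perf reports itself.

```python
# Timings copied from the two `time` runs quoted in this comment (seconds).
baseline = {"user": 26.00, "total": 26.341}   # default glibc malloc
preload = {"user": 24.60, "total": 24.997}    # LD_PRELOAD=/usr/lib/libllalloc.so.1.3


def percent_saved(before, after):
    """Reduction in time, as a percentage of the baseline (hypothetical helper)."""
    return (before - after) / before * 100.0


for key in ("user", "total"):
    saved = percent_saved(baseline[key], preload[key])
    print(f"{key}: {saved:.1f}% less time")

# Assumption: treat the three libc allocator symbols in the perf listing
# as the malloc-related share of the profile.
allocator_overhead = 2.45 + 1.36 + 0.84   # _int_malloc, _int_free, malloc_consolidate
print(f"allocator symbols in profile: {allocator_overhead:.2f}%")
```

This gives about 5.4% less user time and 5.1% less wall-clock time, while the three allocator symbols add up to 4.65% of the samples, which is in the same range as the measured saving.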