https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117467
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> -   32.11%  (anonymous namespace)::pass_ext_dce::execute(function*)
>    - ext_dce_execute()
>       - 32.10% df_worklist_dataflow(dataflow*, bitmap_head*, int*, int)
>          - 32.08% ext_dce_rd_transfer_n(int)
>             + 14.75% ext_dce_process_uses(rtx_insn*, rtx_def*, bitmap_head*, bool)
>             + 8.18% bitmap_ior_into(bitmap_head*, bitmap_head const*)
>             + 4.49% ext_dce_process_sets(rtx_insn*, rtx_def*, bitmap_head*)
>               3.34% bitmap_copy(bitmap_head*, bitmap_head const*)
>               1.31% bitmap_equal_p(bitmap_head const*, bitmap_head const*)
>
> likely (unverified) also the source of 25GB memory use.
>
> The DF problem seems seriously unoptimized - it lacks a separate "local"
> compute step (the ext_dce_process_sets part that populates live_tmp
> _per insn_!).

That is, usually the transfer function is the IOR of input and appropriate
IOR/whatever of the (cached!) local compute result.
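
To illustrate the usual shape described above, here is a minimal sketch of a textbook backward liveness problem with a separate, cached local-compute step. It is not GCC code and not how ext-dce is written: the names Block, compute_local and solve are hypothetical, registers are plain strings, and Python sets stand in for bitmaps. Whether the ext-dce problem (which tracks groups of bits per register) can be put into this gen/kill form is an assumption, not something the report establishes.

```python
# Generic backward-liveness sketch; NOT GCC's ext-dce implementation.
# Block, compute_local and solve are hypothetical names for illustration.
from dataclasses import dataclass


@dataclass
class Block:
    insns: list                     # [(defs, uses), ...] in program order
    succs: list                     # indices of successor blocks
    gen: frozenset = frozenset()    # registers used before any def in the block
    kill: frozenset = frozenset()   # registers defined anywhere in the block


def compute_local(block):
    """Local step: scan the insns once and cache gen/kill on the block."""
    gen, kill = set(), set()
    for defs, uses in reversed(block.insns):
        gen -= set(defs)            # a def hides later uses from the block entry
        gen |= set(uses)            # uses of this insn are upward exposed
        kill |= set(defs)
    block.gen, block.kill = frozenset(gen), frozenset(kill)


def solve(blocks):
    """Worklist iteration; the transfer function never rescans insns."""
    for b in blocks:
        compute_local(b)            # done exactly once per block

    preds = [[] for _ in blocks]
    for i, b in enumerate(blocks):
        for s in b.succs:
            preds[s].append(i)

    live_in = [frozenset() for _ in blocks]
    worklist = list(range(len(blocks)))
    while worklist:
        i = worklist.pop()
        b = blocks[i]
        # Confluence: IOR of the successors' live-in sets.
        live_out = frozenset().union(*(live_in[s] for s in b.succs))
        # Transfer: cached gen IORed with the input minus cached kill.
        new_in = b.gen | (live_out - b.kill)
        if new_in != live_in[i]:
            live_in[i] = new_in
            worklist.extend(preds[i])   # only predecessors can be affected
    return live_in
```

In this form each revisit of a block during iteration costs a few set operations, independent of the number of insns in the block. The per-insn walk happens once in compute_local rather than on every visit from the worklist.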