https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854
--- Comment #150 from Richard Biener <rguenth at gcc dot gnu.org> ---
For _num.i at -O2+ it's PRE / postreload GCSE via compute_transp that takes all
compile-time.  The reason is all the sbitmaps used and using them "inverted",
aka one bitmap per BB instead of one bitmap per expr.  Sorting the expressions
by bitmap index before processing doesn't help here.

Samples: 116K of event 'cycles', Event count (approx.): 132865298352
Overhead  Samples  Command  Shared Object  Symbol
   7.64%     8910  cc1      cc1            [.] find_base_term
   6.45%     7521  cc1      cc1            [.] get_ref_base_and_extent
   6.35%     7406  cc1      cc1            [.] compute_transp
   2.85%     3308  cc1      cc1            [.] bitmap_bit_p
   2.84%     3314  cc1      cc1            [.] rtx_equal_for_memref_p
   2.68%     3124  cc1      cc1            [.] find_base_term

It's also mostly alias analysis cost, so maybe the bitmaps are not the actual
problem but that we compute transparency for each block and each expression,
even for blocks that will in the end not require it because the expr isn't
antic through it.
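
To illustrate the last point, here is a toy sketch (not GCC code) comparing eager transparency computation over every (block, expr) pair with computing it only for pairs that are actually consulted. Everything in it is an assumption made for illustration: may_alias stands in for an expensive alias query, memory locations are plain strings, and the "needed" set of pairs is simply given. How such a set would be obtained in the real pass is left open, since anticipatability itself depends on transparency.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Counts how many (hypothetical) alias queries were issued."""
    alias_queries: int = 0


def may_alias(expr_loc, store_loc, counter):
    # Stand-in for an expensive alias query; not GCC's API.
    counter.alias_queries += 1
    return expr_loc == store_loc


def is_transparent(expr_loc, block_stores, counter):
    # An expression is transparent in a block if no store there may clobber it.
    return not any(may_alias(expr_loc, s, counter) for s in block_stores)


def eager_transp(exprs, blocks, counter):
    # Compute transparency for every (block, expr) pair unconditionally.
    return {(b, e): is_transparent(loc, stores, counter)
            for b, stores in blocks.items()
            for e, loc in exprs.items()}


def lazy_transp(exprs, blocks, needed, counter):
    # Compute transparency only for the pairs a later step would consult.
    return {(b, e): is_transparent(exprs[e], blocks[b], counter)
            for (b, e) in needed}


# Toy input: three expressions reading one location each, three blocks
# with the locations they store to.
exprs = {"e0": "a", "e1": "b", "e2": "c"}
blocks = {0: ["a", "b"], 1: ["c"], 2: []}
# Assumed to be known: the only pairs where the expr is antic through the block.
needed = {(0, "e2"), (1, "e0")}

eager_counter, lazy_counter = Counter(), Counter()
eager = eager_transp(exprs, blocks, eager_counter)
lazy = lazy_transp(exprs, blocks, needed, lazy_counter)

print("eager alias queries:", eager_counter.alias_queries)
print("lazy alias queries: ", lazy_counter.alias_queries)
```

The lazy variant returns the same answers for the pairs it covers while issuing fewer alias queries. Whether that carries over to compute_transp depends on how cheaply the needed pairs can be identified, which this sketch does not model.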