https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 18 Jan 2018, aldyh at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
>
> Aldy Hernandez <aldyh at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org,
>                    |                            |rguenth at gcc dot gnu.org
>
> --- Comment #9 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
> Original regression in 7.x started with the -fcode-hoisting pass in r238242.
> Things started improving with r254948, though that is probably unrelated.
>
> Perhaps Richard can comment.

code-hoisting does its job - it reduces the number of stmts in the
program.  Together with PRE, code-hoisting enables more PRE and thus
causes extra PHIs (not sure if those are the problem).  But if you look
at code-hoisting in isolation (-fcode-hoisting -fno-tree-pre) then it
should always be profitable - it's probably the extra PRE that does
the harm here.

Numbers on my machine:

> ./xgcc -B. t.c -O2
> /usr//bin/time ./a.out
4.13user 0.00system 0:04.13elapsed 99%CPU (0avgtext+0avgdata 1040maxresident)k
0inputs+0outputs (0major+61minor)pagefaults 0swaps
> /usr//bin/time ./a.out
4.06user 0.00system 0:04.06elapsed 100%CPU (0avgtext+0avgdata 1032maxresident)k
0inputs+0outputs (0major+60minor)pagefaults 0swaps

> ./xgcc -B. t.c -O2 -fno-tree-pre -fcode-hoisting
> /usr//bin/time ./a.out
3.87user 0.00system 0:03.87elapsed 99%CPU (0avgtext+0avgdata 1052maxresident)k
0inputs+0outputs (0major+61minor)pagefaults 0swaps
> /usr//bin/time ./a.out
3.90user 0.00system 0:03.90elapsed 99%CPU (0avgtext+0avgdata 1060maxresident)k
0inputs+0outputs (0major+62minor)pagefaults 0swaps

> ./xgcc -B. t.c -O2 -ftree-pre -fno-code-hoisting
> /usr//bin/time ./a.out
3.85user 0.00system 0:03.85elapsed 100%CPU (0avgtext+0avgdata 1032maxresident)k
0inputs+0outputs (0major+60minor)pagefaults 0swaps
> /usr//bin/time ./a.out
3.85user 0.01system 0:03.87elapsed 99%CPU (0avgtext+0avgdata 1060maxresident)k
0inputs+0outputs (0major+62minor)pagefaults 0swaps

Note that both PRE and code-hoisting are sources of increased register
pressure.  Counting the lines that reference rsp in the generated
assembly:

> ./xgcc -B. t.c -O2 -ftree-pre -fcode-hoisting -S
> grep rsp t.s | wc -l
47
> ./xgcc -B. t.c -O2 -ftree-pre -fno-code-hoisting -S
> grep rsp t.s | wc -l
11
> ./xgcc -B. t.c -O2 -fno-tree-pre -fcode-hoisting -S
> grep rsp t.s | wc -l
11

Taming PRE down by decoupling code hoisting and PRE (patch below)
results in

> ./xgcc -B. t.c -O2 -ftree-pre -fcode-hoisting -S
> grep rsp t.s | wc -l
11
> ./xgcc -B. t.c -O2 -ftree-pre -fcode-hoisting
> /usr//bin/time ./a.out
3.90user 0.00system 0:03.90elapsed 100%CPU (0avgtext+0avgdata 1148maxresident)k
0inputs+0outputs (0major+63minor)pagefaults 0swaps
> /usr//bin/time ./a.out
3.89user 0.00system 0:03.89elapsed 100%CPU (0avgtext+0avgdata 1128maxresident)k
0inputs+0outputs (0major+60minor)pagefaults 0swaps

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c  (revision 256837)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -3687,15 +3687,23 @@ insert (void)
       if (dump_file && dump_flags & TDF_DETAILS)
        fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
       new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun), flag_tree_pre,
-                             flag_code_hoisting);
+                             false);
       /* Clear the NEW sets before the next iteration.  We have already
          fully propagated its contents.  */
-      if (new_stuff)
+      if (new_stuff || flag_code_hoisting)
        FOR_ALL_BB_FN (bb, cfun)
          bitmap_set_free (NEW_SETS (bb));
     }
   statistics_histogram_event (cfun, "insert iterations", num_iterations);
+
+  if (flag_code_hoisting)
+    {
+      if (dump_file && dump_flags & TDF_DETAILS)
+       fprintf (dump_file, "Starting insert for code hoisting\n");
+      new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun), false,
+                             flag_code_hoisting);
+    }
 }

But AFAIU this patch shouldn't have any effect...  I guess I have to
think about this 2nd order effect again (might be a missed PRE in the
first place which of course wouldn't help us ;)).

The above FAILs, for example:

FAIL: gcc.dg/tree-ssa/ssa-hoist-3.c scan-tree-dump pre "Insertions: 1"
FAIL: gcc.dg/tree-ssa/ssa-pre-30.c scan-tree-dump-times pre "Replaced MEM" 2
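
As a minimal sketch of what "reduces the number of stmts" means for code hoisting, the C fragment below shows the transformation done by hand at source level. The functions `f_original` and `f_hoisted` are hypothetical and are not taken from the PR's t.c; the actual pass works on GIMPLE inside GCC, not on source code. The hoisted form evaluates `a * b` once, but the temporary is then live across the branch, which is the kind of extended live range that can raise register pressure.

```c
/* Hypothetical illustration only; not the test case from PR 83651.  */

/* Before hoisting: a * b is evaluated on both paths of the branch.  */
int
f_original (int a, int b, int c)
{
  int r;
  if (c)
    r = a * b + 1;
  else
    r = a * b - 1;
  return r;
}

/* After hoisting by hand: the multiplication is computed once, in the
   block that dominates both paths.  The temporary t stays live across
   the branch.  */
int
f_hoisted (int a, int b, int c)
{
  int t = a * b;
  if (c)
    return t + 1;
  return t - 1;
}
```

Both functions return the same value for any inputs that do not overflow; only the number of statements and the live range of the product differ.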
