https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545
--- Comment #21 from rguenther at suse dot de <rguenther at suse dot de> ---
On Sat, 27 Sep 2014, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545
>
> Jan Hubicka <hubicka at gcc dot gnu.org> changed:
>
>            What    |Removed                       |Added
> ----------------------------------------------------------------------------
>              Status|NEW                           |ASSIGNED
>                  CC|                              |mliska at suse dot cz
>            Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org
>
> --- Comment #16 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
> I have moved tracer before the late cleanups; that seems to be a rather
> obvious thing to do. This lets us optimize the testcase (with -O2):
>
> int main() ()
> {
>   struct A * ap;
>   int i;
>   int _6;
>
>   <bb 2>:
>
>   <bb 3>:
>   # i_29 = PHI <i_22(6), 0(2)>
>   _6 = i_29 % 7;
>   if (_6 == 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
>   <bb 4>:
>   ap_8 = operator new (16);
>   ap_8->i = 0;
>   ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
>   goto <bb 6>;
>
>   <bb 5>:
>   ap_13 = operator new (16);
>   MEM[(struct B *)ap_13].D.2244.i = 0;
>   MEM[(struct B *)ap_13].b = 0;
>   MEM[(struct B *)ap_13].D.2244._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
>
>   <bb 6>:
>   # ap_4 = PHI <ap_13(5), ap_8(4)>
>   operator delete (ap_4);
>   i_22 = i_29 + 1;
>   if (i_22 != 10000)
>     goto <bb 3>;
>   else
>     goto <bb 7>;
>
>   <bb 7>:
>   return 0;
>
> }
>
> Martin, I do not have a SPEC setup; do you think you can benchmark the
> attached patch with SPEC and profile feedback, and also non-FDO
> -O3 -ftracer compared to -O3, please?
> It would be nice to know the code size impact, too.
>
> Index: passes.def
> ===================================================================
> --- passes.def	(revision 215651)
> +++ passes.def	(working copy)
> @@ -155,6 +155,7 @@ along with GCC; see the file COPYING3.
>           NEXT_PASS (pass_dce);
>           NEXT_PASS (pass_call_cdce);
>           NEXT_PASS (pass_cselim);
> +         NEXT_PASS (pass_tracer);
>           NEXT_PASS (pass_copy_prop);
>           NEXT_PASS (pass_tree_ifcombine);
>           NEXT_PASS (pass_phiopt);
> @@ -252,7 +253,6 @@ along with GCC; see the file COPYING3.
>           NEXT_PASS (pass_cse_reciprocals);
>           NEXT_PASS (pass_reassoc);
>           NEXT_PASS (pass_strength_reduction);
> -         NEXT_PASS (pass_tracer);
>           NEXT_PASS (pass_dominator);
>           NEXT_PASS (pass_strlen);
>           NEXT_PASS (pass_vrp);
>
> Doing it at approximately the same place as loop header copying seems to
> make the most sense to me. It definitely benefits from early cleanups and
> DCE, and it should enable more fun with the later scalar passes, almost all
> of which are rerun then.

We need to make sure tracer doesn't mess too much with loops then.
Btw, "useless" tracing may be undone again by tail-merging.

Tracer seems to consume only profile information and thus doesn't rely
on any other transforms (well, apart from cleanups, which could affect
its cost function). Why not schedule it even earlier, for example before
pass_build_alias? (The pipeline up to loop transforms is quite a mess...)