On Wed, Jan 25, 2017 at 7:11 PM, Jan Hubicka <hubi...@ucw.cz> wrote: > Hi, > this patch modifies profitable_jump_thread_path heuristics by enabling > code expansion when the threaded path contains at least one hot path. > The basic idea is that while we do not decrease instruction count on the > non-duplicated path, we reduce number of entry edges and by this path > separation we possibly enable later optimization. > > We may try to get more careful about when optimization is enabled but it is > hard to do and I don't think the number of cold paths that meet hot paths > is large enough to make this matter too much. > > Bootstrapped/regtested x86_64-linux, OK? > * tree-ssa-threadbackward.c (profitable_jump_thread_path): Adjust > hot/cold heuristics. > > * gcc.dg/tree-ssa/pr77445-2.c: Update testcase. > Index: tree-ssa-threadbackward.c > =================================================================== > --- tree-ssa-threadbackward.c (revision 244732) > +++ tree-ssa-threadbackward.c (working copy) > @@ -159,6 +159,7 @@ profitable_jump_thread_path (vec<basic_b > bool threaded_through_latch = false; > bool multiway_branch_in_path = false; > bool threaded_multiway_branch = false; > + bool contains_hot_bb = false; > > /* Count the number of instructions on the path: as these instructions > will have to be duplicated, we will not record the path if there > @@ -168,6 +169,9 @@ profitable_jump_thread_path (vec<basic_b > { > basic_block bb = (*path)[j]; > > + if (!contains_hot_bb && speed_p) > + contains_hot_bb |= optimize_bb_for_speed_p (bb); > +
Hmm, but you are also counting the destination of the threading here which we will not duplicate. Shouldn't this be under the if (j < path_length - 1) conditional so we look for hot BBs we are actually duplicating only (similar restrictions apply to path[0]?). RIchard. > /* Remember, blocks in the path are stored in opposite order > in the PATH array. The last entry in the array represents > the block with an outgoing edge that we will redirect to the > @@ -311,7 +315,11 @@ profitable_jump_thread_path (vec<basic_b > return NULL; > } > > - if (speed_p && optimize_edge_for_speed_p (taken_edge)) > + /* Threading is profitable if the path duplicated is hot but also > + in a case we separate cold path from hot path and permit optimization > + of the hot path later. Be on the agressive side here. In some > testcases, > + as in PR 78407 this leads to noticeable improvements. */ > + if (speed_p && (optimize_edge_for_speed_p (taken_edge) || contains_hot_bb)) > { > if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS)) > { > Index: testsuite/gcc.dg/tree-ssa/pr77445-2.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/pr77445-2.c (revision 244746) > +++ testsuite/gcc.dg/tree-ssa/pr77445-2.c (working copy) > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats" } */ > +/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats > -fdump-tree-thread2-details-blocks-stats > -fdump-tree-thread3-details-blocks-stats > -fdump-tree-thread4-details-blocks-stats" } */ > typedef enum STATES { > START=0, > INVALID, > @@ -121,3 +121,7 @@ enum STATES FMS( u8 **in , u32 *transiti > increase much. */ > /* { dg-final { scan-tree-dump "Jumps threaded: 1[1-9]" "thread1" } } */ > /* { dg-final { scan-tree-dump-times "Invalid sum" 2 "thread1" } } */ > +/* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */ > +/* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */ > +/* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */ > +/* { dg-final { scan-tree-dump-not "not considered" "thread4" } } */ > Index: testsuite/gcc.dg/tree-ssa/threadbackward-1.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/threadbackward-1.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/threadbackward-1.c (working copy) > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-ethread" } */ > +char *c; > +int t() > +{ > + for (int i=0;i<5000;i++) > + c[i]=i; > +} > +/* { dg-final { scan-tree-dump-times "Registering FSM jump thread" 1 > "ethread"} } */