On Thu, 11 Aug 2016, Jan Hubicka wrote:

> Hi,
> this patch adds an early jump threading pass.  Jump threading is one of the
> most common cases where the estimated profile becomes corrupted, because the
> branches are predicted independently beforehand.  This patch simply adds an
> early mode to the jump threader that does not permit code growth, so only
> win/win threads are done before the profile is constructed.
>
> Naturally this also improves IPA decisions because code size/time is
> estimated more accurately.
>
> It is not very cool to add another pass, and the jump threader is already
> run 5 times.  I think incrementally we could drop at least one of the late
> threaders.
>
> I tried to measure compile time effects but they are in the wash.  Moreover,
> the patch pays for itself in cc1plus code size:
>
> Before patch to tweak size estimates: 27779964
> Current mainline:                     27748900
> With patch applied:                   27716173
>
> So despite adding a few functions the code size effect is positive, which I
> think is quite nice.
>
> Given that jump threading seems quite lightweight, I wonder why it is
> guarded by flag_expensive_optimizations?  Is it expensive in terms of code
> size?
>
> The effectiveness of the individual threading passes is as follows (for
> tramp3d):
>
>               mainline                        with patch
>  pass   thread count  profile mismatches  thread count  profile mismatches
>  early                                         525
>  1          1853            1900               316            337
>  2             4             812                 4            288
>  3            24            1450                32            947
>  4            76            1457                75            975
>
> So at least the tramp3d data seems to suggest that we can drop the second
> occurrence of jump threading, and that most of the job thread1 does can be
> done by the size-restricted early version (the lower absolute counts are
> caused by the fact that threadable paths get duplicated by the inliner).
> A 50% drop in profile distortion is not bad.  I wonder why basically every
> threaded path seems to introduce a mismatch.
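
(An illustrative aside, not taken from the mail or the patch: a minimal C
example of the kind of branch sequence the jump threader duplicates; the
function and names below are invented for illustration.)

/* Jump threading in a nutshell: on the path through the "a < 0" arm,
   "positive" is known to be 0, so the second test is statically decidable
   along that path.  The threader duplicates the path so the second branch
   is resolved at compile time on it.  The early mode proposed in the patch
   performs such duplication only when it does not grow the code, which is
   why it is limited to the "win/win" cases mentioned above.  */
int
f (int a)
{
  int positive;
  if (a < 0)
    positive = 0;
  else
    positive = 1;

  if (positive)  /* outcome already determined by the path taken above */
    return a;
  return -a;
}
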
>
> The patch distorts the testsuite somewhat; in most cases we only want to
> update dump files or disable early threading:
>
> +XPASS: gcc.dg/uninit-15.c  (test for warnings, line 13)
> +XPASS: gcc.dg/uninit-15.c  (test for warnings, line 23)
> +FAIL: gcc.dg/uninit-15.c  (test for warnings, line 24)
> +FAIL: gcc.dg/tree-ssa/pr68198.c scan-tree-dump-times thread1 "Registering FSM" 1
> +FAIL: gcc.dg/tree-ssa/pr69196-1.c scan-tree-dump thread1 "FSM did not thread around loop and would copy too many statements"
> +FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2b.c scan-tree-dump-times thread1 "Jumps threaded: 1" 1
> +FAIL: gcc.dg/tree-ssa/ssa-thread-13.c scan-tree-dump thread1 "FSM"
> +FAIL: gcc.dg/tree-ssa/vrp01.c scan-tree-dump-times vrp1 "Folding predicate p_.*to 1" 1
> +FAIL: gcc.dg/tree-ssa/vrp56.c scan-tree-dump thread1 "Jumps threaded: 1"
> +FAIL: gcc.dg/tree-ssa/vrp92.c scan-tree-dump vrp1 "res_.: \\\\[1, 1\\\\]"
>
> These testcases cover the now probably unnecessary loop-exit heuristics
> (they may still be relevant in cases we do not thread because of size
> bounds, but their effectiveness may not be worth the maintenance cost):
>
> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++11  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++11  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++14  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++14  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++98  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-1.C  -std=gnu++98  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++11  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++11  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++14  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++14  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++98  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 1
> +FAIL: g++.dg/predict-loop-exit-2.C  -std=gnu++98  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++11  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++11  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++14  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++14  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++98  scan-tree-dump-times profile_estimate "extra loop exit heuristics of edge[^:]*:" 2
> +FAIL: g++.dg/predict-loop-exit-3.C  -std=gnu++98  scan-tree-dump-times profile_estimate "loop exit heuristics of edge[^:]*:" 3
>
> If the patch seems acceptable, I will do the updates.  One option I did not
> take: it now seems possible to pass parameters to passes from passes.def,
> so perhaps we do not need a separate early_thread_jumps pass at all; but
> adding one is consistent with the way we handle other early passes.
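
(An illustrative aside, not part of either mail: roughly what the
passes.def-parameter alternative mentioned above could look like.  It assumes
the NEXT_PASS_WITH_ARG entry and the opt_pass::set_pass_param hook are
available at this point in the tree -- both are assumptions here -- and it is
a sketch, not the posted patch.)

/* Hypothetical sketch: reuse the existing pass_thread_jumps and hand it the
   "may grow code" flag from passes.def instead of adding a second pass class.

   In passes.def (assuming NEXT_PASS_WITH_ARG is available):
     NEXT_PASS_WITH_ARG (pass_thread_jumps, false);   <- early, size-restricted
     ...
     NEXT_PASS_WITH_ARG (pass_thread_jumps, true);    <- late, may grow code

   In tree-ssa-threadbackward.c (gate () and pass_data omitted for brevity): */

class pass_thread_jumps : public gimple_opt_pass
{
public:
  pass_thread_jumps (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_thread_jumps, ctxt), m_speed_p (true)
  {}

  opt_pass * clone (void) { return new pass_thread_jumps (m_ctxt); }

  /* Receive the bool argument written next to the pass in passes.def.  */
  virtual void set_pass_param (unsigned int n, bool param)
  {
    gcc_assert (n == 0);
    m_speed_p = param;
  }

  virtual unsigned int execute (function *fun)
  {
    /* Same walk as in the posted patch; the size/speed decision comes from
       the per-instance parameter instead of from a second pass class.  */
    basic_block bb;
    FOR_EACH_BB_FN (bb, fun)
      if (EDGE_COUNT (bb->succs) > 1)
        find_jump_threads_backwards (bb, m_speed_p);
    thread_through_all_blocks (true);
    return 0;
  }

private:
  bool m_speed_p;
};

The trade-off is the one noted above: a separate, named early_thread pass
mirrors how the other early passes are handled, at the cost of one more pass
class.
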
I wonder why you choose to put the FSM threader early which only does
backward threading(?!).  I'd expect forward threading to be more profitable
(though we don't have a separate threader for that and would need VRP or DOM -
but it seems we will get an early VRP anyway).

Richard.

> Bootstrapped/regtested x86_64-linux
>
> Honza
>
>       * passes.def (pass_early_thread_jumps): Schedule after forwprop.
>       * tree-pass.h (make_pass_early_thread_jumps): Declare.
>       * tree-ssa-threadbackward.c (fsm_find_thread_path,
>       fsm_find_thread_path, profitable_jump_thread_path,
>       fsm_find_control_statement_thread_paths,
>       find_jump_threads_backwards): Add speed_p parameter.
>       (pass_data_early_thread_jumps): New pass.
>       (make_pass_early_thread_jumps): New function.
>
> Index: passes.def
> ===================================================================
> --- passes.def        (revision 239218)
> +++ passes.def        (working copy)
> @@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.
>        /* After CCP we rewrite no longer addressed locals into SSA
>           form if possible.  */
>        NEXT_PASS (pass_forwprop);
> +      NEXT_PASS (pass_early_thread_jumps);
>        NEXT_PASS (pass_sra_early);
>        /* pass_build_ealias is a dummy pass that ensures that we
>           execute TODO_rebuild_alias at this point.  */
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h        (revision 239218)
> +++ tree-pass.h        (working copy)
> @@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_cd_dce
>  extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_thread_jumps (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_early_thread_jumps (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_split_crit_edges (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_laddress (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_pre (gcc::context *ctxt);
> Index: tree-ssa-threadbackward.c
> ===================================================================
> --- tree-ssa-threadbackward.c        (revision 239219)
> +++ tree-ssa-threadbackward.c        (working copy)
> @@ -61,12 +61,14 @@ get_gimple_control_stmt (basic_block bb)
>  /* Return true if the CFG contains at least one path from START_BB to END_BB.
>     When a path is found, record in PATH the blocks from END_BB to START_BB.
>     VISITED_BBS is used to make sure we don't fall into an infinite loop.  Bound
> -   the recursion to basic blocks belonging to LOOP.  */
> +   the recursion to basic blocks belonging to LOOP.
> +   SPEED_P indicate that we could increase code size to improve the code path */
>
>  static bool
>  fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
>                        vec<basic_block, va_gc> *&path,
> -                      hash_set<basic_block> *visited_bbs, loop_p loop)
> +                      hash_set<basic_block> *visited_bbs, loop_p loop,
> +                      bool speed_p)
>  {
>    if (loop != start_bb->loop_father)
>      return false;
> @@ -82,7 +84,8 @@ fsm_find_thread_path (basic_block start_
>    edge e;
>    edge_iterator ei;
>    FOR_EACH_EDGE (e, ei, start_bb->succs)
> -    if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, loop))
> +    if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, loop,
> +                              speed_p))
>        {
>          vec_safe_push (path, start_bb);
>          return true;
> @@ -101,11 +104,13 @@ fsm_find_thread_path (basic_block start_
>     value on PATH.  ARG is the value of that SSA_NAME.
>
>     BBI will be appended to PATH when we have a profitable jump threading
> -   path.  Callers are responsible for removing BBI from PATH in that case.  */
> +   path.  Callers are responsible for removing BBI from PATH in that case.
> +
> +   SPEED_P indicate that we could increase code size to improve the code path */
>
>  static edge
>  profitable_jump_thread_path (vec<basic_block, va_gc> *&path,
> -                             basic_block bbi, tree name, tree arg)
> +                             basic_block bbi, tree name, tree arg, bool speed_p)
>  {
>    /* Note BBI is not in the path yet, hence the +1 in the test below
>       to make sure BBI is accounted for in the path length test.  */
> @@ -306,7 +311,7 @@ profitable_jump_thread_path (vec<basic_b
>        return NULL;
>      }
>
> -  if (optimize_edge_for_speed_p (taken_edge))
> +  if (speed_p && optimize_edge_for_speed_p (taken_edge))
>      {
>        if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
>          {
> @@ -421,13 +426,15 @@ convert_and_register_jump_thread_path (v
>
>  /* We trace the value of the SSA_NAME NAME back through any phi nodes looking
>     for places where it gets a constant value and save the path.  Stop after
> -   having recorded MAX_PATHS jump threading paths.  */
> +   having recorded MAX_PATHS jump threading paths.
> +
> +   SPEED_P indicate that we could increase code size to improve the code path */
>
>  static void
>  fsm_find_control_statement_thread_paths (tree name,
>                                           hash_set<basic_block> *visited_bbs,
>                                           vec<basic_block, va_gc> *&path,
> -                                         bool seen_loop_phi)
> +                                         bool seen_loop_phi, bool speed_p)
>  {
>    /* If NAME appears in an abnormal PHI, then don't try to trace its
>       value back through PHI nodes.  */
> @@ -495,7 +502,7 @@ fsm_find_control_statement_thread_paths
>        hash_set<basic_block> *visited_bbs = new hash_set<basic_block>;
>
>        if (fsm_find_thread_path (var_bb, e->src, next_path, visited_bbs,
> -                                e->src->loop_father))
> +                                e->src->loop_father, speed_p))
>          ++e_count;
>
>        delete visited_bbs;
> @@ -561,7 +568,7 @@ fsm_find_control_statement_thread_paths
>            /* Recursively follow SSA_NAMEs looking for a constant
>               definition.  */
>            fsm_find_control_statement_thread_paths (arg, visited_bbs, path,
> -                                                   seen_loop_phi);
> +                                                   seen_loop_phi, speed_p);
>
>            path->pop ();
>            continue;
> @@ -572,7 +579,8 @@ fsm_find_control_statement_thread_paths
>
>            /* If this is a profitable jump thread path, then convert it
>               into the canonical form and register it.  */
> -          edge taken_edge = profitable_jump_thread_path (path, bbi, name, arg);
> +          edge taken_edge = profitable_jump_thread_path (path, bbi, name, arg,
> +                                                         speed_p);
>            if (taken_edge)
>              {
>                if (bb_loop_depth (taken_edge->src)
> @@ -588,7 +596,7 @@ fsm_find_control_statement_thread_paths
>
>        if (TREE_CODE (arg) == SSA_NAME)
>          fsm_find_control_statement_thread_paths (arg, visited_bbs,
> -                                                 path, seen_loop_phi);
> +                                                 path, seen_loop_phi, speed_p);
>
>        else
>          {
> @@ -598,7 +606,7 @@ fsm_find_control_statement_thread_paths
>            path->pop ();
>
>            edge taken_edge = profitable_jump_thread_path (path, var_bb,
> -                                                         name, arg);
> +                                                         name, arg, speed_p);
>            if (taken_edge)
>              {
>                if (bb_loop_depth (taken_edge->src)
> @@ -622,10 +630,11 @@ fsm_find_control_statement_thread_paths
>     is a constant.  Record such paths for jump threading.
>
>     It is assumed that BB ends with a control statement and that by
> -   finding a path where NAME is a constant, we can thread the path.  */
> +   finding a path where NAME is a constant, we can thread the path.
> +   SPEED_P indicate that we could increase code size to improve the code path */
>
>  void
> -find_jump_threads_backwards (basic_block bb)
> +find_jump_threads_backwards (basic_block bb, bool speed_p)
>  {
>    gimple *stmt = get_gimple_control_stmt (bb);
>    if (!stmt)
> @@ -655,7 +664,8 @@ find_jump_threads_backwards (basic_block
>    hash_set<basic_block> *visited_bbs = new hash_set<basic_block>;
>
>    max_threaded_paths = PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATHS);
> -  fsm_find_control_statement_thread_paths (name, visited_bbs, bb_path, false);
> +  fsm_find_control_statement_thread_paths (name, visited_bbs, bb_path, false,
> +                                           speed_p);
>
>    delete visited_bbs;
>    vec_free (bb_path);
> @@ -703,7 +713,7 @@ pass_thread_jumps::execute (function *fu
>    FOR_EACH_BB_FN (bb, fun)
>      {
>        if (EDGE_COUNT (bb->succs) > 1)
> -        find_jump_threads_backwards (bb);
> +        find_jump_threads_backwards (bb, true);
>      }
>    thread_through_all_blocks (true);
>    return 0;
> @@ -716,3 +726,59 @@ make_pass_thread_jumps (gcc::context *ct
>  {
>    return new pass_thread_jumps (ctxt);
>  }
> +
> +namespace {
> +
> +const pass_data pass_data_early_thread_jumps =
> +{
> +  GIMPLE_PASS,
> +  "early_thread",
> +  OPTGROUP_NONE,
> +  TV_TREE_SSA_THREAD_JUMPS,
> +  ( PROP_cfg | PROP_ssa ),
> +  0,
> +  0,
> +  0,
> +  ( TODO_cleanup_cfg | TODO_update_ssa ),
> +};
> +
> +class pass_early_thread_jumps : public gimple_opt_pass
> +{
> +public:
> +  pass_early_thread_jumps (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_early_thread_jumps, ctxt)
> +  {}
> +
> +  opt_pass * clone (void) { return new pass_early_thread_jumps (m_ctxt); }
> +  virtual bool gate (function *);
> +  virtual unsigned int execute (function *);
> +};
> +
> +bool
> +pass_early_thread_jumps::gate (function *fun ATTRIBUTE_UNUSED)
> +{
> +  return true;
> +}
> +
> +
> +unsigned int
> +pass_early_thread_jumps::execute (function *fun)
> +{
> +  /* Try to thread each block with more than one successor.  */
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      if (EDGE_COUNT (bb->succs) > 1)
> +        find_jump_threads_backwards (bb, false);
> +    }
> +  thread_through_all_blocks (true);
> +  return 0;
> +}
> +
> +}
> +
> +gimple_opt_pass *
> +make_pass_early_thread_jumps (gcc::context *ctxt)
> +{
> +  return new pass_early_thread_jumps (ctxt);
> +}

--
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)