On 06/06/2016 04:19 AM, Jan Hubicka wrote:
Hi,
while looking into the profile mismatches introduced by the backward threading
pass I noticed that its heuristics seem quite simplistic. First, it should be
profile sensitive and disallow duplication when optimizing cold paths. Second,
it should use estimate_num_insns, because the gimple statement count is not a
very realistic estimate of the final code size effect. And third, there seems
to be no reason to disable the pass for functions optimized for size.
I've never been happy with the size estimation code -- it's a series of
hacks to address a couple pathological codesize regressions that were
seen when comparing gcc-6 to prior versions. I won't lose any sleep if
we move to estimate_num_insns and rely more on profile data.
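For readers following along, the change boils down to weighting each statement
by its estimated expansion cost rather than counting it as one. A minimal
sketch of that accounting over a block on the candidate path, assuming the
usual GIMPLE iterator API (illustrative only, not the exact hunk from the
patch; "bb" is a block on the path being evaluated):

/* Illustrative: size estimate for one block using the inliner's size
   weights, compared with the old one-per-statement count.  */
int weighted = 0, raw = 0;
for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
     !gsi_end_p (gsi); gsi_next (&gsi))
  {
    gimple *stmt = gsi_stmt (gsi);
    if (is_gimple_debug (stmt))
      continue;
    raw++;                                                     /* old count */
    weighted += estimate_num_insns (stmt, &eni_size_weights);  /* new cost */
  }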
If we block duplication of more than 1 insn on size-optimized paths, the pass
is still able to make the majority of the threading decisions that are for
free and improve codegen.
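A sketch of that gate using the standard profile predicates; the predicate
choice and the one-insn threshold are illustrative and may not match the
committed patch exactly:

/* Sketch: refuse to duplicate more than one estimated insn when the
   path is being optimized for size or is not known to be hot.  */
if (n_insns > 1
    && (optimize_function_for_size_p (cfun)
        || !maybe_hot_bb_p (cfun, (*path)[0])))
  return false;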
The code size benefit was between 0.5% and 2.7% on the testcases I tried
(tramp3d, GCC modules, xalancbmk and some other stuff around my hd).
Bootstrapped/regtested x86_64-linux, seems sane?
The pass should also avoid calling cleanup_cfg when no threading was done,
and I do not see why it is guarded by expensive_optimizations. What are the
main compile-time complexity limitations?
The pass essentially walks up the use-def links in the CFG. When it
encounters a PHI, we walk up every PHI argument. That's where it gets
expensive. I think there's a better algorithm to do those walks that
doesn't start at the beginning each time, but I haven't had time to
experiment with coding it up.
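Roughly, the shape of the walk is a recursion that forks at every PHI it
meets, which is why the cost can blow up; the names below are illustrative
and are not the actual routines in tree-ssa-threadbackward.c:

/* Illustrative only: each PHI argument restarts a full backward walk,
   so the work multiplies at every PHI met along the way.  */
static void
walk_back (tree name, vec<basic_block> *path)
{
  gimple *def = SSA_NAME_DEF_STMT (name);
  if (gphi *phi = dyn_cast <gphi *> (def))
    {
      for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
        /* Fork: a separate walk for every incoming argument.  */
        walk_back (gimple_phi_arg_def (phi, i), path);
    }
  else
    {
      /* Single def: keep walking up from here.  */
    }
}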
Honza
* tree-ssa-threadbackward.c: Include tree-inline.h.
(profitable_jump_thread_path): Use estimate_num_insns to estimate
size of copied block; for cold paths reduce duplication.
(find_jump_threads_backwards): Remove redundant tests.
(pass_thread_jumps::gate): Enable for -Os.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update testcase.
Index: tree-ssa-threadbackward.c
===================================================================
--- tree-ssa-threadbackward.c (revision 237101)
+++ tree-ssa-threadbackward.c (working copy)
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
#include "tree-pass.h"
#include "gimple-ssa.h"
#include "tree-phinodes.h"
+#include "tree-inline.h"
Probably an indication we want estimate_num_insns broken out into a more
generic place that could be used by the threader, inlining, etc. Please
consider that as a follow-up.
static int max_threaded_paths;
@@ -210,7 +211,7 @@ profitable_jump_thread_path (vec<basic_b
&& !(gimple_code (stmt) == GIMPLE_ASSIGN
&& gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
&& !is_gimple_debug (stmt))
- ++n_insns;
+ n_insns += estimate_num_insns (stmt, &eni_size_weights);
}
/* We do not look at the block with the threaded branch
@@ -238,13 +239,15 @@ profitable_jump_thread_path (vec<basic_b
threaded_through_latch = true;
}
+ gimple *stmt = get_gimple_control_stmt ((*path)[0]);
+ gcc_assert (stmt);
+
/* We are going to remove the control statement at the end of the
last block in the threading path. So don't count it against our
statement count. */
- n_insns--;
- gimple *stmt = get_gimple_control_stmt ((*path)[0]);
- gcc_assert (stmt);
+ n_insns-= estimate_num_insns (stmt, &eni_size_weights);
Whitespace nit here before the "-=".
I think this is fine with the whitespace fix. And again, consider
moving estimate_num_insns to a new location outside the inliner.
jeff