On Mon, Dec 1, 2014 at 10:06 PM, Jeff Law <[email protected]> wrote:
> On 11/25/14 14:16, Sebastian Pop wrote:
>>
>> Sebastian Pop wrote:
>>>
>>> >I will bootstrap and regression test this patch on x86_64-linux and
>>> >powerpc64-linux. I will also run it on our internal benchmarks,
>>> > coremark, and
>>> >the llvm test-suite.
>>> >
>>> >I will also include a longer testcase that makes sure we do not regress
>>> > on
>>> >coremark.
>>
>> Done all the above. Attached is the new patch with a new testcase. I
>> have also
>> added verify_seme inspired by the recent patch adding verify_sese.
>>
>> Sebastian
>>
>>
>> 0001-extend-jump-thread-for-finite-state-automata-PR-5474.patch
>>
>>
>> From ca222d5222fb976c7aa258d3e3c04e593f42f7a2 Mon Sep 17 00:00:00 2001
>> From: Sebastian Pop<[email protected]>
>> Date: Fri, 26 Sep 2014 14:54:20 -0500
>> Subject: [PATCH] extend jump thread for finite state automata PR 54742
>>
>> Adapted from a patch from James Greenhalgh.
>>
>> * params.def (max-fsm-thread-path-insns, max-fsm-thread-length,
>> max-fsm-thread-paths): New.
>>
>> * doc/invoke.texi (max-fsm-thread-path-insns,
>> max-fsm-thread-length,
>> max-fsm-thread-paths): Documented.
>>
>> * tree-cfg.c (split_edge_bb_loc): Export.
>> * tree-cfg.h (split_edge_bb_loc): Declared extern.
>>
>> * tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore
>> the
>> original value of cond when simplification fails.
>> (fsm_find_thread_path): New.
>> (fsm_find_control_statement_thread_paths): New.
>> (fsm_thread_through_normal_block): Call
>> find_control_statement_thread_paths.
>>
>> * tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print
>> EDGE_START_FSM_THREAD.
>> (verify_seme): New.
>> (duplicate_seme_region): New.
>> (thread_through_all_blocks): Generate code for
>> EDGE_START_FSM_THREAD edges
>> calling gimple_duplicate_sese_region.
>>
>> * tree-ssa-threadupdate.h (jump_thread_edge_type): Add
>> EDGE_START_FSM_THREAD.
>>
>> * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New.
>> * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c: New.
>> ---
>> gcc/doc/invoke.texi | 12 ++
>> gcc/params.def | 15 ++
>> gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 43 +++++
>> gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 127 +++++++++++++
>> gcc/tree-cfg.c | 2 +-
>> gcc/tree-cfg.h | 1 +
>> gcc/tree-ssa-threadedge.c | 215
>> +++++++++++++++++++++-
>> gcc/tree-ssa-threadupdate.c | 201
>> +++++++++++++++++++-
>> gcc/tree-ssa-threadupdate.h | 1 +
>> 9 files changed, 614 insertions(+), 3 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
>>
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 89edddb..074183f 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -10624,6 +10624,18 @@ large and significantly increase compile time at
>> optimization level
>> @option{-O1} and higher. This parameter is a maximum nubmer of
>> statements
>> in a single generated constructor. Default value is 5000.
>>
>> +@item max-fsm-thread-path-insns
>> +Maximum number of instructions to copy when duplicating blocks on a
>> +finite state automaton jump thread path. The default is 100.
>> +
>> +@item max-fsm-thread-length
>> +Maximum number of basic blocks on a finite state automaton jump thread
>> +path. The default is 10.
>> +
>> +@item max-fsm-thread-paths
>> +Maximum number of new jump thread paths to create for a finite state
>> +automaton. The default is 50.
>
> Has there been any tuning on these defaults. I don't have any strong
> opinions about what they ought to be, this is more to get any such
> information recorded on the lists for historical purposes.
>
> I think it's worth a note in the debug dump anytime you abort threading when
> you hit a limit.
>
> I'm a bit worried about compile-time impacts of the all the recursion, but
> I'm willing to wait and see if it turns out to be a problem in practice.
Please consider restricting it to -fexpensive-optimizations (-O2+).
Richard.
>
>> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>> index 8b0b7b8..c9fe212 100644
>> --- a/gcc/tree-ssa-threadedge.c
>> +++ b/gcc/tree-ssa-threadedge.c
>> @@ -56,6 +56,8 @@ along with GCC; see the file COPYING3. If not see
>> #include "params.h"
>> #include "tree-ssa-threadedge.h"
>> #include "builtins.h"
>> +#include "cfg.h"
>> +#include "cfganal.h"
>>
>> /* To avoid code explosion due to jump threading, we limit the
>> number of statements we are going to copy. This variable
>> @@ -661,6 +663,7 @@ simplify_control_stmt_condition (edge e,
>> rather than use a relational operator. These are simpler to
>> handle. */
>> if (TREE_CODE (cond) == SSA_NAME)
>> {
>> + tree original_lhs = cond;
>> cached_lhs = cond;
>>
>> /* Get the variable's current value from the equivalence chains.
>> @@ -689,6 +692,12 @@ simplify_control_stmt_condition (edge e,
>> pass specific callback to try and simplify it further. */
>> if (cached_lhs && ! is_gimple_min_invariant (cached_lhs))
>> cached_lhs = (*simplify) (stmt, stmt);
>> +
>> + /* We couldn't find an invariant. But, callers of this
>> + function may be able to do something useful with the
>> + unmodified destination. */
>> + if (!cached_lhs)
>> + cached_lhs = original_lhs;
>> }
>> else
>> cached_lhs = NULL;
>
> Can't you just use COND rather than stuffing its value away into
> ORIGINAL_LHS? CACHED_LHS may be better in some cases if it's an SSA_NAME
> (and it should be), but I doubt it matters in practice.
>
> Or is it the case that you have to have the original condition -- without
> any context sensitive equivalences used to "simplify" the condition.
>
>
>> @@ -948,6 +957,188 @@ thread_around_empty_blocks (edge taken_edge,
>> return false;
>> }
>>
>> +/* Return true if there is at least one path from START_BB to END_BB.
>> + VISITED_BBS is used to make sure we don't fall into an infinite loop.
>> */
>> +
>> +static bool
>> +fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
>> + vec<basic_block, va_gc> *&path,
>> + hash_set<basic_block> *visited_bbs, int n_insns)
>> +{
>> + if (start_bb == end_bb)
>> + {
>> + vec_safe_push (path, start_bb);
>> + return true;
>> + }
>> +
>> + if (!visited_bbs->add (start_bb))
>> + {
>> + edge e;
>> + edge_iterator ei;
>> + FOR_EACH_EDGE (e, ei, start_bb->succs)
>> + if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs,
>> n_insns))
>> + {
>> + vec_safe_push (path, start_bb);
>> + return true;
>> + }
>> + }
>> +
>> + return false;
>
> Update comment to indicate how PATH is used to return a path from START_BB
> to END_BB.
>
>
>
>> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
>> + gsi_next (&gsi))
>> + ++n_insns;
>
> Probably don't want to count labels and GIMPLE_NOPs. Probably do want to
> count non-virtual PHIs since those may end up as a copies or constant
> initializations.
>
>> + if (j == 0)
>> + kind = EDGE_START_FSM_THREAD;
>> + else if (single_pred_p (e->src))
>> + kind = EDGE_NO_COPY_SRC_BLOCK;
>> + else {
>> + kind = EDGE_COPY_SRC_JOINER_BLOCK;
>> + ++joiners;
>> + }
>
> Presumably the mis-formatting was added when you tracked the # joiners.
> AFAICT that is a write-only variable and ought to be removed. Along with
> the braces on the final ELSE which should restore proper formatting.
>
>
>> @@ -2343,6 +2493,55 @@ thread_through_all_blocks (bool
>> may_peel_loop_headers)
>> threaded_blocks = BITMAP_ALLOC (NULL);
>> memset (&thread_stats, 0, sizeof (thread_stats));
>>
>> + for (i = 0; i < paths.length ();)
>
> Comment before this loop. I can see what you're doing, but I'm already very
> familiar with this code. Basically what are you looking for in this loop
> and what do you do?
>
> Overall I think this is very very close and I really like the overall
> direction. There's a few minor issues noted above and with those addressed,
> I think we should be ready to go.
>
> Looking further out....
>
>
> Removing most of tree-ssa-threadupdate.c and using SEME duplication would be
> a huge step forward for making this code more understandable. I look forward
> to any work you do in this space in the future.
>
> Similarly moving towards a backwards dataflow driven model is definitely on
> my long term plan for this code. Ideally with some kind of knob that says
> "optimize the trivial jump threads you can find and do so very quickly" (say
> by restricting the lookup to a single block) and a more expensive version.
>
> The simple version could run early which would solve some problems Jan has
> run into. Running the simple version early would also help DOM/VRP.
>
> Ideally I want to disentangle threading from VRP and DOM -- most threading
> opportunities are fairly simple to find and exploit. Yet right now we have
> to run DOM or VRP which are insanely expensive.
>
> Jeff