Hi Richard,

Thanks for the comments!
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 298ab215530..7016f993339 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -1605,6 +1605,14 @@ pass_complete_unroll::execute (function *fun)
>      peeled_loops = BITMAP_ALLOC (NULL);
>    unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size,
>                                                    true);
> +
> +  /* There are no loops after unrolling, we assume that it's not so costly
> +     to do the scalar cleanup since here.  FIXME: Some heuristics can be
> +     further added to guard the cost level, like nodes number total, all
> +     the original loops should be with single exits, etc.  */
> +  if (!current_loops->tree_root->inner)
> +    val |= TODO_force_next_scalar_cleanup;
> +
> 
> so this is not the appropriate way to guard this.  Instead in
> 
> static unsigned int
> tree_unroll_loops_completely (bool may_increase_size, bool unroll_outer)
> {
> 
> look where we do
> 
>           bitmap fathers = BITMAP_ALLOC (NULL);
>           EXECUTE_IF_SET_IN_BITMAP (father_bbs, 0, i, bi)
>             {
>               basic_block unrolled_loop_bb = BASIC_BLOCK_FOR_FN (cfun, i);
>               if (! unrolled_loop_bb)
>                 continue;
>               if (loop_outer (unrolled_loop_bb->loop_father))
>                 bitmap_set_bit (fathers,
>                                 unrolled_loop_bb->loop_father->num);
> 
> and in the else case return TODO_force_next_scalar_cleanup because
> then we know we have unrolled an outermost loop and not ran VN
> immediately on it.

OK, I'll move it there:
  1) set a bool once any outermost loop gets unrolled during the iteration;
  2) after iterating, check whether any loops remain; if none, set the TODO
     flag (see the sketch below).
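
Something like this (only a sketch; unrolled_outermost and ret are
placeholder names I'm using here, and the exact placement may change):

  static unsigned int
  tree_unroll_loops_completely (bool may_increase_size, bool unroll_outer)
  {
    ...
    unsigned int ret = 0;
    bool unrolled_outermost = false;
    ...
          EXECUTE_IF_SET_IN_BITMAP (father_bbs, 0, i, bi)
            {
              basic_block unrolled_loop_bb = BASIC_BLOCK_FOR_FN (cfun, i);
              if (! unrolled_loop_bb)
                continue;
              if (loop_outer (unrolled_loop_bb->loop_father))
                bitmap_set_bit (fathers,
                                unrolled_loop_bb->loop_father->num);
              else
                /* We unrolled an outermost loop and VN was not run
                   on it immediately.  */
                unrolled_outermost = true;
            }
    ...
    /* After iterating: we unrolled at least one outermost loop and no
       loops remain, so request the scalar cleanup.  */
    if (unrolled_outermost && !current_loops->tree_root->inner)
      ret |= TODO_force_next_scalar_cleanup;

    return ret;
  }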

One question: since we'll have some heuristics here to guard whether this
cleanup executes, do we want SEME VN to run on each outermost unrolled loop
with some pre-recorded info?  I'm not sure it's worthwhile, though it could
help a bit when the cleanup isn't triggered.

> 
> +/* Scalar cleanup, it has several gated cleanup passes like FRE, DSE.  */
> +
> +namespace {
> +
> +const pass_data pass_data_scalar_cleanup =
> +{
> +  GIMPLE_PASS, /* type */
> +  "*scalar_cleanup", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> 
> this new "pass" doesn't have to do anything with tree-ssa-loop-ivcanon.c
> so please add it to passes.c instead (there's already a bunch of
> pass definitions in there).

Will fix.
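
For reference, I'd move it over roughly like this (a sketch; only the
three fields quoted above come from the current patch, while the class
name, tv_id, properties and gate below are placeholders):

  /* In passes.c:  */

  namespace {

  const pass_data pass_data_scalar_cleanup =
  {
    GIMPLE_PASS, /* type */
    "*scalar_cleanup", /* name */
    OPTGROUP_LOOP, /* optinfo_flags */
    TV_NONE, /* tv_id */
    PROP_cfg, /* properties_required */
    0, /* properties_provided */
    0, /* properties_destroyed */
    0, /* todo_flags_start */
    0, /* todo_flags_finish */
  };

  class pass_scalar_cleanup : public gimple_opt_pass
  {
  public:
    pass_scalar_cleanup (gcc::context *ctxt)
      : gimple_opt_pass (pass_data_scalar_cleanup, ctxt)
    {}

    /* opt_pass methods: */
    virtual bool gate (function *) { return true; /* placeholder */ }
  }; // class pass_scalar_cleanup

  } // anon namespace

  gimple_opt_pass *
  make_pass_scalar_cleanup (gcc::context *ctxt)
  {
    return new pass_scalar_cleanup (ctxt);
  }

The gated FRE/DSE sub-passes themselves would sit under it in passes.def
as usual.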

> 
> Can you repeat the compile-time measurement there?  I also wonder
> whether we should worry about compile-time at -O[12] when SLP is not run.
> Thus, probably rename the cleanup pass to pre_slp_scalar_cleanup and
> gate it on && flag_slp_vectorize

Good idea, will evaluate it.
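
If the numbers support it, the rename plus gating on top of the sketch
above would just be (flag_tree_slp_vectorize being the variable behind
-ftree-slp-vectorize):

  /* Renamed to pass_pre_slp_scalar_cleanup, with:  */
  virtual bool gate (function *)
  {
    /* Avoid the extra compile time at -O[12] when SLP won't run.  */
    return flag_tree_slp_vectorize;
  }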

> 
> Note this is probably the cleanest way to implement this hack.  But it
> still is what it is - a hack.  Not a proper fix for whatever the actual issue 
> is
> which means I'm not the one that's going to ack it (since I've suggested it).
> 

Got it.  Thanks again for the suggestion!  :-)

BR,
Kewen
