On Sat, Oct 19, 2019 at 2:27 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> As mentioned in the PR, the following patch attempts to address two issues.
> In remove_unused_ivs we first find the best iv_cand (we prefer primarily the
> same step, then the same mode, and lastly a constant base) and only then call
> get_computation_at to determine the replacement expression.  Unfortunately,
> in various cases get_computation_at can return NULL_TREE, and in that case we
> don't try any other candidate and just leave the vars for debug as is, which
> results in => NULL and the IVs being <optimized away>.
>
> The following patch only considers candidates for which
> get_computation_at succeeds; while this can be slower, it can handle more
> cases.  An alternative would perhaps be to have two passes: pick the best
> candidate without calling get_computation_at, call it on that best
> candidate, and if that fails, retry the best-candidate search while calling
> get_computation_at.
>
> Another thing is that get_computation_at can only handle cases where
> use can be expressed as ubase + (var - cbase) * ratio for an integral ratio.
> In the PR testcase we don't have any such candidate: one (the one we pick as
> best) has a ratio of 1/15 and the other 1/4.  Using a division in ivopts at
> runtime is hardly ever desirable, but for debug info we don't mind that; all
> we need to ensure is that we don't end up with wrong-debug.
> The patch implements expressing use as ubase + (var - cbase) / ratio for
> these debug-info-only uses, but so far it requires that if the use IV can't
> wrap around, the candidate's (var - cbase) can't wrap around either, which
> IMHO should be true whenever the candidate type is at least
> ceil_log2 (ratio) bits larger than the use type.  Do I need to punt if
> !use->iv->no_overflow, or is the
> ubase + (utype) ((var - cbase) / ratio) computation safe even if it wraps?
> And, as questioned in the PR, are there other cases where we can safely
> assume no wrap (e.g. use it if use->iv->no_overflow && cand->iv->no_overflow,
> without those extra precision checks)?
>
> Anyway, bootstrapped/regtested successfully on x86_64-linux and i686-linux.
Hi Jakub, thanks very much for working on this.

As for choosing the best candidate, I was thinking of reusing the existing
get_computation_cost interface, but it looks like that would require
non-trivial modification.  Your patch is simpler.
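
Just to make sure I read the reconstruction right, here is a toy
standalone sketch of what the patch computes for these debug-only uses
(my own made-up example, not the patch's tree building; the step of 4
mirrors the 1/4 ratio from the PR testcase, everything else is invented):

#include <stdio.h>
#include <stdint.h>

/* Toy model: the use IV steps 1, the candidate IV steps 4, so
   ratio == 4 and use == ubase + (cand - cbase) / 4 on every iteration.
   The candidate is wider than the use, as the patch requires.  */
int
main (void)
{
  uint32_t ubase = 100;
  uint64_t cbase = 0x1000, cand = cbase;
  for (uint32_t i = 0; i < 8; i++, cand += 4)
    {
      uint32_t use = ubase + i;
      uint32_t rec = ubase + (uint32_t) ((cand - cbase) / 4);
      printf ("use=%u reconstructed=%u\n", (unsigned) use, (unsigned) rec);
    }
  return 0;
}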

The overflow check is difficult.  IIUC, the checks on bit precision in the
patch are not enough?  Consider an unsigned 64-bit candidate whose start
value is unknown at compile time.  My original idea was to use the
no_overflow flag, but apparently this information is not well computed and
is inaccurate in most cases?  Also, for now there is no link between a
candidate and its original iv.  IIRC, when a candidate is derived from a
no_overflow pointer candidate, it's not safe to mark the candidate as
no_overflow.  I am not sure whether I remember this wrong; unfortunately,
I don't have examples either.
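
To illustrate the kind of wrap I worry about, here is another toy sketch
(again my own made-up example): the use and the candidate have the same
8-bit precision and the ratio is 4, so the candidate's distance from
cbase wraps long before the use does and the reconstruction goes wrong.

#include <stdio.h>
#include <stdint.h>

int
main (void)
{
  uint8_t ubase = 0, cbase = 7;
  for (unsigned i = 0; i < 80; i++)
    {
      uint8_t use = ubase + i;           /* never wraps, i stays < 256 */
      uint8_t cand = cbase + 4 * i;      /* value wraps modulo 256 */
      /* Goes wrong as soon as the distance 4 * i itself reaches 256.  */
      uint8_t rec = ubase + (uint8_t) ((uint8_t) (cand - cbase) / 4);
      if (rec != use)
        printf ("i=%u: use=%u but reconstructed=%u\n",
                i, (unsigned) use, (unsigned) rec);
    }
  return 0;
}

In this toy case the patch's precision check rejects the candidate
(8 + ceil_log2 (4) > 8); what I am unsure about is whether the check is
also sufficient for a wide candidate whose start value is unknown at
compile time.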

Thanks,
bin
>
> Comparing bootstrapped cc1plus without and with this patch (the former with
> the patch applied after bootstrap and stage3 rebuilt, so that .text etc. is
> identical) shows:
> -  [33] .debug_info       PROGBITS        0000000000000000 22a4788 749275e 00      0   0  1
> -  [34] .debug_abbrev     PROGBITS        0000000000000000 9736ee6 204aad 00      0   0  1
> -  [35] .debug_line       PROGBITS        0000000000000000 993b993 1688464 00      0   0  1
> -  [36] .debug_str        PROGBITS        0000000000000000 afc3df7 6f65aa 01  MS  0   0  1
> -  [37] .debug_loc        PROGBITS        0000000000000000 b6ba3a1 71a2dde 00      0   0  1
> -  [38] .debug_ranges     PROGBITS        0000000000000000 1285d17f 16414d0 00      0   0  1
> -  [39] .symtab           SYMTAB          0000000000000000 13e9e650 166ff8 18     40 38193  8
> -  [40] .strtab           STRTAB          0000000000000000 14005648 2ad809 00      0   0  1
> -  [41] .shstrtab         STRTAB          0000000000000000 142b2e51 0001a0 00      0   0  1
> +  [33] .debug_info       PROGBITS        0000000000000000 22a4788 749365e 00      0   0  1
> +  [34] .debug_abbrev     PROGBITS        0000000000000000 9737de6 204a9f 00      0   0  1
> +  [35] .debug_line       PROGBITS        0000000000000000 993c885 1688f0c 00      0   0  1
> +  [36] .debug_str        PROGBITS        0000000000000000 afc5791 6f65aa 01  MS  0   0  1
> +  [37] .debug_loc        PROGBITS        0000000000000000 b6bbd3b 71cd404 00      0   0  1
> +  [38] .debug_ranges     PROGBITS        0000000000000000 1288913f 16414b0 00      0   0  1
> +  [39] .symtab           SYMTAB          0000000000000000 13eca5f0 166ff8 18     40 38193  8
> +  [40] .strtab           STRTAB          0000000000000000 140315e8 2ad809 00      0   0  1
> +  [41] .shstrtab         STRTAB          0000000000000000 142dedf1 0001a0 00      0   0  1
> so .debug_info is 3840 bytes larger and .debug_loc is 173606 bytes larger
> (0.15%).  So there are some changes from this, but not a huge amount of them,
> though the .debug_loc size changed in 217 gcc/*.o files out of 474.
>
> 2019-10-18  Jakub Jelinek  <ja...@redhat.com>
>
>         PR debug/90231
>         * tree-ssa-loop-ivopts.c (get_debug_computation_at): New function.
>         (remove_unused_ivs): Use it instead of get_computation_at.  When
>         choosing best candidate, only consider candidates where
>         get_debug_computation_at actually returns non-NULL.
>
> --- gcc/tree-ssa-loop-ivopts.c.jj       2019-09-20 12:25:26.810718338 +0200
> +++ gcc/tree-ssa-loop-ivopts.c  2019-10-18 23:55:10.054026219 +0200
> @@ -4089,6 +4089,83 @@ get_computation_at (class loop *loop, gi
>    return fold_convert (type, aff_combination_to_tree (&aff));
>  }
>
> +/* Like get_computation_at, but try harder, even if the computation
> +   is more expensive.  Intended for debug stmts.  */
> +
> +static tree
> +get_debug_computation_at (class loop *loop, gimple *at,
> +                         struct iv_use *use, struct iv_cand *cand)
> +{
> +  if (tree ret = get_computation_at (loop, at, use, cand))
> +    return ret;
> +
> +  tree ubase = use->iv->base, ustep = use->iv->step;
> +  tree cbase = cand->iv->base, cstep = cand->iv->step;
> +  tree var;
> +  tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase);
> +  widest_int rat;
> +
> +  /* We must have a precision to express the values of use.  */
> +  if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype))
> +    return NULL_TREE;
> +
> +  /* Try to handle the case that get_computation_at doesn't,
> +     try to express
> +     use = ubase + (var - cbase) / ratio.  */
> +  if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep),
> +                            &rat))
> +    return NULL_TREE;
> +
> +  bool neg_p = false;
> +  if (wi::neg_p (rat))
> +    {
> +      if (TYPE_UNSIGNED (ctype))
> +       return NULL_TREE;
> +      neg_p = true;
> +      rat = wi::neg (rat);
> +    }
> +
> +  int bits = wi::exact_log2 (rat);
> +  if (bits == -1)
> +    bits = wi::floor_log2 (rat) + 1;
> +  if (TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype))
> +    return NULL_TREE;
> +
> +  var = var_at_stmt (loop, cand, at);
> +
> +  if (POINTER_TYPE_P (ctype))
> +    {
> +      ctype = unsigned_type_for (ctype);
> +      cbase = fold_convert (ctype, cbase);
> +      cstep = fold_convert (ctype, cstep);
> +      var = fold_convert (ctype, var);
> +    }
> +
> +  ubase = unshare_expr (ubase);
> +  cbase = unshare_expr (cbase);
> +  if (stmt_after_increment (loop, cand, at))
> +    var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var,
> +                      unshare_expr (cstep));
> +
> +  var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase);
> +  var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var,
> +                    wide_int_to_tree (TREE_TYPE (var), rat));
> +  if (POINTER_TYPE_P (utype))
> +    {
> +      var = fold_convert (sizetype, var);
> +      if (neg_p)
> +       var = fold_build1 (NEGATE_EXPR, sizetype, var);
> +      var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var);
> +    }
> +  else
> +    {
> +      var = fold_convert (utype, var);
> +      var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype,
> +                        ubase, var);
> +    }
> +  return var;
> +}
> +
>  /* Adjust the cost COST for being in loop setup rather than loop body.
>     If we're optimizing for space, the loop setup overhead is constant;
>     if we're optimizing for speed, amortize it over the per-iteration cost.
> @@ -7523,6 +7600,7 @@ remove_unused_ivs (struct ivopts_data *d
>               struct iv_use dummy_use;
>               struct iv_cand *best_cand = NULL, *cand;
>               unsigned i, best_pref = 0, cand_pref;
> +             tree comp = NULL_TREE;
>
>               memset (&dummy_use, 0, sizeof (dummy_use));
>               dummy_use.iv = info->iv;
> @@ -7543,20 +7621,22 @@ remove_unused_ivs (struct ivopts_data *d
>                     ? 1 : 0;
>                   if (best_cand == NULL || best_pref < cand_pref)
>                     {
> -                     best_cand = cand;
> -                     best_pref = cand_pref;
> +                     tree this_comp
> +                       = get_debug_computation_at (data->current_loop,
> +                                                   SSA_NAME_DEF_STMT (def),
> +                                                   &dummy_use, cand);
> +                     if (this_comp)
> +                       {
> +                         best_cand = cand;
> +                         best_pref = cand_pref;
> +                         comp = this_comp;
> +                       }
>                     }
>                 }
>
>               if (!best_cand)
>                 continue;
>
> -             tree comp = get_computation_at (data->current_loop,
> -                                             SSA_NAME_DEF_STMT (def),
> -                                             &dummy_use, best_cand);
> -             if (!comp)
> -               continue;
> -
>               if (count > 1)
>                 {
>                   tree vexpr = make_node (DEBUG_EXPR_DECL);
>
>         Jakub
