Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop

Richard Sandiford Sat, 10 Jun 2017 02:41:55 -0700

Sorry to return this old patch, but:

Bin Cheng <bin.ch...@arm.com> writes:
> -/* Calculate the number of iterations under which scalar loop will be
> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
> -   iterations of prolog loop.  If it's integer const, the integer
> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
> -   true.  This function also store upper bound of the result in BOUND.  */
> +/* Calculate the number of iterations above which vectorized loop will be
> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
> +   of prolog loop.  If it's integer const, the integer number is also passed
> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
> +   (rather than vectorized) loop will be executed.  This function stores
> +   upper bound (included) of the result in BOUND_SCALAR.  */
>  
>  static tree
>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
> -                          int bound_prolog, int vf, int th, int *bound,
> -                          bool check_profitability)
> +                          int bound_prolog, int vfm1, int th,
> +                          int *bound_scalar, bool check_profitability)
>  {
>    tree type = TREE_TYPE (niters_prolog);
>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
> -                          build_int_cst (type, vf));
> +                          build_int_cst (type, vfm1));
>  
> -  *bound = vf + bound_prolog;
> +  *bound_scalar = vfm1 + bound_prolog;
>    if (check_profitability)
>      {
> -      th++;
> +      /* TH indicates the minimum niters of vectorized loop, while we
> +      compute the maximum niters of scalar loop.  */
> +      th--;


Are you sure about this last change?  It looks like it should be dropping
the increment rather than replacing it with a decrement.

It looks like the threshold is already the maximum niters for the scalar
loop.  It's set by:

  min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
                            * vectorization_factor) - 1);

  /* Use the cost model only if it is more conservative than user specified
     threshold.  */
  th = (unsigned) min_scalar_loop_bound;
  if (min_profitable_iters
      && (!min_scalar_loop_bound
          || min_profitable_iters > min_scalar_loop_bound))
    th = (unsigned) min_profitable_iters;

  LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;

(Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
is the minimum niters for the vector loop.)

min_profitable_iters sounds like it _ought_ to be the minimum niters for
which the vector loop is used, but vect_estimate_min_profitable_iters
instead returns the largest niters for which the scalar loop should be
preferred:

  /* Cost model disabled.  */
  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
    {
      dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
      *ret_min_profitable_niters = 0;
      *ret_min_profitable_estimate = 0;
      return;
    }
  [...]
  /* Because the condition we create is:
     if (niters <= min_profitable_iters)
       then skip the vectorized loop.  */
  min_profitable_iters--;
  [...]
  min_profitable_estimate --;

Also, when deciding whether the threshold needs to be used, we have:

  th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
                         "Profitability threshold is %d loop iterations.\n",
                         th);
      check_profitability = true;
    }

where again the "- 1" seems to assume that the threshold is already the
maximum niters for the scalar loop.  (Although checking for == seems a bit
pointless too.)

The corresponding profitability check for loop versioning still assumes
that th is the maximum niters for the scalar loop:

  if (check_profitability)
    cond_expr = fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
                             build_int_cst (TREE_TYPE (scalar_loop_iters),
                                                       th));

since we use GT_EXPR instead of GE_EXPR.

Thanks,
Richard

Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop

Reply via email to