Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop

Richard Sandiford Mon, 03 Jul 2017 01:08:14 -0700

Richard Biener <rguent...@suse.de> writes:
> On Mon, 12 Jun 2017, Bin.Cheng wrote:
>> On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford
>> <richard.sandif...@linaro.org> wrote:
>> > "Bin.Cheng" <amker.ch...@gmail.com> writes:
>> >> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
>> >> <richard.sandif...@linaro.org> wrote:
>> >>> Sorry to return this old patch, but:
>> >>>
>> >>> Bin Cheng <bin.ch...@arm.com> writes:
>> >>>> -/* Calculate the number of iterations under which scalar loop will be
>> >>>> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
>> >>>> -   iterations of prolog loop.  If it's integer const, the integer
>> >>>> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
>> >>>> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
>> >>>> -   true.  This function also store upper bound of the result in BOUND. 
>> >>>>  */
>> >>>> +/* Calculate the number of iterations above which vectorized loop will 
>> >>>> be
>> >>>> +   preferred than scalar loop.  NITERS_PROLOG is the number of 
>> >>>> iterations
>> >>>> +   of prolog loop.  If it's integer const, the integer number is also 
>> >>>> passed
>> >>>> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
>> >>>> +   number of iterations of prolog loop.  VFM1 is vector factor minus 
>> >>>> one.
>> >>>> +   If CHECK_PROFITABILITY is true, TH is the threshold below which 
>> >>>> scalar
>> >>>> +   (rather than vectorized) loop will be executed.  This function 
>> >>>> stores
>> >>>> +   upper bound (included) of the result in BOUND_SCALAR.  */
>> >>>>
>> >>>>  static tree
>> >>>>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
>> >>>> -                          int bound_prolog, int vf, int th, int *bound,
>> >>>> -                          bool check_profitability)
>> >>>> +                          int bound_prolog, int vfm1, int th,
>> >>>> +                          int *bound_scalar, bool check_profitability)
>> >>>>  {
>> >>>>    tree type = TREE_TYPE (niters_prolog);
>> >>>>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
>> >>>> -                          build_int_cst (type, vf));
>> >>>> +                          build_int_cst (type, vfm1));
>> >>>>
>> >>>> -  *bound = vf + bound_prolog;
>> >>>> +  *bound_scalar = vfm1 + bound_prolog;
>> >>>>    if (check_profitability)
>> >>>>      {
>> >>>> -      th++;
>> >>>> +      /* TH indicates the minimum niters of vectorized loop, while we
>> >>>> +      compute the maximum niters of scalar loop.  */
>> >>>> +      th--;
>> >>>
>> >>> Are you sure about this last change?  It looks like it should be dropping
>> >> Hi Richard,
>> >> Thanks for spotting this.  I vaguely remember I got this from the way
>> >> how niter/th was checked in previous peeling code, but did't double
>> >> check it now.  I tend to believe there is inconsistence about th,
>> >> especially with comment like:
>> >>
>> >>   /* Threshold of number of iterations below which vectorzation will not 
>> >> be
>> >>      performed. It is calculated from MIN_PROFITABLE_ITERS and
>> >>      PARAM_MIN_VECT_LOOP_BOUND. */
>> >>   unsigned int th;
>> >>
>> >> I also tend to believe the inconsistence was introduced partly because
>> >> niters in vectorizer stands for latch_niters + 1, while latch_niters
>> >> in rest of the compiler.
>> >>
>> >> and...,
>> >>
>> >>> the increment rather than replacing it with a decrement.
>> >>>
>> >>> It looks like the threshold is already the maximum niters for the scalar
>> >>> loop.  It's set by:
>> >>>
>> >>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
>> >>>                             * vectorization_factor) - 1);
>> >>>
>> >>>   /* Use the cost model only if it is more conservative than user 
>> >>> specified
>> >>>      threshold.  */
>> >>>   th = (unsigned) min_scalar_loop_bound;
>> >>>   if (min_profitable_iters
>> >>>       && (!min_scalar_loop_bound
>> >>>           || min_profitable_iters > min_scalar_loop_bound))
>> >>>     th = (unsigned) min_profitable_iters;
>> >>>
>> >>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>> >>>
>> >>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
>> >>> is the minimum niters for the vector loop.)
>> >> To be honest, min_scalar_loop_bound is more likely for something's
>> >> lower bound which is the niters for the vector loop.  If it refers to
>> >> the niters scalar loop, it is in actuality the "max" value we should
>> >> use.  I am not quite sure here, partly because I am not a native
>> >> speaker.
>> >>
>> >>>
>> >>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
>> >>> which the vector loop is used, but vect_estimate_min_profitable_iters
>> >>> instead returns the largest niters for which the scalar loop should be
>> >>> preferred:
>> >>>
>> >>>   /* Cost model disabled.  */
>> >>>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>> >>>     {
>> >>>       dump_printf_loc (MSG_NOTE, vect_location, "cost model 
>> >>> disabled.\n");
>> >>>       *ret_min_profitable_niters = 0;
>> >>>       *ret_min_profitable_estimate = 0;
>> >>>       return;
>> >>>     }
>> >>>   [...]
>> >>>   /* Because the condition we create is:
>> >>>      if (niters <= min_profitable_iters)
>> >>>        then skip the vectorized loop.  */
>> >>>   min_profitable_iters--;
>> >>>   [...]
>> >>>   min_profitable_estimate --;
>> >>>
>> >>> Also, when deciding whether the threshold needs to be used, we have:
>> >>>
>> >>>   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>> >>>   if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
>> >>>       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> >>>     {
>> >>>       if (dump_enabled_p ())
>> >>>         dump_printf_loc (MSG_NOTE, vect_location,
>> >>>                          "Profitability threshold is %d loop 
>> >>> iterations.\n",
>> >>>                          th);
>> >>>       check_profitability = true;
>> >>>     }
>> >> Yes, this indeed indicates that th refers to the maximum niters of the
>> >> scalar loop.  Anyway, we should resolve the inconsistence and make it
>> >> more clear in variable names etc..
>> >
>> > Yeah, agree the variable names are really confusing.  Given the choice
>> > between changing the names to match the current meaning and changing
>> > the meaning to match the current names, the latter looks like it would
>> > give cleaner code.  I'm happy to do that it that sounds like the way
>> > to go.
>> I am all for this direction, but Richard B may have some to say here.
>> BTW, will you also fix the other inconsistence you found?
>
> I'm all to making the code clearer but I guess it will remain a
> tricky area.  Each time I need to change sth there I need half an
> hour to recap how all the stuff is tied together...
>
> So please go forward.  Many bonus points if you manage to
> create testcases covering the mistakes you fix.


No testcases, sorry, but I built the patched compiler for
aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and
powerpc64le-linux-gnu, compiled the testsuite with -O3,
and compared the result with:

diff -ur gcc/tree-vect-loop.c /hdd/richards/compare/old/gcc/tree-vect-loop.c
--- gcc/tree-vect-loop.c        2017-07-01 16:18:55.951581540 +0100
+++ /hdd/richards/compare/old/gcc/tree-vect-loop.c      2017-07-01 
16:00:20.266729964 +0100
@@ -2247,8 +2247,10 @@
       /* Niters for at least one iteration of vectorized loop.  */
       niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
       /* One additional iteration because of peeling for gap.  */
-      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
        niters_th++;
+      /* Convert to maximum number of scalar iters.  */
+      niters_th--;
       if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
        LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
     }
diff -ur gcc/tree-vect-loop-manip.c 
/hdd/richards/compare/old/gcc/tree-vect-loop-manip.c
--- gcc/tree-vect-loop-manip.c  2017-07-01 16:18:55.951581540 +0100
+++ /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c        2017-07-01 
14:00:26.775542411 +0100
@@ -1143,9 +1143,6 @@
   *bound_scalar = vfm1 + bound_prolog;
   if (check_profitability)
     {
-      /* TH indicates the minimum niters of vectorized loop, while we
-        compute the maximum niters of scalar loop.  */
-      th--;
       /* Peeling for constant times.  */
       if (int_niters_prolog >= 0)
        {

which is the patch that would make everything consistent with the
current meaning (where the "minimum number of iterations" for which
vectorisation was profitable were actually the maximum number of
iterations for which vectorisation wasn't profitable).

Also tested normally on aarch64-linux-gnu and x86_64-linux-gnu.
OK to install?

Richard


2017-07-03  Richard Sandiford  <richard.sandif...@linaro.org>

gcc/
        * tree-vect-loop.c (vect_analyze_loop_2): Treat min_scalar_loop_bound,
        min_profitable_iters, and th as inclusive lower bounds.
        Fix LOOP_VINFO_PEELING_FOR_GAPS condition.
        (vect_estimate_min_profitable_iters): Return inclusive lower bounds
        for min_profitable_iters and min_profitable_estimate.
        (vect_transform_loop): Treat th as an inclusive lower bound.
        * tree-vect-loop-manip.c (vect_loop_versioning): Likewise.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c        2017-07-03 08:42:51.149380545 +0100
+++ gcc/tree-vect-loop.c        2017-07-03 08:50:43.507370205 +0100
@@ -2132,21 +2132,17 @@ vect_analyze_loop_2 (loop_vec_info loop_
       goto again;
     }
 
-  min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
-                           * vectorization_factor) - 1);
+  min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
+                          * vectorization_factor);
 
   /* Use the cost model only if it is more conservative than user specified
      threshold.  */
-  th = (unsigned) min_scalar_loop_bound;
-  if (min_profitable_iters
-      && (!min_scalar_loop_bound
-          || min_profitable_iters > min_scalar_loop_bound))
-    th = (unsigned) min_profitable_iters;
+  th = (unsigned) MAX (min_scalar_loop_bound, min_profitable_iters);
 
   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
 
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-      && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
+      && LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
     {
       if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2165,7 +2161,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
     estimated_niter = max_niter;
   if (estimated_niter != -1
       && ((unsigned HOST_WIDE_INT) estimated_niter
-          <= MAX (th, (unsigned)min_profitable_estimate)))
+          < MAX (th, (unsigned) min_profitable_estimate)))
     {
       if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2182,9 +2178,9 @@ vect_analyze_loop_2 (loop_vec_info loop_
 
   /* Decide whether we need to create an epilogue loop to handle
      remaining scalar iterations.  */
-  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)
-        / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
-       * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo)
+        / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+       * LOOP_VINFO_VECT_FACTOR (loop_vinfo));
 
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
@@ -2247,7 +2243,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
       /* Niters for at least one iteration of vectorized loop.  */
       niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
       /* One additional iteration because of peeling for gap.  */
-      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
        niters_th++;
       if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
        LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
@@ -3545,7 +3541,7 @@ vect_estimate_min_profitable_iters (loop
   if ((scalar_single_iter_cost * vf) > (int) vec_inside_cost)
     {
       if (vec_outside_cost <= 0)
-        min_profitable_iters = 1;
+        min_profitable_iters = 0;
       else
         {
           min_profitable_iters = ((vec_outside_cost - scalar_outside_cost) * vf
@@ -3586,11 +3582,6 @@ vect_estimate_min_profitable_iters (loop
   min_profitable_iters =
        min_profitable_iters < vf ? vf : min_profitable_iters;
 
-  /* Because the condition we create is:
-     if (niters <= min_profitable_iters)
-       then skip the vectorized loop.  */
-  min_profitable_iters--;
-
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
                      "  Runtime profitability threshold = %d\n",
@@ -3606,7 +3597,7 @@ vect_estimate_min_profitable_iters (loop
      SIC * niters > VIC * ((niters-PL_ITERS-EP_ITERS)/VF) + VOC + SOC  */
 
   if (vec_outside_cost <= 0)
-    min_profitable_estimate = 1;
+    min_profitable_estimate = 0;
   else
     {
       min_profitable_estimate = ((vec_outside_cost + scalar_outside_cost) * vf
@@ -3615,7 +3606,6 @@ vect_estimate_min_profitable_iters (loop
                                 / ((scalar_single_iter_cost * vf)
                                   - vec_inside_cost);
     }
-  min_profitable_estimate --;
   min_profitable_estimate = MAX (min_profitable_estimate, 
min_profitable_iters);
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -7180,7 +7170,7 @@ vect_transform_loop (loop_vec_info loop_
      run at least the vectorization factor number of times checking
      is pointless, too.  */
   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
-  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
+  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo)
       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
       if (dump_enabled_p ())
Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c  2017-07-03 08:42:46.863565062 +0100
+++ gcc/tree-vect-loop-manip.c  2017-07-03 08:50:43.506370248 +0100
@@ -2142,7 +2142,7 @@ vect_loop_versioning (loop_vec_info loop
   bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
 
   if (check_profitability)
-    cond_expr = fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
+    cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
                             build_int_cst (TREE_TYPE (scalar_loop_iters),
                                                       th));

Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop

Reply via email to