https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87288

--- Comment #6 from bin cheng <amker at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> it's set here:
> 
>   if (!is_gimple_val (niters_vector))
>     {
>       var = create_tmp_var (type, "bnd");
>       gimple_seq stmts = NULL;
>       niters_vector = force_gimple_operand (niters_vector, &stmts, true,
> var);
>       gsi_insert_seq_on_edge_immediate (pe, stmts);
>       /* Peeling algorithm guarantees that vector loop bound is at least ONE,
>          we set range information to make niters analyzer's life easier.  */
>       if (stmts != NULL && log_vf)
>         set_range_info (niters_vector, VR_RANGE,
>                         wi::to_wide (build_int_cst (type, 1)),
>                         wi::to_wide (fold_build2 (RSHIFT_EXPR, type,
>                                                   TYPE_MAX_VALUE (type),
>                                                   log_vf)));
> 
> and the loop is
> 
>   <bb 5> [local count: 105119325]:
>   niters.0_25 = (unsigned int) n_15;
>   ni_gap.1_36 = niters.0_25 + 4294967295;
>   # RANGE [1, 2147483647] NONZERO 2147483647
>   bnd.2_37 = ni_gap.1_36 >> 1;
> 
>   <bb 4> [local count: 567644349]:
>   # ivtmp_50 = PHI <ivtmp_51(6), 0(5)>
>   ivtmp_51 = ivtmp_50 + 1;
>   if (ivtmp_51 >= bnd.2_37)
>     goto <bb 12>; [16.67%]
>   else
>     goto <bb 6>; [83.33%]
> 
>   <bb 6> [local count: 473036958]:
>   goto <bb 4>; [100.00%]
> 
> which looks good according to the comment.  So the number of iterations
> _is_ bnd.2_37 - 1 (that number may be zero).

Not really.  The code was added under assumption that vector code is guarded by
condition in peeling.

Dump for vect is like:
  <bb 2> [local count: 161061274]:
  # PT = { D.2425 } (escaped, escaped heap)
  # USE = nonlocal null { D.2425 } (escaped, escaped heap)
  # CLB = nonlocal null { D.2425 } (escaped, escaped heap)
  _10 = operator new [] (32);
  _10->_M_elems0 = 0.0;
  MEM[(struct array2 *)_10 + 16B]._M_elems0 = 0.0;
  # RANGE [-2147483648, 2147483647] NONZERO 4294967294
  n_14 = argc_13(D) * 2;
  if (n_14 <= 0)
    goto <bb 3>; [15.00%]
  else
    goto <bb 5>; [85.00%]

  <bb 3> [local count: 161061274]:
  # USE = nonlocal null { D.2425 } (escaped, escaped heap)
  # CLB = nonlocal null { D.2425 } (escaped, escaped heap)
  operator delete [] (_10);
  jacobianTransposeds ={v} {CLOBBER};
  return 0;

  <bb 5> [local count: 136902083]:
  niters.0_25 = (unsigned int) n_14;
  ni_gap.1_36 = niters.0_25 + 4294967295;
  # RANGE [1, 2147483647] NONZERO 2147483647
  bnd.2_37 = ni_gap.1_36 >> 1;

  <bb 4> [local count: 739271244]:
  # RANGE [0, 2147483647] NONZERO 2147483647
  # i_21 = PHI <i_16(6), 0(5)>
  # PT = null { D.2425 } (escaped, escaped heap)
  # ALIGN = 8, MISALIGN = 0
  # vectp.5_40 = PHI <vectp.5_41(6), _10(5)>
  # PT = { D.2388 }
  # ALIGN = 16, MISALIGN = 0
  # vectp_jacobianTransposeds.9_47 = PHI <vectp_jacobianTransposeds.9_48(6),
&jacobianTransposeds(5)>
  # ivtmp_50 = PHI <ivtmp_51(6), 0(5)>
  # RANGE [0, 2147483646] NONZERO 2147483647
  _1 = (long unsigned int) i_21;
  # RANGE [0, 34359738336] NONZERO 34359738352
  _2 = _1 * 16;
  # PT = null { D.2425 } (escaped, escaped heap)
  _3 = _10 + _2;
  _6 = _1 * 8;
  # PT = { D.2388 }
  # ALIGN = 8, MISALIGN = 0
  _4 = &jacobianTransposeds + _6;
  vect__5.7_42 = MEM[(double *)vectp.5_40];
  # PT = null { D.2425 } (escaped, escaped heap)
  # ALIGN = 8, MISALIGN = 0
  vectp.5_43 = vectp.5_40 + 16;
  vect__5.8_44 = MEM[(double *)vectp.5_43];
  vect_perm_even_45 = VEC_PERM_EXPR <vect__5.7_42, vect__5.8_44, { 0, 2 }>;
  vect_perm_odd_46 = VEC_PERM_EXPR <vect__5.7_42, vect__5.8_44, { 1, 3 }>;
  _5 = _3->_M_elems0;
  MEM[(double &)vectp_jacobianTransposeds.9_47] = vect_perm_even_45;
  # RANGE [1, 2147483647] NONZERO 2147483647
  i_16 = i_21 + 1;
  # PT = null { D.2425 } (escaped, escaped heap)
  vectp.5_41 = vectp.5_43 + 16;
  # PT = { D.2388 }
  # ALIGN = 16, MISALIGN = 0
  vectp_jacobianTransposeds.9_48 = vectp_jacobianTransposeds.9_47 + 16;
  ivtmp_51 = ivtmp_50 + 1;
  if (ivtmp_51 >= bnd.2_37)
    goto <bb 12>; [16.67%]
  else
    goto <bb 6>; [83.33%]

  <bb 9> [local count: 136902081]:
  goto <bb 3>; [100.00%]

  <bb 6> [local count: 616059372]:
  goto <bb 4>; [100.00%]

  <bb 12> [local count: 136902083]:
  niters_vector_mult_vf.3_38 = bnd.2_37 << 1;
  tmp.4_39 = (int) niters_vector_mult_vf.3_38;

  <bb 13> [local count: 912680552]:
  # RANGE [0, 2147483647] NONZERO 2147483647
  # i_24 = PHI <i_29(14), tmp.4_39(12)>
  # RANGE [0, 2147483646] NONZERO 2147483647
  _19 = (long unsigned int) i_24;
  # RANGE [0, 34359738336] NONZERO 34359738352
  _33 = _19 * 16;
  # PT = null { D.2425 } (escaped, escaped heap)
  _26 = _10 + _33;
  _27 = _19 * 8;
  # PT = { D.2388 }
  # ALIGN = 8, MISALIGN = 0
  _28 = &jacobianTransposeds + _27;
  _31 = _26->_M_elems0;
  MEM[(double &)_28] = _31;
  # RANGE [1, 2147483647] NONZERO 2147483647
  i_29 = i_24 + 1;
  if (n_14 <= i_29)
    goto <bb 9>; [15.00%]
  else
    goto <bb 14>; [85.00%]

  <bb 14> [local count: 775778470]:
  goto <bb 13>; [100.00%]

This is no peeling guard condition skipping vector loop anymore.  In case of
"argc_13(D) == 1", the vector loop body is executed exactly once (corresponding
2 times before vectorization); after vector loop, the epilog loop body is
executed 2 times again (for the same iteration as done in vector loop).  There
is two problems here:
A) it's at least inefficient when ("argc_13(D) == 1" && !SVE).
B) Given the vector loop is not guarded by peeling condition anymore, range
info as you noted should not be set for bnd.2_37, because it could take value
ZERO.

Change in peeling is made by revision 256635, specifically, by below code
changes:

+  poly_uint64 bound_epilog = 0;
+  if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+      && LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+    bound_epilog += vf - 1;
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+    bound_epilog += 1;

//......

@@ -2577,10 +2593,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters,
tree nitersm1,
       if (skip_vector)
        {
          /* Additional epilogue iteration is peeled if gap exists.  */
-         bool peel_for_gaps = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
          tree t = vect_gen_scalar_loop_niters (niters_prolog, prolog_peeling,
-                                               bound_prolog,
-                                               peel_for_gaps ? vf : vf - 1,
+                                               bound_prolog, bound_epilog,

Now bound_epilog == 1 is passed into vect_gen_scalar_loop_niters, rather than
vf (== 2), this causes no peeling condition is generated.

Either below condition is too strict here or we need to identify and skip
setting range info in this case:
+  if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+      && LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+    bound_epilog += vf - 1;

Thanks

Reply via email to