Hi,

I'm investigating how to vectorize the following simple case:

    int A[1024 * 2];
    int foo1 (unsigned offset) {
      int sum = 0;
      for (unsigned i = 0; i < 1024; i++)
        sum += A[i + offset];
      return sum;
    }

The loop body and loop vectorizer dumps are:

    # i_13 = PHI <i_9(5), 0(2)>
    # ivtmp_14 = PHI <ivtmp_12(5), 1024(2)>
    _1 = offset_7(D) + i_13;
    _2 = A[_1];
    sum_8 = _2 + sum_11;
    i_9 = i_13 + 1;
    ...
    Creating dr for A[_1]
      ...
      (chrec = (sizetype) {offset_7(D), +, 1}_1)
      (res = (sizetype) {offset_7(D), +, 1}_1)
    case1.c:7:13: missed:  failed: evolution of offset is not affine.

As SCEV thinks {offset,+,1} may overflow, it can not propagate the sizetype and
fails. The call-stack is:

    dr_analyze_innermost() -> 
        simple_iv() -> ... ->
            convert_affine_scev() -> 
                scev_probably_wraps_p() -> 
                    scev_var_range_cant_overflow()
                    loop_exits_before_overflow()


BTW. If we add a stmt like "if (offset <= 1024)" before the loop, GCC can
vectorized it successfully, as scev_var_range_cant_overflow() knows the range
of _1.

For the original case, I think GCC is able to infer the range from the array
length and do the vectorization. There is already functions to infer this info
from undefined behavior like array-ref and record nonwrappiong IVs. We can see
the dumps of passes like evrp, cunroll, vrp1, etc:

    Induction variable (unsigned int) offset_8(D) + 1 * iteration does not wrap 
in statement _2 = A[_1];
    in loop 1.
    Statement _2 = A[_1];
     is executed at most 2047 (bounded by 2047) + 1 times in loop 1.

The call-stack is:

    estimate_numbers_of_iterations() ->
        infer_loop_bounds_from_undefined() ->
            infer_loop_bounds_from_array() -> ... ->
                record_nonwrapping_iv()


So, I have two questions:

1. is it legal to vectorize such case by inferring the no wrap info from array
   ref (I'm wondering if there is any corner case that can not do)?
2. If the 1st question is true, then how could we implement this in GCC?

For the 2nd question, I think there may be several possible solutions:
- Could we re-use the existing nonwrapping IV information found by niter?
- Or, could we implement a function called infer_nowrapping_from_array() (like
  infer_loop_bounds_from_array)? For this way, there are also different possible
  places to call it:
  - in the very beginning (i.e., dr_analyze_innermost()), where we knows it is
    analyzing an array-ref A[_1].
  - in scev_probably_wraps_p(), where it doesn't know it's analyzing an array
    subscript, so we have to find if the user of _1 is array-ref (i.e., A[_1]).

As SCEV is the fundamental pass for loop analysis, I think we'd be very careful
about the correctness. Do you have any comments?

Thanks!
-Hao

Reply via email to