On Tue, Jul 1, 2014 at 10:32 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: > Sorry for this late reply, I spent some time in understanding the problem. > > On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng <bin.ch...@arm.com> wrote: > >>> expressions. It's possible to have iv_elimination_compare_lt to do some >>> undo transformation on may_be_zero, but I found it's difficult for cases >>> involving signed/unsigned conversion like case loop-41.c. Since I think >>> there is no obvious benefit to fold may_be_zero here (somehow because the >>> two operands are already in folded forms), this patch just calls build2_loc >>> instead. >> >> But it may fold to true/false, no? > You are right, it can be folded to false in many cases. Thus I > managed to check specific folded forms of may_be_zero, as in attached > patch. So far it works for tests I added, but there are some other > folded forms not handled. > >> > >>> >>> When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in >>> iv_elimination_compare_lt are like below with i_1 of signed type: >>> B: i_1 + 1 >>> A: 0 >>> niter->niter: (unsigned int)i_1 >>> >>> Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1". >>> Without this patch, it is considered equal to each other. >> >> just looking at this part. Do you have a testcase that exhibits a difference >> when just applying patch A? So I can have a look here? >> >> From the code in iv_elimination_compare_lt I can't figure why we'd >> end up with i_1 instead of (unsigned int)i_1 as we convert to a common >> type. >> >> I suppose the issue may be that at tree_to_aff_combination time >> we strip all nops with STRIP_NOPS but when operating on ->rest >> via convert/scale or add we do not strip them again. >> >> But then 'nit' should be i_1, not (unsigned int)i_1. So the analysis above >> really doesn't look correct. >> >> Just to make sure we don't paper over an issue in tree-affine.c. >> >> Thus - testcase? On x86 we don't run into this place in >> iv_elimination_compare_lt (on an unpatched tree). >> >> CCing Zdenek for the meat of patch B. > > I am more and more convinced that check of overflow in function > iv_elimination_compare_lt is not sufficient. Considering example > given by comments just before the function: > /* Tries to replace loop exit by one formulated in terms of a LT_EXPR > comparison with CAND. NITER describes the number of iterations of > the loops. If successful, the comparison in COMP_P is altered accordingly. > > We aim to handle the following situation: > > sometype *base, *p; > int a, b, i; > > i = a; > p = p_0 = base + a; > > do > { > bla (*p); > p++; > i++; > } > while (i < b); > > Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1. > We aim to optimize this to > > p = p_0 = base + a; > do > { > bla (*p); > p++; > } > while (p < p_0 - a + b); > > This preserves the correctness, since the pointer arithmetics does not > overflow. More precisely: > > 1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is > no > overflow in computing it or the values of p. > 2) if a + 1 > b, then we need to verify that the expression p_0 - a does > not > overflow. To prove this, we use the fact that p_0 = base + a. */ > > The example seems wrong to me, because the statement only stands for > unsigned case like ivopts-lt.c: > > > #include "stdint.h" > > void > f1 (char *p, uintptr_t i, uintptr_t n) > { > p += i; > do > { > *p = '\0'; > p += 1; > i++; > } > while (i < n); > } > > If i/n are of signed type, the 2nd point for "a+1>b" scenario in the > comment isn't right, because of below facts: > a) niter of loop are calculated in unsigned type as in function > number_of_iterations_lt. So loop information for this case is like: > > Analyzing # of iterations of loop 1 > exit condition [i_4(D) + 1, + , 1](no_overflow) < n_12(D) > bounds on difference of bases: -4294967295 ... 4294967294 > result: > zero if i_4(D) >= n_12(D) > # of iterations ((unsigned int) n_12(D) - (unsigned int) i_4(D)) - > 1, bounded by 4294967294 > number of iterations ((unsigned int) n_12(D) - (unsigned int) > i_4(D)) - 1; zero if i_4(D) >= n_12(D) > > b) The (unsigned int)n_12(D) could overflow, the check in > iv_elimination_compare_lt doesn't take this fact into consideration. > > Tricky thing is GCC now doesn't eliminate "i < n" with point P for > signed version of case. The reason behind is again the may_be_zero > information. The original form of may_be_zero is like "i_4(D) + 1> > n_12(D)", which is folded into "i_4(D) >= n_12(D)" because both > i_4/n_12 are of signed type and don't wrap. In the end, function > iv_elimination_compare_lt only handles the original form of > may_be_zero. > So below case is the closest one to illustrate the problem: > > #include "stdint.h" > > void > f1 (char *p, int i, int n) > { > p += i; > do > { > *p = '\0'; > p += 1; > i++; > } > while (i < n); > } > > char arr[100] = "hello\n"; > > int main(void) > { > f1 (arr, 1, -1); > > return 0; > } > > If we build the original form of may_be_zero with change: > Index: gcc/tree-ssa-loop-niter.c > =================================================================== > --- gcc/tree-ssa-loop-niter.c (revision 211966) > +++ gcc/tree-ssa-loop-niter.c (working copy) > @@ -1153,8 +1153,13 @@ > condition. */ > > if (mpz_sgn (bnds->below) < 0) > +#if 0 > niter->may_be_zero = fold_build2 (LT_EXPR, boolean_type_node, > iv1->base, iv0->base); > +#else > + niter->may_be_zero = build2_loc (UNKNOWN_LOCATION, LT_EXPR, > boolean_type_node, > + iv1->base, iv0->base); > +#endif > niter->niter = delta; > niter->max = widest_int::from (wi::from_mpz (niter_type, > bnds->up, false), > TYPE_SIGN (niter_type)); > > The case will be mis-compiled. > > One the other handle, iv_elimination_compare_lt for loops with > may_be_zero can be further improved to handle cases other than pointer > candidate. My original patch is intended to handle unsigned type > candidates, but there are other cases. > > The problem here is a little complicated, especially if we want to > improve iv elimination, I may mis-read the code somehow. So any idea > about this? >
BTW, for below gimple dump before ivopt: foo (int l) { long long int sum; long long int j; int i; int _8; <bb 2>: <bb 3>: # i_1 = PHI <0(2), i_4(4)> # j_2 = PHI <0(2), j_5(4)> # sum_3 = PHI <0(2), sum_6(4)> i_4 = i_1 + 1; j_5 = j_2 + 1; sum_6 = sum_3 + j_5; if (i_4 < l_7(D)) goto <bb 4>; else goto <bb 5>; <bb 4>: goto <bb 3>; <bb 5>: # sum_12 = PHI <sum_6(3)> _8 = (int) sum_12; return _8; } The loop information is like: ;; ;; Loop 1 ;; header 3, latch 4 ;; depth 1, outer 0 ;; nodes: 3 4 Processing loop 1 single exit 3 -> 5, exit condition if (i_4 < l_7(D)) Analyzing # of iterations of loop 1 exit condition [1, + , 1](no_overflow) < l_7(D) bounds on difference of bases: -2147483649 ... 2147483646 result: zero if l_7(D) < 1 # of iterations (unsigned int) l_7(D) + 4294967295, bounded by 2147483646 number of iterations (unsigned int) l_7(D) + 4294967295; zero if l_7(D) < 1 Question is, is it safe to eliminate "i_4 < l_7(D)" as below? Replacing exit test: if (i_4 < l_7(D)) foo (int l) { long long int sum; long long int j; int i; int _8; long long int _10; unsigned int _11; unsigned long _13; unsigned long _14; unsigned int _15; <bb 2>: _11 = (unsigned int) l_7(D); _15 = _11 + 4294967295; _14 = (unsigned long) _15; _13 = _14 + 1; _10 = (long long int) _13; <bb 3>: # j_2 = PHI <0(2), j_5(4)> # sum_3 = PHI <0(2), sum_6(4)> j_5 = j_2 + 1; sum_6 = sum_3 + j_5; if (j_5 < _10) goto <bb 4>; else goto <bb 5>; <bb 4>: goto <bb 3>; <bb 5>: # sum_12 = PHI <sum_6(3)> _8 = (int) sum_12; return _8; } Thanks, bin