Sorry for this late reply, I spent some time in understanding the problem. On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>> expressions. It's possible to have iv_elimination_compare_lt to do some >> undo transformation on may_be_zero, but I found it's difficult for cases >> involving signed/unsigned conversion like case loop-41.c. Since I think >> there is no obvious benefit to fold may_be_zero here (somehow because the >> two operands are already in folded forms), this patch just calls build2_loc >> instead. > > But it may fold to true/false, no? You are right, it can be folded to false in many cases. Thus I managed to check specific folded forms of may_be_zero, as in attached patch. So far it works for tests I added, but there are some other folded forms not handled. > >> >> When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in >> iv_elimination_compare_lt are like below with i_1 of signed type: >> B: i_1 + 1 >> A: 0 >> niter->niter: (unsigned int)i_1 >> >> Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1". >> Without this patch, it is considered equal to each other. > > just looking at this part. Do you have a testcase that exhibits a difference > when just applying patch A? So I can have a look here? > > From the code in iv_elimination_compare_lt I can't figure why we'd > end up with i_1 instead of (unsigned int)i_1 as we convert to a common > type. > > I suppose the issue may be that at tree_to_aff_combination time > we strip all nops with STRIP_NOPS but when operating on ->rest > via convert/scale or add we do not strip them again. > > But then 'nit' should be i_1, not (unsigned int)i_1. So the analysis above > really doesn't look correct. > > Just to make sure we don't paper over an issue in tree-affine.c. > > Thus - testcase? On x86 we don't run into this place in > iv_elimination_compare_lt (on an unpatched tree). > > CCing Zdenek for the meat of patch B. I am more and more convinced that check of overflow in function iv_elimination_compare_lt is not sufficient. Considering example given by comments just before the function: /* Tries to replace loop exit by one formulated in terms of a LT_EXPR comparison with CAND. NITER describes the number of iterations of the loops. If successful, the comparison in COMP_P is altered accordingly. We aim to handle the following situation: sometype *base, *p; int a, b, i; i = a; p = p_0 = base + a; do { bla (*p); p++; i++; } while (i < b); Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1. We aim to optimize this to p = p_0 = base + a; do { bla (*p); p++; } while (p < p_0 - a + b); This preserves the correctness, since the pointer arithmetics does not overflow. More precisely: 1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no overflow in computing it or the values of p. 2) if a + 1 > b, then we need to verify that the expression p_0 - a does not overflow. To prove this, we use the fact that p_0 = base + a. */ The example seems wrong to me, because the statement only stands for unsigned case like ivopts-lt.c: #include "stdint.h" void f1 (char *p, uintptr_t i, uintptr_t n) { p += i; do { *p = '\0'; p += 1; i++; } while (i < n); } If i/n are of signed type, the 2nd point for "a+1>b" scenario in the comment isn't right, because of below facts: a) niter of loop are calculated in unsigned type as in function number_of_iterations_lt. So loop information for this case is like: Analyzing # of iterations of loop 1 exit condition [i_4(D) + 1, + , 1](no_overflow) < n_12(D) bounds on difference of bases: -4294967295 ... 4294967294 result: zero if i_4(D) >= n_12(D) # of iterations ((unsigned int) n_12(D) - (unsigned int) i_4(D)) - 1, bounded by 4294967294 number of iterations ((unsigned int) n_12(D) - (unsigned int) i_4(D)) - 1; zero if i_4(D) >= n_12(D) b) The (unsigned int)n_12(D) could overflow, the check in iv_elimination_compare_lt doesn't take this fact into consideration. Tricky thing is GCC now doesn't eliminate "i < n" with point P for signed version of case. The reason behind is again the may_be_zero information. The original form of may_be_zero is like "i_4(D) + 1> n_12(D)", which is folded into "i_4(D) >= n_12(D)" because both i_4/n_12 are of signed type and don't wrap. In the end, function iv_elimination_compare_lt only handles the original form of may_be_zero. So below case is the closest one to illustrate the problem: #include "stdint.h" void f1 (char *p, int i, int n) { p += i; do { *p = '\0'; p += 1; i++; } while (i < n); } char arr[100] = "hello\n"; int main(void) { f1 (arr, 1, -1); return 0; } If we build the original form of may_be_zero with change: Index: gcc/tree-ssa-loop-niter.c =================================================================== --- gcc/tree-ssa-loop-niter.c (revision 211966) +++ gcc/tree-ssa-loop-niter.c (working copy) @@ -1153,8 +1153,13 @@ condition. */ if (mpz_sgn (bnds->below) < 0) +#if 0 niter->may_be_zero = fold_build2 (LT_EXPR, boolean_type_node, iv1->base, iv0->base); +#else + niter->may_be_zero = build2_loc (UNKNOWN_LOCATION, LT_EXPR, boolean_type_node, + iv1->base, iv0->base); +#endif niter->niter = delta; niter->max = widest_int::from (wi::from_mpz (niter_type, bnds->up, false), TYPE_SIGN (niter_type)); The case will be mis-compiled. One the other handle, iv_elimination_compare_lt for loops with may_be_zero can be further improved to handle cases other than pointer candidate. My original patch is intended to handle unsigned type candidates, but there are other cases. The problem here is a little complicated, especially if we want to improve iv elimination, I may mis-read the code somehow. So any idea about this? Thanks, bin
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-41.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/loop-41.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/loop-41.c (revision 0) @@ -0,0 +1,38 @@ +/* { dg-do compile { target { lp64 } } } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ + +typedef long unsigned int size_t; +extern float a[100], b[100]; + +int foo (int M, int l) +{ + unsigned ivtmp = 0, niters, _37, _38, bnd; + size_t _67, _1; + float *vectp_a, *vectp_b, *vectp_a2; + float vect__6, vect__7, vect__8; + + _38 = (unsigned int)l; + bnd = _38 + 1; + + _1 = (size_t) M; + _67 = _1 * 4; + vectp_a = a; vectp_b = b; vectp_a2 = a + _67; + + do + { + vect__6 = *vectp_a; + vect__7 = *vectp_b; + vect__8 = vect__6 + vect__7; + *vectp_a = vect__8; + vectp_a = vectp_a + 4; + vectp_b = vectp_b + 4; + vectp_a2 = vectp_a2 + 4; + ivtmp += 1; + } + while (ivtmp < bnd); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(bnd.*\\)" 1 "ivopts" } } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/loop-40.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/loop-40.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/loop-40.c (revision 0) @@ -0,0 +1,38 @@ +/* { dg-do compile { target { lp64 } } } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ + +typedef long unsigned int size_t; +extern float a[100], b[100]; + +int foo (int M, unsigned int l) +{ + unsigned ivtmp = 0, niters, _37, _38, bnd; + size_t _67, _1; + float *vectp_a, *vectp_b, *vectp_a2; + float vect__6, vect__7, vect__8; + + _38 = (unsigned int)l; + bnd = _38 + 1; + + _1 = (size_t) M; + _67 = _1 * 4; + vectp_a = a; vectp_b = b; vectp_a2 = a + _67; + + do + { + vect__6 = *vectp_a; + vect__7 = *vectp_b; + vect__8 = vect__6 + vect__7; + *vectp_a = vect__8; + vectp_a = vectp_a + 4; + vectp_b = vectp_b + 4; + vectp_a2 = vectp_a2 + 4; + ivtmp += 1; + } + while (ivtmp < bnd); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(bnd.*\\)" 1 "ivopts" } } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/tree-ssa-loop-ivopts.c =================================================================== --- gcc/tree-ssa-loop-ivopts.c (revision 211966) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -4432,6 +4432,44 @@ iv_period (struct iv *iv) return period; } +/* Returns no wrapping period of induction variable IV. For now + only unsigned type IV is handled, we could extend it in case + of non-overflow for signed ones. Return zero if it can't be + decided. */ + +static tree +iv_nowrap_period (struct iv *iv) +{ + bool overflow; + tree type; + tree base = iv->base, step = iv->step; + widest_int base_val, step_val, max_val, span, period; + + gcc_assert (step && TREE_CODE (step) == INTEGER_CST); + + type = TREE_TYPE (base); + if (!TYPE_UNSIGNED (type) || TREE_CODE (base) != INTEGER_CST) + return integer_zero_node; + + base_val = wi::to_widest (base); + step_val = wi::to_widest (step); + if (!POINTER_TYPE_P (type) && TYPE_MAX_VALUE (type) + && TREE_CODE (TYPE_MAX_VALUE (type)) == INTEGER_CST) + max_val = wi::to_widest (TYPE_MAX_VALUE (type)); + else + { + wide_int max_wi = wi::max_value (TYPE_PRECISION (type), UNSIGNED); + max_val = wi::to_widest (wide_int_to_tree (type, max_wi)); + } + + span = max_val - base_val + step_val - 1; + period = wi::div_trunc (span, step_val, UNSIGNED, &overflow); + if (overflow) + return integer_zero_node; + + return wide_int_to_tree (type, period); +} + /* Returns the comparison operator used when eliminating the iv USE. */ static enum tree_code @@ -4560,7 +4598,84 @@ difference_cannot_overflow_p (tree base, tree offs } } -/* Tries to replace loop exit by one formulated in terms of a LT_EXPR +/* Check whether PERIOD of CAND is greater than the number of iterations + described by DESC for which the exit condition is true. The exit + condition is comparison against USE. */ + +static bool +period_greater_niter_exit (struct ivopts_data *data, + struct iv_use *use, struct iv_cand *cand, + tree period, struct tree_niter_desc *desc) +{ + struct loop *loop = data->current_loop; + + /* If the number of iterations is constant, compare against it directly. */ + if (TREE_CODE (desc->niter) == INTEGER_CST) + { + /* See cand_value_at. */ + if (stmt_after_increment (loop, cand, use->stmt)) + { + if (!tree_int_cst_lt (desc->niter, period)) + return false; + } + else + { + if (tree_int_cst_lt (period, desc->niter)) + return false; + } + } + + /* If not, and if this is the only possible exit of the loop, see whether + we can get a conservative estimate on the number of iterations of the + entire loop and compare against that instead. */ + else + { + widest_int period_value, max_niter; + + max_niter = desc->max; + if (stmt_after_increment (loop, cand, use->stmt)) + max_niter += 1; + period_value = wi::to_widest (period); + if (wi::gtu_p (max_niter, period_value)) + { + /* See if we can take advantage of inferred loop bound information. */ + if (data->loop_single_exit_p) + { + if (!max_loop_iterations (loop, &max_niter)) + return false; + /* The loop bound is already adjusted by adding 1. */ + if (wi::gtu_p (max_niter, period_value)) + return false; + } + else + return false; + } + } + + return true; +} + +/* Determine whether no wrapping period of CAND is greater than + the number of iterations described by DESC for which exit + conditionis true. The exit condition is comparison against USE. */ + +static bool +nowrap_cand_for_loop_niter_p (struct ivopts_data *data, + struct iv_use *use, + struct iv_cand *cand, + struct tree_niter_desc *desc) +{ + tree period; + + period = iv_nowrap_period (cand->iv); + if (period != integer_zero_node + && period_greater_niter_exit (data, use, cand, period, desc)) + return true; + + return false; +} + +/* Tries to replace loop exit in USE by one formulated in terms of a LT_EXPR comparison with CAND. NITER describes the number of iterations of the loops. If successful, the comparison in COMP_P is altered accordingly. @@ -4597,11 +4712,17 @@ difference_cannot_overflow_p (tree base, tree offs 1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no overflow in computing it or the values of p. 2) if a + 1 > b, then we need to verify that the expression p_0 - a does not - overflow. To prove this, we use the fact that p_0 = base + a. */ + overflow. To prove this, we use the fact that p_0 = base + a. + + Moreover, with more complex overflow analysis for unsigned type, it is + possible to eliminate I with P if P is an induction candidate of unsigned + integer type other than pointer if we can prove that P doesn't overflow + during all iterations of current loop. */ + static bool -iv_elimination_compare_lt (struct ivopts_data *data, - struct iv_cand *cand, enum tree_code *comp_p, +iv_elimination_compare_lt (struct ivopts_data *data, struct iv_use *use, + struct iv_cand *cand, enum tree_code *comp_p, struct tree_niter_desc *niter) { tree cand_type, a, b, mbz, nit_type = TREE_TYPE (niter->niter), offset; @@ -4610,11 +4731,13 @@ static bool HOST_WIDE_INT step; /* We need to know that the candidate induction variable does not overflow. - While more complex analysis may be used to prove this, for now just - check that the variable appears in the original program and that it - is computed in a type that guarantees no overflows. */ + Apart from checking that the variable appears in the original program + and that is is computed in a type that guarantees no overflows, we also + check no wrapping periord of the variable covers the niter if it has + unsigned type. More complex analysis may be used to prove this. */ cand_type = TREE_TYPE (cand->iv->base); - if (cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type)) + if ((cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type)) + && !nowrap_cand_for_loop_niter_p (data, use, cand, niter)) return false; /* Make sure that the loop iterates till the loop bound is hit, as otherwise @@ -4657,11 +4780,54 @@ static bool else return false; } + else if (TREE_CODE (mbz) == EQ_EXPR) + { + /* Check whether may_be_zero is in below two forms: + 1) b' == TYPE_MAX_VALUE (TREE_TYPE (b')) + where b' is of type nit_type. The expression could be + folded from "b' + 1 < 1". In this case, A is "0" and B + is "b' + 1". + 2) b' == -1 + where b' is of signed type which has nit_type as its + unsigned counterpart. The expression could be folded + from "(nit_type)b' + 1 < 1". In this case, A is "0" + and "(nit_type)b' + 1". */ + tree type; + tree op0 = TREE_OPERAND (mbz, 0); + tree op1 = TREE_OPERAND (mbz, 1); + + if (TREE_CODE (op1) != INTEGER_CST) + { + tree tmp = op0; + op0 = op1; + op1 = tmp; + } + type = TREE_TYPE (op0); + if (TREE_CODE (op1) != INTEGER_CST) + return false; + + if (!TYPE_UNSIGNED (type) + && types_compatible_p (unsigned_type_for (type), nit_type)) + { + type = nit_type; + op0 = fold_convert_loc (UNKNOWN_LOCATION, nit_type, op0); + op1 = fold_convert_loc (UNKNOWN_LOCATION, nit_type, op1); + } + + if (!TYPE_UNSIGNED (type) + || !TYPE_MAX_VALUE (type) + || TREE_CODE (op1) != INTEGER_CST + || !operand_equal_p (op1, TYPE_MAX_VALUE (type), 0)) + return false; + + a = integer_zero_node; + b = fold_build2 (PLUS_EXPR, type, op0, integer_one_node); + } else return false; /* Expected number of iterations is B - A - 1. Check that it matches - the actual number, i.e., that B - A - NITER = 1. */ + the actual number, i.e., that B - A = NITER + 1. */ tree_to_aff_combination (niter->niter, nit_type, &nit); tree_to_aff_combination (fold_convert (nit_type, a), nit_type, &tmpa); tree_to_aff_combination (fold_convert (nit_type, b), nit_type, &tmpb); @@ -4673,12 +4839,15 @@ static bool return false; /* Finally, check that CAND->IV->BASE - CAND->IV->STEP * A does not - overflow. */ - offset = fold_build2 (MULT_EXPR, TREE_TYPE (cand->iv->step), - cand->iv->step, - fold_convert (TREE_TYPE (cand->iv->step), a)); - if (!difference_cannot_overflow_p (cand->iv->base, offset)) - return false; + overflow. The check is uncessary when A equals to ZERO. */ + if (a != integer_zero_node) + { + offset = fold_build2 (MULT_EXPR, TREE_TYPE (cand->iv->step), + cand->iv->step, + fold_convert (TREE_TYPE (cand->iv->step), a)); + if (!difference_cannot_overflow_p (cand->iv->base, offset)) + return false; + } /* Determine the new comparison operator. */ comp = step < 0 ? GT_EXPR : LT_EXPR; @@ -4733,50 +4902,9 @@ may_eliminate_iv (struct ivopts_data *data, This is the case iff the period of the induction variable is greater than the number of iterations for which the exit condition is true. */ period = iv_period (cand->iv); + if (!period_greater_niter_exit (data, use, cand, period, desc)) + return false; - /* If the number of iterations is constant, compare against it directly. */ - if (TREE_CODE (desc->niter) == INTEGER_CST) - { - /* See cand_value_at. */ - if (stmt_after_increment (loop, cand, use->stmt)) - { - if (!tree_int_cst_lt (desc->niter, period)) - return false; - } - else - { - if (tree_int_cst_lt (period, desc->niter)) - return false; - } - } - - /* If not, and if this is the only possible exit of the loop, see whether - we can get a conservative estimate on the number of iterations of the - entire loop and compare against that instead. */ - else - { - widest_int period_value, max_niter; - - max_niter = desc->max; - if (stmt_after_increment (loop, cand, use->stmt)) - max_niter += 1; - period_value = wi::to_widest (period); - if (wi::gtu_p (max_niter, period_value)) - { - /* See if we can take advantage of inferred loop bound information. */ - if (data->loop_single_exit_p) - { - if (!max_loop_iterations (loop, &max_niter)) - return false; - /* The loop bound is already adjusted by adding 1. */ - if (wi::gtu_p (max_niter, period_value)) - return false; - } - else - return false; - } - } - cand_value_at (loop, cand, use->stmt, desc->niter, &bnd); *bound = fold_convert (TREE_TYPE (cand->iv->base), @@ -4796,7 +4924,7 @@ may_eliminate_iv (struct ivopts_data *data, base the exit condition on it. However, that is often too expensive. */ if (!integer_zerop (desc->may_be_zero)) - return iv_elimination_compare_lt (data, cand, comp, desc); + return iv_elimination_compare_lt (data, use, cand, comp, desc); return true; }