https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79159
--- Comment #4 from amker at gcc dot gnu.org ---
Discussed with richi; the conclusion is that the VRP issue is hard to fix at the moment. The easy way out is to investigate why cunrolli peels one more iteration than necessary. Note that cunrolli computes the unrolling count from niter information, which in this case is inferred from the bound of the local array "tmpCorr[9][9]". The problem is in function record_nonwrapping_iv, in the code below:

  wide_int min, max;
  extreme = fold_convert (unsigned_type, high);
  if (TREE_CODE (orig_base) == SSA_NAME
      && TREE_CODE (low) == INTEGER_CST
      && INTEGRAL_TYPE_P (TREE_TYPE (orig_base))
      && get_range_info (orig_base, &min, &max) == VR_RANGE
      && wi::gts_p (min, low))
    base = wide_int_to_tree (unsigned_type, min);
  else if (TREE_CODE (base) != INTEGER_CST
           && dominated_by_p (CDI_DOMINATORS,
                              loop->latch, gimple_bb (stmt)))
    base = fold_convert (unsigned_type, low);
  delta = fold_build2 (MINUS_EXPR, unsigned_type, extreme, base);

When analyzing "tmpCorr[i_3][j_4]" in the dump below:

  <bb 4> [15.00%]:
  j_11 = i_3 + 1;

  <bb 5> [100.00%]:
  # j_4 = PHI <j_11(4), j_13(6)>
  if (_1 <= j_4)
    goto <bb 8>; [15.00%]
  else
    goto <bb 6>; [85.00%]

  <bb 8> [15.00%]:
  goto <bb 3>; [100.00%]

  <bb 6> [85.00%]:
  _2 = tmpCorr[i_3][j_4];
  bar = _2;
  j_13 = j_4 + 1;
  goto <bb 5>; [100.00%]

the SCEV for j_4 in loop_2 is {j_11, 1}_2. The code above fails to establish that j_11 is larger than 0; instead it uses "low = 0" as the starting value, resulting in one additional peeled iteration.

There are two possible fixes. One is to investigate why evrp doesn't compute a correct range for j_11:

  _1: VARYING
  _3: VARYING
  i_4: [0, +INF]
  j_5: [j_13, +INF]
  n_12(D): ~[0, 0]
  j_13: VARYING   <---- inaccurate
  j_15: [-2147483647, +INF]

The other is to do more SCEV analysis on j_11 in the outer loop. I'll keep looking into which one is better. Thanks.
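A simplified integer model of the base selection in record_nonwrapping_iv may help show where the extra iteration comes from. This is only a sketch, not GCC code: it assumes the index range of tmpCorr's second dimension is [low, high] = [0, 8], a step of 1, and that the access executes at most delta / step + 1 times. It also models the second fix, assuming i_3 has the affine SCEV {0, 1}_1 in the outer loop, so that j_11 = i_3 + 1 becomes {1, 1}_1 with minimum 1. All names below (struct chrec, chrec_plus_const, chrec_min, access_bound) are hypothetical helpers, not GCC APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an affine chrec {base, step}: its value at
   iteration k (k >= 0) is base + k * step.  Not a GCC data structure. */
struct chrec {
  long base;
  long step;
};

/* Adding a constant to an affine chrec shifts only its base:
   {b, s} + c == {b + c, s}.  Models folding j_11 = i_3 + 1. */
static struct chrec chrec_plus_const (struct chrec c, long k)
{
  struct chrec r = { c.base + k, c.step };
  return r;
}

/* Minimum over k >= 0 of an affine chrec with non-negative step is its
   base.  Hypothetical helper for illustration. */
static long chrec_min (struct chrec c)
{
  assert (c.step >= 0);
  return c.base;
}

/* Simplified analogue of the base selection quoted above: use the known
   lower bound MIN of the IV's initial value when it is known and greater
   than LOW, otherwise fall back to LOW.  Returns an upper bound on how
   many times an access with index in [LOW, HIGH] and stride STEP can run,
   assuming it runs at most delta / step + 1 times. */
static long access_bound (long low, long high, long step,
                          bool have_min, long min)
{
  assert (step > 0 && high >= low);
  long base = (have_min && min > low) ? min : low;
  assert (base <= high);
  long delta = high - base;
  return delta / step + 1;
}
```

With no usable range for j_11, access_bound (0, 8, 1, false, 0) falls back to low and yields 9 (j = 0..8). Feeding in chrec_min of chrec_plus_const ({0, 1}, 1), which is 1, yields 8 (j = 1..8), i.e. exactly one fewer iteration. The model ignores the unsigned arithmetic and the dominance check in the real code.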