https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87288
--- Comment #6 from bin cheng <amker at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> it's set here:
>
>   if (!is_gimple_val (niters_vector))
>     {
>       var = create_tmp_var (type, "bnd");
>       gimple_seq stmts = NULL;
>       niters_vector = force_gimple_operand (niters_vector, &stmts, true,
>                                             var);
>       gsi_insert_seq_on_edge_immediate (pe, stmts);
>       /* Peeling algorithm guarantees that vector loop bound is at least ONE,
>          we set range information to make niters analyzer's life easier.  */
>       if (stmts != NULL && log_vf)
>         set_range_info (niters_vector, VR_RANGE,
>                         wi::to_wide (build_int_cst (type, 1)),
>                         wi::to_wide (fold_build2 (RSHIFT_EXPR, type,
>                                                   TYPE_MAX_VALUE (type),
>                                                   log_vf)));
>
> and the loop is
>
>   <bb 5> [local count: 105119325]:
>   niters.0_25 = (unsigned int) n_15;
>   ni_gap.1_36 = niters.0_25 + 4294967295;
>   # RANGE [1, 2147483647] NONZERO 2147483647
>   bnd.2_37 = ni_gap.1_36 >> 1;
>
>   <bb 4> [local count: 567644349]:
>   # ivtmp_50 = PHI <ivtmp_51(6), 0(5)>
>   ivtmp_51 = ivtmp_50 + 1;
>   if (ivtmp_51 >= bnd.2_37)
>     goto <bb 12>; [16.67%]
>   else
>     goto <bb 6>; [83.33%]
>
>   <bb 6> [local count: 473036958]:
>   goto <bb 4>; [100.00%]
>
> which looks good according to the comment.  So the number of iterations
> _is_ bnd.2_37 - 1 (that number may be zero).

Not really.  The code was added under the assumption that the vector code is
guarded by a condition generated in peeling.
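For reference, the range recorded by the quoted set_range_info call can be
reproduced with a small standalone sketch.  This is only a model of the
arithmetic, not GCC code: bnd_range_for and bnd_in_range are hypothetical
names, and a 32-bit unsigned type is assumed to match the dump.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the range attached to the vector loop bound:
   [1, TYPE_MAX_VALUE (type) >> log_vf], assuming a 32-bit unsigned type.  */
struct bnd_range
{
  uint32_t min;
  uint32_t max;
};

static struct bnd_range
bnd_range_for (unsigned int log_vf)
{
  struct bnd_range r;
  r.min = 1;                      /* build_int_cst (type, 1) */
  r.max = UINT32_MAX >> log_vf;   /* TYPE_MAX_VALUE (type) >> log_vf */
  return r;
}

/* Return true if VALUE is consistent with the recorded range.  */
static bool
bnd_in_range (struct bnd_range r, uint32_t value)
{
  return value >= r.min && value <= r.max;
}
```

With log_vf == 1 this gives [1, 2147483647], the same RANGE annotation shown
on bnd.2_37 above; a bound of 0 falls outside it.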
The dump for vect looks like:

  <bb 2> [local count: 161061274]:
  # PT = { D.2425 } (escaped, escaped heap)
  # USE = nonlocal null { D.2425 } (escaped, escaped heap)
  # CLB = nonlocal null { D.2425 } (escaped, escaped heap)
  _10 = operator new [] (32);
  _10->_M_elems0 = 0.0;
  MEM[(struct array2 *)_10 + 16B]._M_elems0 = 0.0;
  # RANGE [-2147483648, 2147483647] NONZERO 4294967294
  n_14 = argc_13(D) * 2;
  if (n_14 <= 0)
    goto <bb 3>; [15.00%]
  else
    goto <bb 5>; [85.00%]

  <bb 3> [local count: 161061274]:
  # USE = nonlocal null { D.2425 } (escaped, escaped heap)
  # CLB = nonlocal null { D.2425 } (escaped, escaped heap)
  operator delete [] (_10);
  jacobianTransposeds ={v} {CLOBBER};
  return 0;

  <bb 5> [local count: 136902083]:
  niters.0_25 = (unsigned int) n_14;
  ni_gap.1_36 = niters.0_25 + 4294967295;
  # RANGE [1, 2147483647] NONZERO 2147483647
  bnd.2_37 = ni_gap.1_36 >> 1;

  <bb 4> [local count: 739271244]:
  # RANGE [0, 2147483647] NONZERO 2147483647
  # i_21 = PHI <i_16(6), 0(5)>
  # PT = null { D.2425 } (escaped, escaped heap)
  # ALIGN = 8, MISALIGN = 0
  # vectp.5_40 = PHI <vectp.5_41(6), _10(5)>
  # PT = { D.2388 }
  # ALIGN = 16, MISALIGN = 0
  # vectp_jacobianTransposeds.9_47 = PHI <vectp_jacobianTransposeds.9_48(6), &jacobianTransposeds(5)>
  # ivtmp_50 = PHI <ivtmp_51(6), 0(5)>
  # RANGE [0, 2147483646] NONZERO 2147483647
  _1 = (long unsigned int) i_21;
  # RANGE [0, 34359738336] NONZERO 34359738352
  _2 = _1 * 16;
  # PT = null { D.2425 } (escaped, escaped heap)
  _3 = _10 + _2;
  _6 = _1 * 8;
  # PT = { D.2388 }
  # ALIGN = 8, MISALIGN = 0
  _4 = &jacobianTransposeds + _6;
  vect__5.7_42 = MEM[(double *)vectp.5_40];
  # PT = null { D.2425 } (escaped, escaped heap)
  # ALIGN = 8, MISALIGN = 0
  vectp.5_43 = vectp.5_40 + 16;
  vect__5.8_44 = MEM[(double *)vectp.5_43];
  vect_perm_even_45 = VEC_PERM_EXPR <vect__5.7_42, vect__5.8_44, { 0, 2 }>;
  vect_perm_odd_46 = VEC_PERM_EXPR <vect__5.7_42, vect__5.8_44, { 1, 3 }>;
  _5 = _3->_M_elems0;
  MEM[(double &)vectp_jacobianTransposeds.9_47] = vect_perm_even_45;
  # RANGE [1, 2147483647] NONZERO 2147483647
  i_16 = i_21 + 1;
  # PT = null { D.2425 } (escaped, escaped heap)
  vectp.5_41 = vectp.5_43 + 16;
  # PT = { D.2388 }
  # ALIGN = 16, MISALIGN = 0
  vectp_jacobianTransposeds.9_48 = vectp_jacobianTransposeds.9_47 + 16;
  ivtmp_51 = ivtmp_50 + 1;
  if (ivtmp_51 >= bnd.2_37)
    goto <bb 12>; [16.67%]
  else
    goto <bb 6>; [83.33%]

  <bb 9> [local count: 136902081]:
  goto <bb 3>; [100.00%]

  <bb 6> [local count: 616059372]:
  goto <bb 4>; [100.00%]

  <bb 12> [local count: 136902083]:
  niters_vector_mult_vf.3_38 = bnd.2_37 << 1;
  tmp.4_39 = (int) niters_vector_mult_vf.3_38;

  <bb 13> [local count: 912680552]:
  # RANGE [0, 2147483647] NONZERO 2147483647
  # i_24 = PHI <i_29(14), tmp.4_39(12)>
  # RANGE [0, 2147483646] NONZERO 2147483647
  _19 = (long unsigned int) i_24;
  # RANGE [0, 34359738336] NONZERO 34359738352
  _33 = _19 * 16;
  # PT = null { D.2425 } (escaped, escaped heap)
  _26 = _10 + _33;
  _27 = _19 * 8;
  # PT = { D.2388 }
  # ALIGN = 8, MISALIGN = 0
  _28 = &jacobianTransposeds + _27;
  _31 = _26->_M_elems0;
  MEM[(double &)_28] = _31;
  # RANGE [1, 2147483647] NONZERO 2147483647
  i_29 = i_24 + 1;
  if (n_14 <= i_29)
    goto <bb 9>; [15.00%]
  else
    goto <bb 14>; [85.00%]

  <bb 14> [local count: 775778470]:
  goto <bb 13>; [100.00%]

There is no peeling guard condition skipping the vector loop anymore.  In the
case of "argc_13(D) == 1", bnd.2_37 evaluates to 0 (n_14 == 2, ni_gap.1_36 == 1),
yet the vector loop body is executed exactly once (corresponding to 2
iterations before vectorization); after the vector loop, the epilog loop body
is executed 2 times again (for the same iterations already done in the vector
loop).  There are two problems here:
A) It's at least inefficient when ("argc_13(D) == 1" && !SVE).
B) Given that the vector loop is not guarded by the peeling condition anymore,
   range info, as you noted, should not be set for bnd.2_37, because it could
   take the value ZERO.
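The control flow in the dump can be traced by hand, or with a small scalar
model like the sketch below.  It only mirrors the trip-count arithmetic of
bb 2, bb 5, bb 4, bb 12 and bb 13 (no loads or stores); struct trace and
simulate are hypothetical names, and argc is assumed small enough that
argc * 2 does not overflow.

```c
#include <stdint.h>

/* Hypothetical record of what the loops in the dump would do.  */
struct trace
{
  uint32_t bnd;               /* bnd.2_37 */
  unsigned int vector_iters;  /* executions of the vector loop body (bb 4) */
  unsigned int epilog_iters;  /* executions of the epilog loop body (bb 13) */
  int epilog_start;           /* tmp.4_39, first i handled by the epilog */
};

static struct trace
simulate (int argc)
{
  struct trace t = { 0, 0, 0, 0 };
  int n = argc * 2;                         /* n_14 */
  if (n <= 0)                               /* bb 2 -> bb 3 */
    return t;

  uint32_t niters = (uint32_t) n;           /* niters.0_25 */
  uint32_t ni_gap = niters + 4294967295u;   /* ni_gap.1_36, i.e. niters - 1 */
  t.bnd = ni_gap >> 1;                      /* bnd.2_37 */

  /* bb 4: the body runs before the exit test, so it executes at least
     once even when bnd is 0.  */
  uint32_t ivtmp = 0;
  do
    {
      t.vector_iters++;
      ivtmp++;
    }
  while (ivtmp < t.bnd);

  /* bb 12: the epilog starts where the vector loop is assumed to stop.  */
  int i = (int) (t.bnd << 1);               /* tmp.4_39 */
  t.epilog_start = i;

  /* bb 13: scalar epilog loop.  */
  do
    {
      t.epilog_iters++;
      i++;
    }
  while (n > i);

  return t;
}
```

Under these assumptions, simulate (1) yields bnd == 0 with one vector
iteration and two epilog iterations starting again at i == 0, while
simulate (2) yields bnd == 1 with the epilog starting at i == 2.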
The change in peeling was made by revision 256635, specifically by the code
changes below:

+  poly_uint64 bound_epilog = 0;
+  if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+      && LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+    bound_epilog += vf - 1;
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+    bound_epilog += 1;
//......
@@ -2577,10 +2593,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       if (skip_vector)
         {
           /* Additional epilogue iteration is peeled if gap exists.  */
-          bool peel_for_gaps = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
           tree t = vect_gen_scalar_loop_niters (niters_prolog, prolog_peeling,
-                                                bound_prolog,
-                                                peel_for_gaps ? vf : vf - 1,
+                                                bound_prolog, bound_epilog,

Now bound_epilog == 1 is passed into vect_gen_scalar_loop_niters rather than
vf (== 2), so no peeling condition is generated.  Either the condition below
is too strict here, or we need to identify this case and skip setting range
info:

+  if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+      && LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+    bound_epilog += vf - 1;

Thanks