https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87288
--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> --- On Wed, 19 Sep 2018, amker at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87288 > > --- Comment #6 from bin cheng <amker at gcc dot gnu.org> --- > (In reply to Richard Biener from comment #5) > > it's set here: > > > > if (!is_gimple_val (niters_vector)) > > { > > var = create_tmp_var (type, "bnd"); > > gimple_seq stmts = NULL; > > niters_vector = force_gimple_operand (niters_vector, &stmts, true, > > var); > > gsi_insert_seq_on_edge_immediate (pe, stmts); > > /* Peeling algorithm guarantees that vector loop bound is at least > > ONE, > > we set range information to make niters analyzer's life easier. */ > > if (stmts != NULL && log_vf) > > set_range_info (niters_vector, VR_RANGE, > > wi::to_wide (build_int_cst (type, 1)), > > wi::to_wide (fold_build2 (RSHIFT_EXPR, type, > > TYPE_MAX_VALUE (type), > > log_vf))); > > > > and the loop is > > > > <bb 5> [local count: 105119325]: > > niters.0_25 = (unsigned int) n_15; > > ni_gap.1_36 = niters.0_25 + 4294967295; > > # RANGE [1, 2147483647] NONZERO 2147483647 > > bnd.2_37 = ni_gap.1_36 >> 1; > > > > <bb 4> [local count: 567644349]: > > # ivtmp_50 = PHI <ivtmp_51(6), 0(5)> > > ivtmp_51 = ivtmp_50 + 1; > > if (ivtmp_51 >= bnd.2_37) > > goto <bb 12>; [16.67%] > > else > > goto <bb 6>; [83.33%] > > > > <bb 6> [local count: 473036958]: > > goto <bb 4>; [100.00%] > > > > which looks good according to the comment. So the number of iterations > > _is_ bnd.2_37 - 1 (that number may be zero). > > Not really. The code was added under assumption that vector code is guarded > by > condition in peeling. > > Dump for vect is like: > <bb 2> [local count: 161061274]: > # PT = { D.2425 } (escaped, escaped heap) > # USE = nonlocal null { D.2425 } (escaped, escaped heap) > # CLB = nonlocal null { D.2425 } (escaped, escaped heap) > _10 = operator new [] (32); > _10->_M_elems0 = 0.0; > MEM[(struct array2 *)_10 + 16B]._M_elems0 = 0.0; > # RANGE [-2147483648, 2147483647] NONZERO 4294967294 > n_14 = argc_13(D) * 2; > if (n_14 <= 0) > goto <bb 3>; [15.00%] > else > goto <bb 5>; [85.00%] > > <bb 3> [local count: 161061274]: > # USE = nonlocal null { D.2425 } (escaped, escaped heap) > # CLB = nonlocal null { D.2425 } (escaped, escaped heap) > operator delete [] (_10); > jacobianTransposeds ={v} {CLOBBER}; > return 0; > > <bb 5> [local count: 136902083]: > niters.0_25 = (unsigned int) n_14; > ni_gap.1_36 = niters.0_25 + 4294967295; > # RANGE [1, 2147483647] NONZERO 2147483647 > bnd.2_37 = ni_gap.1_36 >> 1; > > <bb 4> [local count: 739271244]: > # RANGE [0, 2147483647] NONZERO 2147483647 > # i_21 = PHI <i_16(6), 0(5)> > # PT = null { D.2425 } (escaped, escaped heap) > # ALIGN = 8, MISALIGN = 0 > # vectp.5_40 = PHI <vectp.5_41(6), _10(5)> > # PT = { D.2388 } > # ALIGN = 16, MISALIGN = 0 > # vectp_jacobianTransposeds.9_47 = PHI <vectp_jacobianTransposeds.9_48(6), > &jacobianTransposeds(5)> > # ivtmp_50 = PHI <ivtmp_51(6), 0(5)> > # RANGE [0, 2147483646] NONZERO 2147483647 > _1 = (long unsigned int) i_21; > # RANGE [0, 34359738336] NONZERO 34359738352 > _2 = _1 * 16; > # PT = null { D.2425 } (escaped, escaped heap) > _3 = _10 + _2; > _6 = _1 * 8; > # PT = { D.2388 } > # ALIGN = 8, MISALIGN = 0 > _4 = &jacobianTransposeds + _6; > vect__5.7_42 = MEM[(double *)vectp.5_40]; > # PT = null { D.2425 } (escaped, escaped heap) > # ALIGN = 8, MISALIGN = 0 > vectp.5_43 = vectp.5_40 + 16; > vect__5.8_44 = MEM[(double *)vectp.5_43]; > vect_perm_even_45 = VEC_PERM_EXPR <vect__5.7_42, vect__5.8_44, { 0, 2 }>; > vect_perm_odd_46 = VEC_PERM_EXPR <vect__5.7_42, vect__5.8_44, { 1, 3 }>; > _5 = _3->_M_elems0; > MEM[(double &)vectp_jacobianTransposeds.9_47] = vect_perm_even_45; > # RANGE [1, 2147483647] NONZERO 2147483647 > i_16 = i_21 + 1; > # PT = null { D.2425 } (escaped, escaped heap) > vectp.5_41 = vectp.5_43 + 16; > # PT = { D.2388 } > # ALIGN = 16, MISALIGN = 0 > vectp_jacobianTransposeds.9_48 = vectp_jacobianTransposeds.9_47 + 16; > ivtmp_51 = ivtmp_50 + 1; > if (ivtmp_51 >= bnd.2_37) > goto <bb 12>; [16.67%] > else > goto <bb 6>; [83.33%] > > <bb 9> [local count: 136902081]: > goto <bb 3>; [100.00%] > > <bb 6> [local count: 616059372]: > goto <bb 4>; [100.00%] > > <bb 12> [local count: 136902083]: > niters_vector_mult_vf.3_38 = bnd.2_37 << 1; > tmp.4_39 = (int) niters_vector_mult_vf.3_38; > > <bb 13> [local count: 912680552]: > # RANGE [0, 2147483647] NONZERO 2147483647 > # i_24 = PHI <i_29(14), tmp.4_39(12)> > # RANGE [0, 2147483646] NONZERO 2147483647 > _19 = (long unsigned int) i_24; > # RANGE [0, 34359738336] NONZERO 34359738352 > _33 = _19 * 16; > # PT = null { D.2425 } (escaped, escaped heap) > _26 = _10 + _33; > _27 = _19 * 8; > # PT = { D.2388 } > # ALIGN = 8, MISALIGN = 0 > _28 = &jacobianTransposeds + _27; > _31 = _26->_M_elems0; > MEM[(double &)_28] = _31; > # RANGE [1, 2147483647] NONZERO 2147483647 > i_29 = i_24 + 1; > if (n_14 <= i_29) > goto <bb 9>; [15.00%] > else > goto <bb 14>; [85.00%] > > <bb 14> [local count: 775778470]: > goto <bb 13>; [100.00%] > > This is no peeling guard condition skipping vector loop anymore. In case of > "argc_13(D) == 1", the vector loop body is executed exactly once > (corresponding > 2 times before vectorization); Hmm, I think it looks all a bit bogus - clearly the vector loop should execute once for argc == 1 but we compute bnd.2_37 == 0 (argc*2 - 1 / 2). That doesn't fit with the <= bnd.2_37 use for the exit condition. For argc == 2 we'd compute bnd.2_37 == 1 and execute the body once as well... > after vector loop, the epilog loop body is > executed 2 times again (for the same iteration as done in vector loop). There > is two problems here: > A) it's at least inefficient when ("argc_13(D) == 1" && !SVE). > B) Given the vector loop is not guarded by peeling condition anymore, range > info as you noted should not be set for bnd.2_37, because it could take value > ZERO. But how does the exit condition make sense there?! If we're peeling for gaps then we should not execute the loop. I guess we optimize the condition immediately because of the bogus range info. > Change in peeling is made by revision 256635, specifically, by below code > changes: > > + poly_uint64 bound_epilog = 0; > + if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo) > + && LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) > + bound_epilog += vf - 1; > + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)) > + bound_epilog += 1; > > //...... > > @@ -2577,10 +2593,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, > tree nitersm1, > if (skip_vector) > { > /* Additional epilogue iteration is peeled if gap exists. */ > - bool peel_for_gaps = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo); > tree t = vect_gen_scalar_loop_niters (niters_prolog, prolog_peeling, > - bound_prolog, > - peel_for_gaps ? vf : vf - 1, > + bound_prolog, bound_epilog, > > Now bound_epilog == 1 is passed into vect_gen_scalar_loop_niters, rather than > vf (== 2), this causes no peeling condition is generated. > > Either below condition is too strict here or we need to identify and skip > setting range info in this case: > + if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo) > + && LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) > + bound_epilog += vf - 1; Thanks for the further analysis. The above allows us to blame Richard.