"Kewen.Lin" <li...@linux.ibm.com> writes: >> […] >>> I tested the updated patch with this releasing, LOOP_VINFO_PEELING_FOR_GAPS >>> part looks fine, but LOOP_VINFO_PEELING_FOR_ALIGNMENT caused one case to >>> fail at execution during vect-partial-vector-usage=2. So far the patch >>> doesn't handle any niters_skip cases. I think if we want to support it, >>> we have to add some handlings in/like what we have for masking, such as: >>> mask_skip_niters, vect_prepare_for_masked_peels etc. >>> >>> Do you prefer me to extend the support in this patch series? >> >> It's not so much whether it has to be supported now, but more why >> it doesn't work now. What was the reason for the failure? >> >> The peeling-with-masking thing is just an optimisation, so that we >> can vectorise the peeled iterations rather than falling back to >> scalar code for them. It shouldn't be needed for correctness. >> > > Whoops, thanks for the clarification! Nice, I just realized it's a way to > adopt partial vectors for prologue. The fail case is > gcc.dg/vect/vect-ifcvt-11.c. > There the first iteration is optimized out due to the known AND result of > IV 0, then it tries to peel 3 iterations, the number of remaining iterations > for vectorization body is expected to be 12. But it still uses 15 and causes > out-of-bound access. > > The below fix can fix the failure. The justification is that we need to use > the fixed up niters after peeling prolog for the vectorization body for > partial vectors. I'm not sure why the other cases not using partial vectors > don't need the fixed up niters, to avoid troubles I guarded it with > LOOP_VINFO_USING_PARTIAL_VECTORS_P explicitly.
I think the reason is that if we're peeling prologue iterations and the total number of iterations isn't fixed, full-vector vectorisation will “almost always” need an epilogue loop too, and in that case niters_vector will be nonnull. But that's not guaranteed to be true forever. E.g. if the start pointers have a known misalignment that require peeling a constant number of iterations N, and if we can prove (using enhanced range/ nonzero-bits information) that the way niters is calculated means that niter - N is a multiple of the vector size, we could peel the prologue and not the epilogue. In that case, what your patch does would be correct. So… > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -8888,6 +8896,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple > *loop_vectorized_call) > LOOP_VINFO_INT_NITERS (loop_vinfo) / lowest_vf); > step_vector = build_one_cst (TREE_TYPE (niters)); > } > + else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) > + && !vect_use_loop_mask_for_alignment_p (loop_vinfo)) > + vect_gen_vector_loop_niters (loop_vinfo, LOOP_VINFO_NITERS > (loop_vinfo), > + &niters_vector, &step_vector, > + niters_no_overflow); > else > vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector, > &step_vector, niters_no_overflow); …I think we should drop the LOOP_VINFO_USING_PARTIAL_VECTORS_P condition. Could you also add a comment above the new call saying: /* vect_do_peeling subtracted the number of peeled prologue iterations from LOOP_VINFO_NITERS. */ It wasn't obvious to me where the update was happening when I first looked at the code. Very minor, but maybe also switch the last two cases round so that “else” is the default behaviour and the “if”s are the exceptions. OK with those changes, thanks. Richard