Hi Richard, on 2020/7/7 下午6:44, Richard Sandiford wrote: > "Kewen.Lin" <li...@linux.ibm.com> writes: >> on 2020/7/2 下午1:20, Kewen.Lin via Gcc-patches wrote: >>> on 2020/7/1 下午11:17, Richard Sandiford wrote: >>>> "Kewen.Lin" <li...@linux.ibm.com> writes: >>>>> on 2020/7/1 上午3:53, Richard Sandiford wrote: >>>>>> "Kewen.Lin" <li...@linux.ibm.com> writes: >>>>>>> + /* Decide whether to use fully-masked approach. */ >>>>>>> + if (vect_verify_full_masking (loop_vinfo)) >>>>>>> + LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true; >>>>>>> + /* Decide whether to use length-based approach. */ >>>>>>> + else if (vect_verify_loop_lens (loop_vinfo)) >>>>>>> + { >>>>>>> + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) >>>>>>> + || LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)) >>>>>>> + { >>>>>>> + if (dump_enabled_p ()) >>>>>>> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >>>>>>> + "can't vectorize this loop with >>>>>>> length-based" >>>>>>> + " partial vectors approach becuase >>>>>>> peeling" >>>>>>> + " for alignment or gaps is >>>>>>> required.\n"); >>>>>>> + LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false; >>>>>>> + } >>>>>> >>>>>> Why are these peeling cases necessary? Peeling for gaps should >>>>>> just mean subtracting one scalar iteration from the iteration count >>>>>> and shouldn't otherwise affect the main loop. Similarly, peeling for >>>>>> alignment can be handled in the normal way, with a scalar prologue loop. >>>>>> >>>>> >>>>> I was thinking to relax this later and to avoid to handle too many cases >>>>> in the first enablement patch. Since Power hw whose level is able to >>>>> support >>>>> vector with length, it supports unaligned load/store, need to construct >>>>> some cases for them. May I postpone it a bit? Or you prefer me to >>>>> support >>>>> it here? >>>> >>>> I've no objection to postponing it if there are specific known >>>> problems that make it difficult, but I think we should at least >>>> say what they are. On the face of it, I'm not sure why it doesn't >>>> Just Work, since the way that we control the main loop should be >>>> mostly orthogonal to how we handle peeled prologue iterations >>>> and how we handle a single peeled epilogue iteration. >>>> >>> >>> OK, I will remove it to see the impact. By the way, do you think to >>> use partial vectors for prologue is something worth to trying in future? >>> >> >> I tested the updated patch with this releasing, LOOP_VINFO_PEELING_FOR_GAPS >> part looks fine, but LOOP_VINFO_PEELING_FOR_ALIGNMENT caused one case to >> fail at execution during vect-partial-vector-usage=2. So far the patch >> doesn't handle any niters_skip cases. I think if we want to support it, >> we have to add some handlings in/like what we have for masking, such as: >> mask_skip_niters, vect_prepare_for_masked_peels etc. >> >> Do you prefer me to extend the support in this patch series? > > It's not so much whether it has to be supported now, but more why > it doesn't work now. What was the reason for the failure? > > The peeling-with-masking thing is just an optimisation, so that we > can vectorise the peeled iterations rather than falling back to > scalar code for them. It shouldn't be needed for correctness. >
Whoops, thanks for the clarification! Nice, I just realized it's a way to adopt partial vectors for prologue. The fail case is gcc.dg/vect/vect-ifcvt-11.c. There the first iteration is optimized out due to the known AND result of IV 0, then it tries to peel 3 iterations, the number of remaining iterations for vectorization body is expected to be 12. But it still uses 15 and causes out-of-bound access. The below fix can fix the failure. The justification is that we need to use the fixed up niters after peeling prolog for the vectorization body for partial vectors. I'm not sure why the other cases not using partial vectors don't need the fixed up niters, to avoid troubles I guarded it with LOOP_VINFO_USING_PARTIAL_VECTORS_P explicitly. Does it make sense? --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -8888,6 +8896,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) LOOP_VINFO_INT_NITERS (loop_vinfo) / lowest_vf); step_vector = build_one_cst (TREE_TYPE (niters)); } + else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) + && !vect_use_loop_mask_for_alignment_p (loop_vinfo)) + vect_gen_vector_loop_niters (loop_vinfo, LOOP_VINFO_NITERS (loop_vinfo), + &niters_vector, &step_vector, + niters_no_overflow); else vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector, &step_vector, niters_no_overflow); BR, Kewen