On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Hi All, > > I re-send all patches sent by Ilya earlier for review which support > vectorization of loop epilogues and loops with low trip count. We > assume that the only patch - vec-tails-07-combine-tail.patch - was not > approved by Jeff. > > I did re-base of all patches and performed bootstrapping and > regression testing that did not show any new failures. Also all > changes related to new vect_do_peeling algorithm have been changed > accordingly. > > Is it OK for trunk?
Hi, I can't approve patches, but had some comments after going through the implementation. One confusing part is cost model change, as well as the way it's used to decide how epilogue loop should be vectorized. Given vect-tail is disabled at the moment and the cost change needs further tuning, is it reasonable to split this part out and get vectorization part reviewed/submitted independently? For example, let user specified parameters make the decision for now. Cost and target dependent changes should go in at last, this could make the patch easier to read. The implementation computes/shares quite amount information from main loop to epilogue loop vectorization. Furthermore, variables/fields for such information are somehow named in a misleading way. For example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the flag controlling whether epilogue loop should be vectorized with masking. However, it's actually controlled by exactly the same flag as whether epilogue loop should be combined into the main loop with masking: @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) slpeel_make_loop_iterate_ntimes (loop, niters_vector); + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) + vect_combine_loop_epilogue (loop_vinfo); + /* Reduce loop iterations by the vectorization factor. */ scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), expected_iterations / vf); IMHO, we should decouple main loop vectorization and epilogue vectorization as much as possible by sharing as few information as we can. The general idea is to handle epilogue loop just like normal short-trip loop. For example, we can rename LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something else), and we don't differentiate its meaning between main and epilogue(short-trip) loop. It only indicates the current loop should be vectorized with masking no matter it's a main loop or epilogue loop, and it works just like the current implementation. After this change, we can refine vectorization and make it more general for normal loop and epilogue(short trip) loop. For example, this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue loop and use it to control how it should be vectorized: + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) + { + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; + } + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) + && min_profitable_combine_iters >= 0) + { This works, but not that good for understanding or maintaining. Thanks, bin