On Tue, 12 Oct 2021, Andre Vieira (lists) wrote: > Hi Richi, > > I think this is what you meant, I now hide all the unrolling cost calculations > in the existing target hooks for costs. I did need to adjust 'finish_cost' to > take the loop_vinfo so the target's implementations are able to set the newly > renamed 'suggested_unroll_factor'. > > Also added the checks for the epilogue's VF. > > Is this more like what you had in mind?
Not exactly (sorry..). For the target hook I think we don't want to pass vec_info but instead another output parameter like the existing ones. vect_estimate_min_profitable_iters should then via vect_analyze_loop_costing and vect_analyze_loop_2 report the unroll suggestion to vect_analyze_loop which should then, if the suggestion was > 1, instead of iterating to the next vector mode run again with a fixed VF (old VF times suggested unroll factor - there's min_vf in vect_analyze_loop_2 which we should adjust to the old VF times two for example and maybe store the suggested factor as hint) - if it succeeds the result will end up in the list of considered modes (where we now may have more than one entry for the same mode but a different VF), we probably want to only consider more unrolling once. For simplicity I'd probably set min_vf = max_vf = old VF * suggested factor, thus take the targets request literally. Richard. > > gcc/ChangeLog: > > * config/aarch64/aarch64.c (aarch64_finish_cost): Add class vec_info > parameter. > * config/i386/i386.c (ix86_finish_cost): Likewise. > * config/rs6000/rs6000.c (rs6000_finish_cost): Likewise. > * doc/tm.texi: Document changes to TARGET_VECTORIZE_FINISH_COST. > * target.def: Add class vec_info parameter to finish_cost. > * targhooks.c (default_finish_cost): Likewise. > * targhooks.h (default_finish_cost): Likewise. > * tree-vect-loop.c (vect_determine_vectorization_factor): Use > suggested_unroll_factor > to increase vectorization_factor if possible. > (_loop_vec_info::_loop_vec_info): Add suggested_unroll_factor > member. > (vect_compute_single_scalar_iteration_cost): Adjust call to > finish_cost. > (vect_determine_partial_vectors_and_peeling): Ensure unrolled loop is > not predicated. > (vect_determine_unroll_factor): New. > (vect_try_unrolling): New. > (vect_reanalyze_as_main_loop): Also try to unroll when > reanalyzing as main loop. > (vect_analyze_loop): Add call to vect_try_unrolling and check to > ensure epilogue > is either a smaller VF than main loop or uses partial vectors and > might be of equal > VF. > (vect_estimate_min_profitable_iters): Adjust call to finish_cost. > (vectorizable_reduction): Make sure to not use > single_defuse_cyle when unrolling. > * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Adjust call to > finish_cost. > * tree-vectorizer.h (finish_cost): Change to pass new class vec_info > parameter. > > On 01/10/2021 09:19, Richard Biener wrote: > > On Thu, 30 Sep 2021, Andre Vieira (lists) wrote: > > > >> Hi, > >> > >> > >>>> That just forces trying the vector modes we've tried before. Though I > >>>> might > >>>> need to revisit this now I think about it. I'm afraid it might be > >>>> possible > >>>> for > >>>> this to generate an epilogue with a vf that is not lower than that of the > >>>> main > >>>> loop, but I'd need to think about this again. > >>>> > >>>> Either way I don't think this changes the vector modes used for the > >>>> epilogue. > >>>> But maybe I'm just missing your point here. > >>> Yes, I was refering to the above which suggests that when we vectorize > >>> the main loop with V4SF but unroll then we try vectorizing the > >>> epilogue with V4SF as well (but not unrolled). I think that's > >>> premature (not sure if you try V8SF if the main loop was V4SF but > >>> unrolled 4 times). > >> My main motivation for this was because I had a SVE loop that vectorized > >> with > >> both VNx8HI, then V8HI which beat VNx8HI on cost, then it decided to unroll > >> V8HI by two and skipped using VNx8HI as a predicated epilogue which > >> would've > >> been the best choice. > > I see, yes - for fully predicated epilogues it makes sense to consider > > the same vector mode as for the main loop anyways (independent on > > whether we're unrolling or not). One could argue that with an > > unrolled V4SImode main loop a predicated V8SImode epilogue would also > > be a good match (but then somehow costing favored the unrolled V4SI > > over the V8SI for the main loop...). > > > >> So that is why I decided to just 'reset' the vector_mode selection. In a > >> scenario where you only have the traditional vector modes it might make > >> less > >> sense. > >> > >> Just realized I still didn't add any check to make sure the epilogue has a > >> lower VF than the previous loop, though I'm still not sure that could > >> happen. > >> I'll go look at where to add that if you agree with this. > > As said above, it only needs a lower VF in case the epilogue is not > > fully masked - otherwise the same VF would be OK. > > > >>>> I can move it there, it would indeed remove the need for the change to > >>>> vect_update_vf_for_slp, the change to > >>>> vect_determine_partial_vectors_and_peeling would still be required I > >>>> think. > >>>> It > >>>> is meant to disable using partial vectors in an unrolled loop. > >>> Why would we disable the use of partial vectors in an unrolled loop? > >> The motivation behind that is that the overhead caused by generating > >> predicates for each iteration will likely be too much for it to be > >> profitable > >> to unroll. On top of that, when dealing with low iteration count loops, if > >> executing one predicated iteration would be enough we now still need to > >> execute all other unrolled predicated iterations, whereas if we keep them > >> unrolled we skip the unrolled loops. > > OK, I guess we're not factoring in costs when deciding on predication > > but go for it if it's gernally enabled and possible. > > > > With the proposed scheme we'd then cost the predicated not unrolled > > loop against a not predicated unrolled loop which might be a bit > > apples vs. oranges also because the target made the unroll decision > > based on the data it collected for the predicated loop. > > > >>> Sure but I'm suggesting you keep the not unrolled body as one way of > >>> costed vectorization but then if the target says "try unrolling" > >>> re-do the analysis with the same mode but a larger VF. Just like > >>> we iterate over vector modes you'll now iterate over pairs of > >>> vector mode + VF (unroll factor). It's not about re-using the costing > >>> it's about using costing that is actually relevant and also to avoid > >>> targets inventing two distinct separate costings - a target (powerpc) > >>> might already compute load/store density and other stuff for the main > >>> costing so it should have an idea whether doubling or triplicating is OK. > >>> > >>> Richard. > >> Sounds good! I changed the patch to determine the unrolling factor later, > >> after all analysis has been done and retry analysis if an unrolling factor > >> larger than 1 has been chosen for this loop and vector_mode. > >> > >> gcc/ChangeLog: > >> > >> * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR. > >> * doc/tm.texi.in: Add entries for TARGET_VECTORIZE_UNROLL_FACTOR. > >> * params.opt: Add vect-unroll and vect-unroll-reductions > >> parameters. > > What's the reason to add the --params? It looks like this makes > > us unroll with a static number short-cutting the target. > > > > IMHO that's never going to be a great thing - but what we could do > > is look at loop->unroll and try to honor that (factoring in that > > the vectorization factor is already the times we unroll). > > > > So I'd leave those params out for now, the user would have a much > > more fine-grained way to control this with the unroll pragma. > > > > Adding a max-vect-unroll parameter would be another thing but that > > would apply after the targets or pragma decision. > > > >> * target.def: Define hook TARGET_VECTORIZE_UNROLL_FACTOR. > > I still do not like the new target hook - as said I'd like to > > make you have the finis_cost hook allow the target to specify > > a suggested unroll factor instead because that's the point where > > it has all the info. > > > > Thanks, > > Richard. > > > >> * targhooks.c (default_unroll_factor): New. > >> * targhooks.h (default_unroll_factor): Likewise. > >> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize > >> par_unrolling_factor. > >> (vect_determine_partial_vectors_and_peeling): Account for > >> unrolling. > >> (vect_determine_unroll_factor): New. > >> (vect_try_unrolling): New. > >> (vect_reanalyze_as_main_loop): Call vect_try_unrolling when > >> retrying a loop_vinfo as a main loop. > >> (vect_analyze_loop): Call vect_try_unrolling when vectorizing > >> main loops. > >> (vect_analyze_loop): Allow for epilogue vectorization when > >> unrolling > >> and rewalk vector_mode warray for the epilogues. > >> (vectorizable_reduction): Disable single_defuse_cycle when > >> unrolling. > >> * tree-vectorizer.h (vect_unroll_value): Declare > >> par_unrolling_factor > >> as a member of loop_vec_info. > >> > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)