Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

Richard Biener via Gcc-patches Fri, 15 Oct 2021 01:49:05 -0700

On Tue, 12 Oct 2021, Andre Vieira (lists) wrote:

> Hi Richi,
> 
> I think this is what you meant, I now hide all the unrolling cost calculations
> in the existing target hooks for costs. I did need to adjust 'finish_cost' to
> take the loop_vinfo so the target's implementations are able to set the newly
> renamed 'suggested_unroll_factor'.
> 
> Also added the checks for the epilogue's VF.
> 
> Is this more like what you had in mind?


Not exactly (sorry..).  For the target hook I think we don't want to
pass vec_info but instead another output parameter like the existing
ones.

vect_estimate_min_profitable_iters should then via
vect_analyze_loop_costing and vect_analyze_loop_2 report the unroll
suggestion to vect_analyze_loop which should then, if the suggestion
was > 1, instead of iterating to the next vector mode run again
with a fixed VF (old VF times suggested unroll factor - there's
min_vf in vect_analyze_loop_2 which we should adjust to
the old VF times two for example and maybe store the suggested
factor as hint) - if it succeeds the result will end up in the
list of considered modes (where we now may have more than one
entry for the same mode but a different VF), we probably want to
only consider more unrolling once.

For simplicity I'd probably set min_vf = max_vf = old VF * suggested 
factor, thus take the targets request literally.

Richard.

> 
> gcc/ChangeLog:
> 
>         * config/aarch64/aarch64.c (aarch64_finish_cost): Add class vec_info
> parameter.
>         * config/i386/i386.c (ix86_finish_cost): Likewise.
>         * config/rs6000/rs6000.c (rs6000_finish_cost): Likewise.
>         * doc/tm.texi: Document changes to TARGET_VECTORIZE_FINISH_COST.
>         * target.def: Add class vec_info parameter to finish_cost.
>         * targhooks.c (default_finish_cost): Likewise.
>         * targhooks.h (default_finish_cost): Likewise.
>         * tree-vect-loop.c (vect_determine_vectorization_factor): Use 
> suggested_unroll_factor
>         to increase vectorization_factor if possible.
>         (_loop_vec_info::_loop_vec_info): Add suggested_unroll_factor 
> member.
>         (vect_compute_single_scalar_iteration_cost): Adjust call to
> finish_cost.
>         (vect_determine_partial_vectors_and_peeling): Ensure unrolled loop is
> not predicated.
>         (vect_determine_unroll_factor): New.
>         (vect_try_unrolling): New.
>         (vect_reanalyze_as_main_loop): Also try to unroll when 
> reanalyzing as main loop.
>         (vect_analyze_loop): Add call to vect_try_unrolling and check to
> ensure epilogue
>         is either a smaller VF than main loop or uses partial vectors and
> might be of equal
>         VF.
>         (vect_estimate_min_profitable_iters): Adjust call to finish_cost.
>         (vectorizable_reduction): Make sure to not use 
> single_defuse_cyle when unrolling.
>         * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Adjust call to
> finish_cost.
>         * tree-vectorizer.h (finish_cost): Change to pass new class vec_info
> parameter.
> 
> On 01/10/2021 09:19, Richard Biener wrote:
> > On Thu, 30 Sep 2021, Andre Vieira (lists) wrote:
> >
> >> Hi,
> >>
> >>
> >>>> That just forces trying the vector modes we've tried before. Though I
> >>>> might
> >>>> need to revisit this now I think about it. I'm afraid it might be
> >>>> possible
> >>>> for
> >>>> this to generate an epilogue with a vf that is not lower than that of the
> >>>> main
> >>>> loop, but I'd need to think about this again.
> >>>>
> >>>> Either way I don't think this changes the vector modes used for the
> >>>> epilogue.
> >>>> But maybe I'm just missing your point here.
> >>> Yes, I was refering to the above which suggests that when we vectorize
> >>> the main loop with V4SF but unroll then we try vectorizing the
> >>> epilogue with V4SF as well (but not unrolled).  I think that's
> >>> premature (not sure if you try V8SF if the main loop was V4SF but
> >>> unrolled 4 times).
> >> My main motivation for this was because I had a SVE loop that vectorized
> >> with
> >> both VNx8HI, then V8HI which beat VNx8HI on cost, then it decided to unroll
> >> V8HI by two and skipped using VNx8HI as a predicated epilogue which
> >> would've
> >> been the best choice.
> > I see, yes - for fully predicated epilogues it makes sense to consider
> > the same vector mode as for the main loop anyways (independent on
> > whether we're unrolling or not).  One could argue that with an
> > unrolled V4SImode main loop a predicated V8SImode epilogue would also
> > be a good match (but then somehow costing favored the unrolled V4SI
> > over the V8SI for the main loop...).
> >
> >> So that is why I decided to just 'reset' the vector_mode selection. In a
> >> scenario where you only have the traditional vector modes it might make
> >> less
> >> sense.
> >>
> >> Just realized I still didn't add any check to make sure the epilogue has a
> >> lower VF than the previous loop, though I'm still not sure that could
> >> happen.
> >> I'll go look at where to add that if you agree with this.
> > As said above, it only needs a lower VF in case the epilogue is not
> > fully masked - otherwise the same VF would be OK.
> >
> >>>> I can move it there, it would indeed remove the need for the change to
> >>>> vect_update_vf_for_slp, the change to
> >>>> vect_determine_partial_vectors_and_peeling would still be required I
> >>>> think.
> >>>> It
> >>>> is meant to disable using partial vectors in an unrolled loop.
> >>> Why would we disable the use of partial vectors in an unrolled loop?
> >> The motivation behind that is that the overhead caused by generating
> >> predicates for each iteration will likely be too much for it to be
> >> profitable
> >> to unroll. On top of that, when dealing with low iteration count loops, if
> >> executing one predicated iteration would be enough we now still need to
> >> execute all other unrolled predicated iterations, whereas if we keep them
> >> unrolled we skip the unrolled loops.
> > OK, I guess we're not factoring in costs when deciding on predication
> > but go for it if it's gernally enabled and possible.
> >
> > With the proposed scheme we'd then cost the predicated not unrolled
> > loop against a not predicated unrolled loop which might be a bit
> > apples vs. oranges also because the target made the unroll decision
> > based on the data it collected for the predicated loop.
> >
> >>> Sure but I'm suggesting you keep the not unrolled body as one way of
> >>> costed vectorization but then if the target says "try unrolling"
> >>> re-do the analysis with the same mode but a larger VF.  Just like
> >>> we iterate over vector modes you'll now iterate over pairs of
> >>> vector mode + VF (unroll factor).  It's not about re-using the costing
> >>> it's about using costing that is actually relevant and also to avoid
> >>> targets inventing two distinct separate costings - a target (powerpc)
> >>> might already compute load/store density and other stuff for the main
> >>> costing so it should have an idea whether doubling or triplicating is OK.
> >>>
> >>> Richard.
> >> Sounds good! I changed the patch to determine the unrolling factor later,
> >> after all analysis has been done and retry analysis if an unrolling factor
> >> larger than 1 has been chosen for this loop and vector_mode.
> >>
> >> gcc/ChangeLog:
> >>
> >>          * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR.
> >>          * doc/tm.texi.in: Add entries for TARGET_VECTORIZE_UNROLL_FACTOR.
> >>          * params.opt: Add vect-unroll and vect-unroll-reductions
> >> parameters.
> > What's the reason to add the --params?  It looks like this makes
> > us unroll with a static number short-cutting the target.
> >
> > IMHO that's never going to be a great thing - but what we could do
> > is look at loop->unroll and try to honor that (factoring in that
> > the vectorization factor is already the times we unroll).
> >
> > So I'd leave those params out for now, the user would have a much
> > more fine-grained way to control this with the unroll pragma.
> >
> > Adding a max-vect-unroll parameter would be another thing but that
> > would apply after the targets or pragma decision.
> >
> >>          * target.def: Define hook TARGET_VECTORIZE_UNROLL_FACTOR.
> > I still do not like the new target hook - as said I'd like to
> > make you have the finis_cost hook allow the target to specify
> > a suggested unroll factor instead because that's the point where
> > it has all the info.
> >
> > Thanks,
> > Richard.
> >
> >>          * targhooks.c (default_unroll_factor): New.
> >>          * targhooks.h (default_unroll_factor): Likewise.
> >>          * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> >>          par_unrolling_factor.
> >>          (vect_determine_partial_vectors_and_peeling): Account for
> >> unrolling.
> >>          (vect_determine_unroll_factor): New.
> >>          (vect_try_unrolling): New.
> >>          (vect_reanalyze_as_main_loop): Call vect_try_unrolling when
> >>          retrying a loop_vinfo as a main loop.
> >>          (vect_analyze_loop): Call vect_try_unrolling when vectorizing
> >> main loops.
> >>          (vect_analyze_loop): Allow for epilogue vectorization when
> >>  unrolling
> >>          and rewalk vector_mode warray for the epilogues.
> >>          (vectorizable_reduction): Disable single_defuse_cycle when
> >> unrolling.
> >>          * tree-vectorizer.h (vect_unroll_value): Declare
> >>  par_unrolling_factor
> >>          as a member of loop_vec_info.
> >>
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

Reply via email to