Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

Richard Biener Tue, 22 Oct 2019 06:56:59 -0700

On Tue, 22 Oct 2019, Andre Vieira (lists) wrote:

> Hi Richi,
> 
> See inline responses to your comments.
> 
> On 11/10/2019 13:57, Richard Biener wrote:
> > On Thu, 10 Oct 2019, Andre Vieira (lists) wrote:
> > 
> >> Hi,
> >>
> 
> > 
> > +
> > +  /* Keep track of vector sizes we know we can vectorize the epilogue
> > with.  */
> > +  vector_sizes epilogue_vsizes;
> >   };
> > 
> > please don't enlarge struct loop, instead track this somewhere
> > in the vectorizer (in loop_vinfo?  I see you already have
> > epilogue_vinfos there - so the loop_vinfo simply lacks
> > convenient access to the vector_size?)  I don't see any
> > use that could be trivially adjusted to look at a loop_vinfo
> > member instead.
> 
> Done.
> > 
> > For the vect_update_inits_of_drs this means that we'd possibly
> > do less CSE.  Not sure if really an issue.
> 
> CSE of what exactly? You are afraid we are repeating a calculation here we
> have done elsewhere before?


All uses of those inits now possibly get the expression instead of
just the SSA name we inserted code for once.  But as said, we'll see.

> > 
> > You use LOOP_VINFO_EPILOGUE_P sometimes and sometimes
> > LOOP_VINFO_ORIG_LOOP_INFO, please change predicates to
> > LOOP_VINFO_EPILOGUE_P.
> 
> I checked and the points where I use LOOP_VINFO_ORIG_LOOP_INFO is because I
> then use the resulting loop info.  If there are cases you feel strongly about
> let me know.

Not too strongly, no.

> > 
> > @@ -2466,15 +2461,62 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> >     else
> >       niters_prolog = build_int_cst (type, 0);
> >       
> > +  loop_vec_info epilogue_vinfo = NULL;
> > +  if (vect_epilogues)
> > +    {
> > ...
> > +       vect_epilogues = false;
> > +    }
> > +
> > 
> > I don't understand what all this does - it clearly needs a comment.
> > Maybe the overall comment of the function should be amended with
> > an overview of how we handle [multiple] epilogue loop vectorization?
> 
> I added more comments both here and on top of the function.  Hopefully it is a
> bit clearer now, but it might need some tweaking.
> 
> > 
> > +
> > +      if (epilogue_any_upper_bound && prolog_peeling >= 0)
> > +       {
> > +         epilog->any_upper_bound = true;
> > +         epilog->nb_iterations_upper_bound = eiters + 1;
> > +       }
> > +
> > 
> > comment missing.  How can prolog_peeling be < 0?  We likely
> > didn't set the upper bound because we don't know it in the
> > case we skipped the vector loop (skip_vector)?  So make sure
> > to not introduce wrong-code issues here - maybe do this
> > optimization as followup?n
> > 
> 
> So the reason for this code wasn't so much an optimization as it was for
> correctness.  But I was mistaken, the failure I was seeing without this code
> was not because of this code, but rather being hidden by it. The problem I was
> seeing was that a prolog was being created using the original loop copy,
> rather than the scalar loop, leading to MASK_LOAD and MASK_STORE being left in
> the scalar prolog, leading to expand ICEs. I have fixed that issue by making
> sure the SCALAR_LOOP is used for prolog peeling and either the loop copy or
> SCALAR loop for epilogue peeling depending on whether we will be vectorizing
> said epilogue.

OK.

> 
> > @@ -1726,7 +1729,13 @@ vect_analyze_loop_costing (loop_vec_info
> > loop_vinfo)
> >         return 0;
> >       }
> > 
> > -  HOST_WIDE_INT estimated_niter = estimated_stmt_executions_int (loop);
> > +  HOST_WIDE_INT estimated_niter = -1;
> > +
> > +  if (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
> > +    estimated_niter
> > +      = vect_vf_for_cost (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)) - 1;
> > +  if (estimated_niter == -1)
> > +    estimated_niter = estimated_stmt_executions_int (loop);
> >     if (estimated_niter == -1)
> >       estimated_niter = likely_max_stmt_executions_int (loop);
> >     if (estimated_niter != -1
> > 
> > it's clearer if the old code is completely in a else {} path
> > even though vect_vf_for_cost - 1 should never be -1.
> > 
> Done for the == -1 cases, need to keep the != -1 outside of course.
> > -  if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
> > +  if (LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > +      || ((orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
> > +         && LOOP_REQUIRES_VERSIONING (orig_loop_vinfo)))
> > 
> > not sure why we need to do this for epilouges?
> > 
> 
> This is because we want to compute the versioning threshold for epilogues such
> that we can use the minimum versioning threshold when versioning the main
> loop.  The reason we need to ask we need to ask the original main loop is
> partially because of code in 'vect_analyze_data_ref_dependences' that chooses
> to not do DR dependence analysis and thus never fills
> LOOP_VINFO_MAY_ALIAS_DDRS for the epilogues loop_vinfo and as a consequence
> LOOP_VINFO_COMP_ALIAS_DDRS is always 0.
> 
> The piece of code is preceded by this comment:
>   /* For epilogues we either have no aliases or alias versioning
>      was applied to original loop.  Therefore we may just get max_vf
>      using VF of original loop.  */
> 
> I have added some comments to make it clearer.
> > 
> > +static tree
> > +replace_ops (tree op, hash_map<tree, tree> &mapping)
> > +{
> > 
> > I'm quite sure I've seen such beast elsewhere ;)  simplify_replace_tree
> > comes up first (not a 1:1 match but hints at a possible tree
> > sharing issue in your variant).
> > 
> 
> The reason I couldn't use simplify_replace_tree is because I didn't know what
> the "OLD" value is at the time I want to call it.  Basically I want to check
> whether an SSA name is a key in MAPPING and if so replace it with the
> corresponding VALUE.
> 
> I have changed simplify_replace_tree such that valueize can take a context
> parameter. I replaced one use of replace_ops with it and the other I
> specialized as I found that it was always a MEM_REF and we needed to replace
> the address it was dereferencing.
> 
> > 
> > +  tree advance;
> >     epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1,
> > &niters_vector,
> >                                &step_vector, &niters_vector_mult_vf, th,
> > -                             check_profitability, niters_no_overflow);
> > +                             check_profitability, niters_no_overflow,
> > +                             &advance);
> > +
> > +  if (epilogue)
> > +    {
> > +      basic_block *orig_bbs = get_loop_body (loop);
> > +      loop_vec_info epilogue_vinfo = loop_vec_info_for_loop (epilogue);
> > ...
> > 
> > orig_stmts/drs/etc. in the epilogue loop_vinfo and ...
> > 
> > +      /* We are done vectorizing the main loop, so now we update the
> > epilogues
> > +        stmt_vec_info's.  At the same time we set the gimple UID of each
> > +        statement in the epilogue, as these are used to look them up in
> > the
> > +        epilogues loop_vec_info later.  We also keep track of what
> > ...
> > 
> > split this out to a new function.  I wonder why you need to record
> > the DRs, are they not available via ->datarefs and lookup_dr ()?
> 
> lookup_dr may no longer work at this point. I found that for some memory
> accesses by the time I got to this point, the DR_STMT of the data_reference
> pointed to a scalar statement that no longer existed and the lookup_dr to that
> data reference ICE's.  I can't make this update before we transform the loop
> because the data references are shared, so I decided to capture the
> dr_vec_info's instead. Apparently we don't ever do a lookup_dr past this
> point, which I must admit is surprising.

OK, as long as this fixup code is well isolated we can see how to
make it prettier later ;)  But yes, we have some vectorizer transforms
that remove old stmts (bad).  At least that's true for stores, we
could probably delay actual (scalar) stmt removal until the whole
series of loop + epilogue vectorization is finished.

As said, let's try as followup.

> > Still have to go over the main loop doing the analysis/transform.
> > 
> > Thanks, it looks really promising (albeit exepectedly ugly due to
> > the data rewriting).
> > 
> 
> Yeah, though I feel like now that I have put it away into functions it makes
> it look cleaner.  That vect_transform_loop function was getting too big!
> 
> Is this OK for trunk?

You probably no longer need the gentype.c hunk.

+}
+
+static void
+update_epilogue_loop_vinfo (class loop *epilogue, tree advance)

function comment missing

+
+
+  /* We are done vectorizing the main loop, so now we update the 
epilogues

too much vertical space.

+  /* We are done vectorizing the main loop, so now we update the 
epilogues
+     stmt_vec_info's.  At the same time we set the gimple UID of each
+     statement in the epilogue, as these are used to look them up in the
+     epilogues loop_vec_info later.  We also keep track of what
+     stmt_vec_info's have PATTERN_DEF_SEQ's and RELATED_STMT's that might
+     need updating and we construct a mapping between variables defined 
in
+     the main loop and their corresponding names in epilogue.  */
+  for (unsigned i = 0; i < epilogue->num_nodes; ++i)

so for the following code I wonder if you can make use of the
fact that loop copying also copies UIDs, so you should be able
to match stmts via their UIDs and get at the other loop infos
stmt_info by the copy loop stmt UID.

I wonder why you need no modification for the SLP tree?

Otherwise the patch looks OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 2019-10-22  Andre Vieira  <andre.simoesdiasvie...@arm.com>
> 
>     PR 88915
>     * gentype.c (main): Add poly_uint64 type and vector_sizes to
>     generator.
>     * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
>     * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
>     and make the valueize function pointer also take a void pointer.
>     * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
>     around vn_valueize, to call it without a context.
>     (process_bb): Use vn_valueize_wrapper instead of vn_valueize.
>     * tree-vect-loop.c (vect_get_loop_niters): Make externally visible.
>     (_loop_vec_info): Initialize epilogue_vinfos.
>     (~_loop_vec_info): Release epilogue_vinfos.
>     (vect_analyze_loop_costing): Use knowledge of main VF to estimate
>     number of iterations of epilogue.
>     (vect_analyze_loop_2): Adapt to analyse main loop for all supported
>     vector sizes when vect-epilogues-nomask=1.  Also keep track of lowest
>     versioning threshold needed for main loop.
>     (vect_analyze_loop): Likewise.
>     (find_in_mapping): New helper function.
>     (update_epilogue_loop_vinfo): New function.
>     (vect_transform_loop): When vectorizing epilogues re-use analysis done
>     on main loop and call update_epilogue_loop_vinfo to update it.
>     * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
>     stmts on loop preheader edge.
>     (vect_do_peeling): Enable skip-vectors when doing loop versioning if
>     we decided to vectorize epilogues.  Update epilogues NITERS and
>     construct ADVANCE to update epilogues data references where needed.
>     * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos,
>     epilogue_vsizes and update_epilogue_vinfo members.
>     (LOOP_VINFO_UP_STMTS, LOOP_VINFO_UP_GT_DRS, LOOP_VINFO_UP_DRS,
>     LOOP_VINFO_EPILOGUE_SIZES): Define MACROs.
>     (vect_do_peeling, vect_get_loop_niters, vect_update_inits_of_drs,
>      determine_peel_for_niter, vect_analyze_loop): Add or update declarations.
>     * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
>     created loop_vec_info's for epilogues when available.  Otherwise 
> analyse
>     epilogue separately.
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

Reply via email to