On Fri, 25 Oct 2019, Andre Vieira (lists) wrote:

> Hi,
> 
> This is the reworked patch after your comments.
> 
> I have moved the epilogue check into the analysis form disguised under
> '!epilogue_vinfos.is_empty ()'.  This because I realized that I am doing the
> "lowest threshold" check there.
> 
> The only place where we may reject an epilogue_vinfo is when we know the
> number of scalar iterations and we realize the number of iterations left after
> the main loop are not enough to enter the vectorized epilogue so we optimize
> away that code-gen.  The only way we know this to be true is if the number of
> scalar iterations are known and the peeling for alignment is known. So we know
> we will enter the main loop regardless, so whether the threshold we use is for
> a lower VF or not it shouldn't matter as much, I would even like to think that
> check isn't done, but I am not sure... Might be worth checking as an
> optimization.
> 
> 
> Is this OK for trunk?

+      for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]);
+          !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi))
+       {
..
+         if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo))
+           pattern_worklist.safe_push (stmt_vinfo);
+
+         related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
+         while (related_vinfo && related_vinfo != stmt_vinfo)
+           {

I think PHIs cannot have patterns.  You can assert
that STMT_VINFO_RELATED_STMT is NULL I think.

+         related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo);
+         while (related_vinfo && related_vinfo != stmt_vinfo)
+           {
+             related_worklist.safe_push (related_vinfo);
+             /* Set BB such that the assert in
+               'get_initial_def_for_reduction' is able to determine that
+               the BB of the related stmt is inside this loop.  */
+             gimple_set_bb (STMT_VINFO_STMT (related_vinfo),
+                            gimple_bb (new_stmt));
+             related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo);
+           }

do we really keep references to "nested" patterns?  Thus, do you
need this loop?

+  /* The PATTERN_DEF_SEQs in the epilogue were constructed using the
+     original main loop and thus need to be updated to refer to the 
cloned
+     variables used in the epilogue.  */
+  for (unsigned i = 0; i < pattern_worklist.length (); ++i)
+    {
...
+                 op = simplify_replace_tree (op, NULL_TREE, NULL_TREE,
+                                        &find_in_mapping, &mapping);
+                 gimple_set_op (seq, j, op);

you do this for the pattern-def seq but not for the related one.
I guess you ran into this for COND_EXPR conditions.  I wondered
to use a shared worklist for both the def-seq and the main pattern
stmt or at least to split out the replacement so you can share it.

+      /* Data references for gather loads and scatter stores do not use 
the
+        updated offset we set using ADVANCE.  Instead we have to make 
sure the
+        reference in the data references point to the corresponding copy 
of
+        the original in the epilogue.  */
+      if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
+       {
+         int j;
+         if (TREE_CODE (DR_REF (dr)) == MEM_REF)
+           j = 0;
+         else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF)
+           j = 1;
+         else
+           gcc_unreachable ();
+
+         if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), j)))
+           {
+             DR_REF (dr) = unshare_expr (DR_REF (dr));
+             TREE_OPERAND (DR_REF (dr), j) = *new_op;
+           }

huh, do you really only ever see MEM_REF or ARRAY_REF here?
I would guess using simplify_replace_tree is safer.
There's also DR_BASE_ADDRESS - we seem to leave the DRs partially
updated, is that correct?

Otherwise looks OK to me.

Thanks,
Richard.


> gcc/ChangeLog:
> 2019-10-25  Andre Vieira  <andre.simoesdiasvie...@arm.com>
> 
>     PR 88915
>     * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
>     * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
>     and make the valueize function pointer also take a void pointer.
>     * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
>     around vn_valueize, to call it without a context.
>     (process_bb): Use vn_valueize_wrapper instead of vn_valueize.
>     * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
>     (~_loop_vec_info): Release epilogue_vinfos.
>     (vect_analyze_loop_costing): Use knowledge of main VF to estimate
>     number of iterations of epilogue.
>     (vect_analyze_loop_2): Adapt to analyse main loop for all supported
>     vector sizes when vect-epilogues-nomask=1.  Also keep track of lowest
>     versioning threshold needed for main loop.
>     (vect_analyze_loop): Likewise.
>     (find_in_mapping): New helper function.
>     (update_epilogue_loop_vinfo): New function.
>     (vect_transform_loop): When vectorizing epilogues re-use analysis done
>     on main loop and call update_epilogue_loop_vinfo to update it.
>     * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
>     stmts on loop preheader edge.
>     (vect_do_peeling): Enable skip-vectors when doing loop versioning if
>     we decided to vectorize epilogues.  Update epilogues NITERS and
>     construct ADVANCE to update epilogues data references where needed.
>     * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
>     (vect_do_peeling, vect_update_inits_of_drs,
>      determine_peel_for_niter, vect_analyze_loop): Add or update declarations.
>     * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
>     created loop_vec_info's for epilogues when available.  Otherwise 
> analyse
>     epilogue separately.
> 
> 
> 
> Cheers,
> Andre
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to