On Fri, 18 Dec 2020, Richard Sandiford wrote:

> On AArch64, the vectoriser tries various ways of vectorising with both
> SVE and Advanced SIMD and picks the best one.  All other things being
> equal, it prefers earlier attempts over later attempts.
> 
> The way this works currently is that, once it has a successful
> vectorisation attempt A, it analyses all other attempts as epilogue
> loops of A:
> 
>       /* When pick_lowest_cost_p is true, we should in principle iterate
>        over all the loop_vec_infos that LOOP_VINFO could replace and
>        try to vectorize LOOP_VINFO under the same conditions.
>        E.g. when trying to replace an epilogue loop, we should vectorize
>        LOOP_VINFO as an epilogue loop with the same VF limit.  When trying
>        to replace the main loop, we should vectorize LOOP_VINFO as a main
>        loop too.
> 
>        However, autovectorize_vector_modes is usually sorted as follows:
> 
>        - Modes that naturally produce lower VFs usually follow modes that
>          naturally produce higher VFs.
> 
>        - When modes naturally produce the same VF, maskable modes
>          usually follow unmaskable ones, so that the maskable mode
>          can be used to vectorize the epilogue of the unmaskable mode.
> 
>        This order is preferred because it leads to the maximum
>        epilogue vectorization opportunities.  Targets should only use
>        a different order if they want to make wide modes available while
>        disparaging them relative to earlier, smaller modes.  The assumption
>        in that case is that the wider modes are more expensive in some
>        way that isn't reflected directly in the costs.
> 
>        There should therefore be few interesting cases in which
>        LOOP_VINFO fails when treated as an epilogue loop, succeeds when
>        treated as a standalone loop, and ends up being genuinely cheaper
>        than FIRST_LOOP_VINFO.  */
> 
> However, the vectoriser can normally elide alias checks for epilogue
> loops, on the basis that the main loop should do them instead.
> Converting an epilogue loop to a main loop can therefore cause the alias
> checks to be skipped.  (It probably also unfairly penalises the original
> loop in the cost comparison, given that one loop will have alias checks
> and the other won't.)
> 
> As the comment says, we should in principle analyse each vector mode
> twice: once as a main loop and once as an epilogue.  However, doing
> that up-front would be quite expensive.  This patch instead goes for a
> compromise: if an epilogue loop for mode M2 seems better than a main
> loop for mode M1, re-analyse with M2 as the main loop.
> 
> The patch fixes dg.torture.exp=pr69719.c when testing with
> -msve-vector-bits=128.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
> 
> 
> gcc/
>       PR tree-optimization/98371
>       * tree-vect-loop.c (vect_reanalyze_as_main_loop): New function.
>       (vect_analyze_loop): If an epilogue loop appears to be cheaper
>       than the main loop, re-analyze it as a main loop before adopting
>       it as a main loop.
> ---
>  gcc/tree-vect-loop.c | 61 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 58 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 688538a4521..221541f4a38 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2850,6 +2850,45 @@ vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
>    return true;
>  }
>  
> +/* If LOOP_VINFO is already a main loop, return it unmodified.  Otherwise
> +   try to reanalyze it as a main loop.  Return the loop_vinfo on success
> +   and null on failure.  */
> +
> +static loop_vec_info
> +vect_reanalyze_as_main_loop (loop_vec_info loop_vinfo, unsigned int *n_stmts)
> +{
> +  if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> +    return loop_vinfo;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +                  "***** Reanalyzing as a main loop with vector mode %s\n",
> +                  GET_MODE_NAME (loop_vinfo->vector_mode));
> +
> +  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  vec_info_shared *shared = loop_vinfo->shared;
> +  opt_loop_vec_info main_loop_vinfo = vect_analyze_loop_form (loop, shared);
> +  gcc_assert (main_loop_vinfo);
> +
> +  main_loop_vinfo->vector_mode = loop_vinfo->vector_mode;
> +
> +  bool fatal = false;
> +  bool res = vect_analyze_loop_2 (main_loop_vinfo, fatal, n_stmts);
> +  loop->aux = NULL;
> +  if (!res)
> +    {
> +      if (dump_enabled_p ())
> +     dump_printf_loc (MSG_NOTE, vect_location,
> +                      "***** Failed to analyze main loop with vector"
> +                      " mode %s\n",
> +                      GET_MODE_NAME (loop_vinfo->vector_mode));
> +      delete main_loop_vinfo;
> +      return NULL;
> +    }
> +  LOOP_VINFO_VECTORIZABLE_P (main_loop_vinfo) = 1;
> +  return main_loop_vinfo;
> +}
> +
>  /* Function vect_analyze_loop.
>  
>     Apply a set of analyses on LOOP, and create a loop_vec_info struct
> @@ -2997,9 +3036,25 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
> *shared)
>             if (vinfos.is_empty ()
>                 && vect_joust_loop_vinfos (loop_vinfo, first_loop_vinfo))
>               {
> -               delete first_loop_vinfo;
> -               first_loop_vinfo = opt_loop_vec_info::success (NULL);
> -               LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = NULL;
> +               loop_vec_info main_loop_vinfo
> +                 = vect_reanalyze_as_main_loop (loop_vinfo, &n_stmts);
> +               if (main_loop_vinfo == loop_vinfo)
> +                 {
> +                   delete first_loop_vinfo;
> +                   first_loop_vinfo = opt_loop_vec_info::success (NULL);
> +                 }
> +               else if (main_loop_vinfo
> +                        && vect_joust_loop_vinfos (main_loop_vinfo,
> +                                                   first_loop_vinfo))
> +                 {
> +                   delete first_loop_vinfo;
> +                   first_loop_vinfo = opt_loop_vec_info::success (NULL);
> +                   delete loop_vinfo;
> +                   loop_vinfo
> +                     = opt_loop_vec_info::success (main_loop_vinfo);
> +                 }
> +               else
> +                 delete main_loop_vinfo;
>               }
>           }
>  
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to