On Thu, 11 Jul 2019, Andre Vieira (lists) wrote:

> Hi Richard(s),
> 
> I am trying to tackle PR88915 and get GCC to further vectorize the "fallback"
> loop when doing loop versioning as it does when loop versioning is not
> required.
> 
> I have a prototype patch that needs further testing, but at first glance it
> seems to be achieving the desired outcome.
> I was wondering whether you had any specific concerns with my current
> approach.
> 
> On top of this change I am looking at the iterations and alias checks
> generated for every "vectorized-version". I.e. with the above patch I see:
> if (iterations_check_VF_0 () && alias_check_VF_0 ())
>   vectorized_for_VF_0 ();
> else if (iterations_check_VF_1 () && alias_check_VF_1 ())
>   vectorized_for_VF_1 ();
> ...
> else
>   scalar_loop ();
> 
> The alias checks are not always short and may cause unnecessary performance
> hits. Instead I am now trying to change the checks to produce the following
> form:
> 
> if (iterations_check_VF_0 ())
> {
>   if (alias_check_VF_0 ())
>    {
>      vectorized_for_VF_0 ();
>    }
>   else
>     goto VF_1_check;  // or scalar_loop
> }
> else if (iterations_check_VF_1 ())
>   {
> VF_1_check:
> 
>     if (alias_check_VF_1 ())
>       vectorized_for_VF_1 ();
>     else
>       goto VF_2_check; // or scalar_loop
>   }
> ...
> else
>   scalar_loop ();

I think for code-size reasons it would make sense to do it like

  if (iterations_check_for_lowest_VF ())
    {
      if (alias_check_for_highest_VF ())
        {
          vectorized_for_highest_VF ();
          vectorized_epilogues ();
        }
    }

and have the vectorized_for_highest_VF loop skipped, falling through
to the vectorized epilogues, when the number of iterations is too
small to enter it.
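A rough C model of this suggestion is sketched below. It assumes power-of-two VFs listed largest first and an alias check that either passes or fails for the highest VF. The names run_versioned, trips and alias_ok_for_highest are hypothetical. The model returns how many iterations each loop would execute. It shows that with too few iterations the main loop is simply skipped and the epilogues pick up the work.

```c
#include <assert.h>

/* Hypothetical model of the suggested scheme: one iteration check against
   the lowest VF and one alias check for the highest VF guard a main vector
   loop followed by vectorized epilogues.  VFS is in decreasing order.  On
   return TRIPS[i] holds the number of vector iterations of the loop for
   VFS[i]; the return value is the number of scalar iterations left.  */
static long
run_versioned (const int *vfs, int nvfs, long n, int alias_ok_for_highest,
               long *trips)
{
  assert (nvfs > 0);
  for (int i = 0; i < nvfs; i++)
    trips[i] = 0;

  /* Versioning condition failed: only the scalar loop runs.  */
  if (n < vfs[nvfs - 1] || !alias_ok_for_highest)
    return n;

  long remaining = n;
  for (int i = 0; i < nvfs; i++)
    {
      /* Main loop (i == 0) or an epilogue.  If fewer than VFS[i]
         iterations remain, this loop runs zero times and control falls
         through to the next, narrower epilogue.  */
      trips[i] = remaining / vfs[i];
      remaining -= trips[i] * vfs[i];
    }
  return remaining; /* Left for the scalar epilogue.  */
}
```

For example, with VFs {16, 8, 4} and 13 iterations, the VF 16 loop is skipped, the VF 8 and VF 4 epilogues each run once, and one scalar iteration remains.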

The advantage is that this would simply reuse the epilogue vectorization
code and would avoid excessive code growth if you have many
VFs to consider (on x86 we now have 8-byte, 16-byte, 32-byte and
64-byte vectors...).  The disadvantage is of course that a small
number of loops will not enter the vector code at all, namely
those that would pass the alias check for lowest_VF but not the
one for highest_VF.  I'm sure this isn't a common situation, and
in quite a number of cases we formulate the alias check in a way
that doesn't depend on the VF anyway.  There's also possibly
an extra branch for the case where the highest_VF loop isn't entered
(unless there already was a prologue loop).
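To make the disadvantage concrete, here is a deliberately simplified, VF-dependent runtime alias check. It is written for a loop like p[i] = q[i] + 1 where p and q may point into the same array. This is an illustrative assumption, not GCC's actual check, and alias_check_ok is a hypothetical name. If p lies d elements after q with 0 < d < VF, one vector iteration would read elements it should first have written. A distance of 12 elements therefore passes for VF 8 but fails for VF 16. Under the single-check scheme such a loop would run entirely scalar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified VF-dependent alias check for a loop such as
     for (i = 0; i < n; i++) p[i] = q[i] + 1;
   where P and Q point into the same array.  If P lies D elements after Q
   with 0 < D < VF, a vector iteration would load values it should first
   have stored, so the check fails.  Real GCC checks are more general.  */
static int
alias_check_ok (const int *p, const int *q, int vf)
{
  assert (vf > 0);
  ptrdiff_t d = p - q; /* Only valid when P and Q share one array.  */
  return d <= 0 || d >= vf;
}
```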

> I am not yet sure whether to try the next VF after an alias check fail or to
> just fall back to scalar instead.

If you don't then there's no advantage to doing what I suggested?

Richard.

> 
> Cheers,
> Andre
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)
