On Thu, Oct 24, 2019 at 12:56 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> "H.J. Lu" <hjl.to...@gmail.com> writes:
> > On Wed, Oct 23, 2019 at 4:51 AM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Richard Biener <richard.guent...@gmail.com> writes:
> >> > On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
> >> > <richard.sandif...@arm.com> wrote:
> >> >>
> >> >> This patch is the first of a series that tries to remove two
> >> >> assumptions:
> >> >>
> >> >> (1) that all vectors involved in vectorisation must be the same size
> >> >>
> >> >> (2) that there is only one vector mode for a given element mode and
> >> >>     number of elements
> >> >>
> >> >> Relaxing (1) helps with targets that support multiple vector sizes or
> >> >> that require the number of elements to stay the same.  E.g. if we're
> >> >> vectorising code that operates on narrow and wide elements, and the
> >> >> narrow elements use 64-bit vectors, then on AArch64 it would normally
> >> >> be better to use 128-bit vectors rather than pairs of 64-bit vectors
> >> >> for the wide elements.
> >> >>
> >> >> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
> >> >> fixed-length code for SVE.  It also allows unpacked/half-size SVE
> >> >> vectors to work with -msve-vector-bits=256.
> >> >>
> >> >> The patch adds a new hook that targets can use to control how we
> >> >> move from one vector mode to another.  The hook takes a starting vector
> >> >> mode, a new element mode, and (optionally) a new number of elements.
> >> >> The flexibility needed for (1) comes in when the number of elements
> >> >> isn't specified.
> >> >>
> >> >> All callers in this patch specify the number of elements, but a later
> >> >> vectoriser patch doesn't.  I won't be posting the vectoriser patch
> >> >> for a few days, hence the RFC/A tag.
> >> >>
> >> >> Tested individually on aarch64-linux-gnu and as a series on
> >> >> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
> >> >> look OK?
> >> >
> >> > In isolation the idea looks good but maybe a bit limited?  I see
> >> > how it works for the same-size case but if you consider x86
> >> > where we have SSE, AVX256 and AVX512 what would it return
> >> > for related_vector_mode (V4SImode, SImode, 0)?  Or is this
> >> > kind of query not intended (where the component modes match
> >> > but nunits is zero)?
> >>
> >> In that case we'd normally get V4SImode back.  It's an allowed
> >> combination, but not very useful.
> >>
> >> > How do you get from SVE fixed 128bit to NEON fixed 128bit then?  Or is
> >> > it just used to stay in the same register set for different component
> >> > modes?
> >>
> >> Yeah, the idea is to use the original vector mode as essentially
> >> a base architecture.
> >>
> >> The follow-on patches replace vec_info::vector_size with
> >> vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
> >> with targetm.vectorize.autovectorize_vector_modes.  These are the
> >> starting modes that would be passed to the hook in the nunits==0 case.
> >>
> >
> > For a target with different vector sizes,
> > targetm.vectorize.autovectorize_vector_sizes doesn't return the
> > optimal vector sizes for both known and unknown trip counts.
> > For a target with 128-bit and 256-bit vectors, 256-bit followed by
> > 128-bit works well for a known trip count, since the vectorizer knows
> > the maximum usable vector size.  But for an unknown trip count, we may
> > want to use 128-bit vectors when the 256-bit code path won't be used
> > at run time but the 128-bit one will.  At the moment, we can only use
> > one set of vector sizes for both known and unknown trip counts.
>
> Yeah, we're hit by this for AArch64 too.  Andre's recent patches:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01564.html
> https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00205.html
>
> should help.
>
> > Can the vectorizer support 2 sets of vector sizes, one for known trip
> > counts and the other for unknown trip counts?
>
> The approach Andre's taking is to continue to use the wider vector size
> for unknown trip counts, and instead ensure that the epilogue loop is
> vectorised at the narrower vector size if possible.  The patches then
> use this vectorised epilogue as a fallback "main" loop if the runtime
> trip count is too low for the wide vectors.

I tried it on 548.exchange2_r in SPEC CPU 2017.  There is a shortcut
to the vectorized epilogue for low trip counts.
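
For reference, here is a minimal C sketch of the control flow described
above: a wide main loop, a narrower epilogue that doubles as the fallback
"main" loop when the runtime trip count is too low, and a scalar tail.
This is only an illustration under assumptions.  The lane counts, the
function name and the loop_stats counters are hypothetical, and the
"vector" loops are plain scalar loops standing in for vectorised code.
It is not what GCC or Andre's patches actually emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane counts: 8 x 32-bit for a 256-bit vector,
   4 x 32-bit for a 128-bit vector.  */
enum { WIDE_LANES = 8, NARROW_LANES = 4 };

/* Illustration only: records how many iterations each loop ran.  */
struct loop_stats
{
  size_t wide_iters;
  size_t narrow_iters;
  size_t scalar_iters;
};

/* Stand-in for one vector iteration: sums LANES consecutive elements.  */
static int32_t
sum_chunk (const int32_t *p, size_t lanes)
{
  int32_t s = 0;
  for (size_t i = 0; i < lanes; i++)
    s += p[i];
  return s;
}

int32_t
sum_with_epilogue_fallback (const int32_t *a, size_t n,
                            struct loop_stats *st)
{
  assert (st != NULL && (a != NULL || n == 0));

  size_t i = 0;
  int32_t total = 0;
  st->wide_iters = st->narrow_iters = st->scalar_iters = 0;

  /* Runtime check: if the trip count is too low for the wide vectors,
     skip straight to the narrower loop (the "shortcut").  */
  if (n >= WIDE_LANES)
    for (; i + WIDE_LANES <= n; i += WIDE_LANES)
      {
        total += sum_chunk (a + i, WIDE_LANES);
        st->wide_iters++;
      }

  /* Narrow loop: epilogue of the wide loop, or the fallback main loop
     when the wide loop was skipped.  */
  for (; i + NARROW_LANES <= n; i += NARROW_LANES)
    {
      total += sum_chunk (a + i, NARROW_LANES);
      st->narrow_iters++;
    }

  /* Scalar tail for whatever is left.  */
  for (; i < n; i++)
    {
      total += a[i];
      st->scalar_iters++;
    }

  return total;
}
```

With 13 elements the sketch runs one wide, one narrow and one scalar
iteration; with 6 elements the wide loop is skipped entirely and the
narrow loop does the bulk of the work.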

-- 
H.J.
