On Mon, Nov 20, 2017 at 1:35 PM, Richard Sandiford
<richard.sandif...@linaro.org> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Mon, Nov 20, 2017 at 12:56 AM, Jeff Law <l...@redhat.com> wrote:
>>> On 11/09/2017 06:24 AM, Richard Sandiford wrote:
>>>> ...so that we can use them for variable-length vectors.  For now
>>>> constant-length vectors continue to use VEC_PERM_EXPR and the
>>>> vec_perm_const optab even for cases that the new optabs could
>>>> handle.
>>>>
>>>> The vector optabs are inconsistent about whether there should be
>>>> an underscore before the mode part of the name, but the other lo/hi
>>>> optabs have one.
>>>>
>>>> Doing this means that we're able to optimise some SLP tests using
>>>> non-SLP (for now) on targets with variable-length vectors, so the
>>>> patch needs to add a few XFAILs.  Most of these go away with later
>>>> patches.
>>>>
>>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>>> and powerpc64le-linux-gnu.  OK to install?
>>>>
>>>> Richard
>>>>
>>>>
>>>> 2017-11-09  Richard Sandiford  <richard.sandif...@linaro.org>
>>>>           Alan Hayward  <alan.hayw...@arm.com>
>>>>           David Sherwood  <david.sherw...@arm.com>
>>>>
>>>> gcc/
>>>>       * doc/md.texi (vec_reverse, vec_interleave_lo, vec_interleave_hi)
>>>>       (vec_extract_even, vec_extract_odd): Document new optabs.
>>>>       * internal-fn.def (VEC_INTERLEAVE_LO, VEC_INTERLEAVE_HI)
>>>>       (VEC_EXTRACT_EVEN, VEC_EXTRACT_ODD, VEC_REVERSE): New internal
>>>>       functions.
>>>>       * optabs.def (vec_interleave_lo_optab, vec_interleave_hi_optab)
>>>>       (vec_extract_even_optab, vec_extract_odd_optab, vec_reverse_optab):
>>>>       New optabs.
>>>>       * tree-vect-data-refs.c: Include internal-fn.h.
>>>>       (vect_grouped_store_supported): Try using IFN_VEC_INTERLEAVE_{LO,HI}.
>>>>       (vect_permute_store_chain): Use them here too.
>>>>       (vect_grouped_load_supported): Try using IFN_VEC_EXTRACT_{EVEN,ODD}.
>>>>       (vect_permute_load_chain): Use them here too.
>>>>       * tree-vect-stmts.c (can_reverse_vector_p): New function.
>>>>       (get_negative_load_store_type): Use it.
>>>>       (reverse_vector): New function.
>>>>       (vectorizable_store, vectorizable_load): Use it.
>>>>       * config/aarch64/iterators.md (perm_optab): New iterator.
>>>>       * config/aarch64/aarch64-sve.md (<perm_optab>_<mode>): New expander.
>>>>       (vec_reverse_<mode>): Likewise.
>>>>
>>>> gcc/testsuite/
>>>>       * gcc.dg/vect/no-vfa-vect-depend-2.c: Remove XFAIL.
>>>>       * gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
>>>>       * gcc.dg/vect/pr33953.c: XFAIL for vect_variable_length.
>>>>       * gcc.dg/vect/pr68445.c: Likewise.
>>>>       * gcc.dg/vect/slp-12a.c: Likewise.
>>>>       * gcc.dg/vect/slp-13-big-array.c: Likewise.
>>>>       * gcc.dg/vect/slp-13.c: Likewise.
>>>>       * gcc.dg/vect/slp-14.c: Likewise.
>>>>       * gcc.dg/vect/slp-15.c: Likewise.
>>>>       * gcc.dg/vect/slp-42.c: Likewise.
>>>>       * gcc.dg/vect/slp-multitypes-2.c: Likewise.
>>>>       * gcc.dg/vect/slp-multitypes-4.c: Likewise.
>>>>       * gcc.dg/vect/slp-multitypes-5.c: Likewise.
>>>>       * gcc.dg/vect/slp-reduc-4.c: Likewise.
>>>>       * gcc.dg/vect/slp-reduc-7.c: Likewise.
>>>>       * gcc.target/aarch64/sve_vec_perm_2.c: New test.
>>>>       * gcc.target/aarch64/sve_vec_perm_2_run.c: Likewise.
>>>>       * gcc.target/aarch64/sve_vec_perm_3.c: New test.
>>>>       * gcc.target/aarch64/sve_vec_perm_3_run.c: Likewise.
>>>>       * gcc.target/aarch64/sve_vec_perm_4.c: New test.
>>>>       * gcc.target/aarch64/sve_vec_perm_4_run.c: Likewise.
>>> OK.
>>
>> It's really a step backwards - we had those optabs and a tree code in
>> the past and
>> canonicalizing things to VEC_PERM_EXPR made things simpler.
>>
>> Why doesn't VEC_PERM <v1, v2, that-constant-series-expr-thing> work?
>
> The problems with that are:
>
> - It doesn't work for vectors of 256 or more single-byte elements because
>   the selector indices wrap round.

That's a general issue that would need to be addressed for larger
vectors (GCN?).
I presume the requirement that the permutation vector have the same
element size as the data vectors needs to be relaxed.
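
For illustration, a minimal sketch of the wrap-around problem using GCC's
generic vector extensions (the type name v256qi and the function name are
made up for this example, not part of the patch): a two-input shuffle of
256-byte vectors needs selector values up to 511, but the selector
elements are only 8 bits wide, so the intended indices wrap and only the
first operand can ever be addressed.

  typedef unsigned char v256qi __attribute__ ((vector_size (256)));

  v256qi
  pick_from_second (v256qi a, v256qi b)
  {
    v256qi sel;
    for (int i = 0; i < 256; i++)
      sel[i] = 256 + i;   /* intended: element i of B, but the 8-bit
                             selector element wraps to i, selecting A */
    return __builtin_shuffle (a, b, sel);
  }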

> - Supporting a fake VEC_PERM_EXPR <v256qi, v256qi, v256hi> for a few
>   special cases would be hard, especially since v256hi isn't a normal
>   vector mode.  I imagine everything dealing with VEC_PERM_EXPR would
>   then have to worry about that special case.

I think it's not really a special case - any code here should just expect
the same number of vector elements, not a particular element size.  I think
you already dealt with using a char[] vector for permutations.

> - VEC_SERIES_CST only copes naturally with EXTRACT_EVEN, EXTRACT_ODD
>   and REVERSE.  INTERLEAVE_LO is { 0, N/2, 1, N/2+1, ... }.
>   I guess it's possible to represent that using a combination of
>   shifts, masks, and additions, but then:
>
>   1) when generating them, we'd need to make sure that we cost the
>      operation as a single permute, rather than costing all the shifts,
>      masks and additions
>
>   2) we'd need to make sure that all gimple optimisations that run
>      afterwards don't perturb the sequence, otherwise we'll end up
>      with something that's very expensive.
>
>   3) that sequence wouldn't be handled by existing VEC_PERM_EXPR
>      optimisations, and it wouldn't be trivial to add it, since we'd
>      need to re-recognise the sequence first.
>
>   4) expand would need to re-recognise the sequence and use the
>      optab anyway.
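
For illustration, a minimal sketch of why the first three selectors fit a
VEC_SERIES_CST-style (base, step) description while the interleave one does
not.  The helper name build_selectors and the parameter nelt (the number of
elements in one input vector) are made up for this sketch; reading the
thread's { 0, N/2, 1, N/2+1, ... } with N as the combined element count of
both inputs, nelt corresponds to N/2.

  static void
  build_selectors (unsigned int nelt, unsigned int *even, unsigned int *odd,
                   unsigned int *rev, unsigned int *inter_lo)
  {
    for (unsigned int i = 0; i < nelt; i++)
      {
        even[i] = 2 * i;            /* series: base 0, step 2 */
        odd[i] = 2 * i + 1;         /* series: base 1, step 2 */
        rev[i] = nelt - 1 - i;      /* series: base nelt-1, step -1 */
        /* Interleave-lo alternates between two series, { 0, nelt, 1,
           nelt + 1, ... }, so no single (base, step) pair covers it.  */
        inter_lo[i] = (i & 1) ? nelt + i / 2 : i / 2;
      }
  }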

Well, the answer is of course that you just need a more powerful VEC_SERIES_CST
that can handle INTERLEAVE_HI/LO.  It seems to me SVE can generate
such masks relatively cheaply -- do a 0, 1, 2, 3... sequence and then do
an INTERLEAVE_HI/LO on it.  So it makes sense to be able to specify it directly.
Suggested fix: add an interleaved bit to VEC_SERIES_CST.
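
One possible reading of that suggestion, sketched with the SVE ACLE
intrinsics svindex, svzip1 and svcntb (the choice of intrinsics and the
function name are assumptions of this illustration, not something stated
in the thread): zip an index series starting at 0 with one starting at the
per-vector element count to get the two-input interleave-lo selector.
Note that for 2048-bit vectors svcntb () is 256, so the 8-bit selector
lanes wrap -- the same limit discussed above.

  #include <arm_sve.h>

  /* Sketch: selector { 0, nelt, 1, nelt + 1, ... } for byte elements,
     where nelt is the number of bytes in one SVE vector.  */
  svuint8_t
  interleave_lo_selector (void)
  {
    uint8_t nelt = svcntb ();            /* wraps to 0 for 2048-bit SVE */
    svuint8_t lo = svindex_u8 (0, 1);    /* { 0, 1, 2, ... } */
    svuint8_t hi = svindex_u8 (nelt, 1); /* { nelt, nelt + 1, ... } */
    return svzip1_u8 (lo, hi);           /* { 0, nelt, 1, nelt + 1, ... } */
  }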

At least I'd like to see it used for the cases it can already handle.

VEC_PERM_EXPR is supposed to be the only permutation operation; if it cannot
handle some cases, it needs to be fixed / its constraints relaxed (like
the v256qi case).

>   Using an internal function seems much simpler :-)
>
> I think VEC_PERM_EXPR is useful because it represents the same
> operation as __builtin_shuffle, and we want to optimise that as best
> we can.  But these internal functions are only used by the vectoriser,
> which should always see what the final form of the permute should be.

You hope so.  We have several cases where later unrolling and CSE/forwprop
optimize permutations away.

Richard.

> Thanks,
> Richard
