On 11/22/2017 11:10 AM, Richard Sandiford wrote:
> Richard Sandiford <richard.sandif...@linaro.org> writes:
>> Two things stopped us using SLP reductions with variable-length vectors:
>>
>> (1) We didn't have a way of constructing the initial vector.
>> This patch does it by creating a vector full of the neutral
>> (identity) value and then using a shift-and-insert function
>> to insert any non-identity inputs into the low-numbered elements.
>> (The non-identity values are needed for double reductions.)
>> Alternatively, for unchained MIN/MAX reductions that have no neutral
>> value, we instead use the same duplicate-and-interleave approach as
>> for SLP constant and external definitions (added by a previous
>> patch).
>>
>> (2) The epilogue for constant-length vectors would extract the vector
>> elements associated with each SLP statement and do scalar arithmetic
>> on these individual elements. For variable-length vectors, the patch
>> instead creates a reduction vector for each SLP statement, replacing
>> the elements for other SLP statements with the identity value.
>> It then uses a hardware reduction instruction on each vector.
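
[Editorial note, not part of the patch] For readers following the two
points above, here is a minimal scalar sketch in C of what they describe
for a PLUS reduction. It is not GCC code: the names shl_insert,
build_initial_vector and reduce_for_stmt are hypothetical, the vector is
modelled as a plain int array whose length n_elts is only known at run
time, and the layout "lane i belongs to SLP statement i % group_size" is
an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Model of one shift-and-insert step: move every element up one lane
   (dropping the highest one) and place VALUE in lane 0.  */
void shl_insert(int *vec, size_t n_elts, int value)
{
    assert(n_elts > 0);
    for (size_t i = n_elts - 1; i > 0; i--)
        vec[i] = vec[i - 1];
    vec[0] = value;
}

/* Point (1): fill VEC with the neutral value, then shift in the
   GROUP_SIZE initial values so that init[i] ends up in lane i.  */
void build_initial_vector(int *vec, size_t n_elts, int neutral,
                          const int *init, size_t group_size)
{
    assert(group_size <= n_elts);
    for (size_t i = 0; i < n_elts; i++)
        vec[i] = neutral;
    /* Insert in reverse order: the last value inserted lands in lane 0.  */
    for (size_t i = group_size; i-- > 0;)
        shl_insert(vec, n_elts, init[i]);
}

/* Point (2): compute the scalar result for SLP statement STMT by
   replacing the lanes that belong to other statements with the neutral
   value and then reducing the whole vector.  The loop that sums all
   lanes stands in for a hardware reduction instruction.  */
int reduce_for_stmt(const int *vec, size_t n_elts, int neutral,
                    size_t group_size, size_t stmt)
{
    assert(group_size > 0 && stmt < group_size);
    assert(n_elts % group_size == 0);
    int sum = 0;
    for (size_t i = 0; i < n_elts; i++) {
        /* Assumed layout: lane i belongs to statement i % group_size.  */
        int lane = (i % group_size == stmt) ? vec[i] : neutral;
        sum += lane;
    }
    return sum;
}
```

With group_size == 2, initial values {10, 20} and neutral value 0,
build_initial_vector yields 10, 20, 0, 0, ... regardless of n_elts, and
reduce_for_stmt is then called once per SLP statement on the accumulated
vector. The n_elts % group_size check mirrors the multiple-of requirement
that the ChangeLog lists for vectorizable_reduction.
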
>>
>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>> and powerpc64le-linux-gnu.
>
> Here's an updated version that applies on top of the recent
> removal of REDUC_*_EXPR. Tested as before.
>
> Thanks,
> Richard
>
>
> 2017-11-22 Richard Sandiford <richard.sandif...@linaro.org>
> Alan Hayward <alan.hayw...@arm.com>
> David Sherwood <david.sherw...@arm.com>
>
> gcc/
> * doc/md.texi (vec_shl_insert_@var{m}): New optab.
> * internal-fn.def (VEC_SHL_INSERT): New internal function.
> * optabs.def (vec_shl_insert_optab): New optab.
> * tree-vectorizer.h (can_duplicate_and_interleave_p): Declare.
> (duplicate_and_interleave): Likewise.
> * tree-vect-loop.c: Include internal-fn.h.
> (neutral_op_for_slp_reduction): New function, split out from
> get_initial_defs_for_reduction.
> (get_initial_def_for_reduction): Handle option 2 for variable-length
> vectors by loading the neutral value into a vector and then shifting
> the initial value into element 0.
> (get_initial_defs_for_reduction): Replace the code argument with
> the neutral value calculated by neutral_op_for_slp_reduction.
> Use gimple_build_vector for constant-length vectors.
> Use IFN_VEC_SHL_INSERT for variable-length vectors if all
> but the first group_size elements have a neutral value.
> Use duplicate_and_interleave otherwise.
> (vect_create_epilog_for_reduction): Take a neutral_op parameter.
> Update call to get_initial_defs_for_reduction. Handle SLP
> reductions for variable-length vectors by creating one vector
> result for each scalar result, with the elements associated
> with other scalar results stubbed out with the neutral value.
> (vectorizable_reduction): Call neutral_op_for_slp_reduction.
> Require IFN_VEC_SHL_INSERT for double reductions on
> variable-length vectors, or SLP reductions that have
> a neutral value. Require can_duplicate_and_interleave_p
> support for variable-length unchained SLP reductions if there
> is no neutral value, such as for MIN/MAX reductions. Also require
> the number of vector elements to be a multiple of the number of
> SLP statements when doing variable-length unchained SLP reductions.
> Update call to vect_create_epilog_for_reduction.
> * tree-vect-slp.c (can_duplicate_and_interleave_p): Make public
> and remove initial values.
> (duplicate_and_interleave): Use IFN_VEC_SHL_INSERT for
> variable-length vectors if all but the first group_size elements
> have a neutral value.
> * config/aarch64/aarch64.md (UNSPEC_INSR): New unspec.
> * config/aarch64/aarch64-sve.md (vec_shl_insert_<mode>): New insn.
>
> gcc/testsuite/
> * gcc.dg/vect/pr37027.c: Remove XFAIL for variable-length vectors.
> * gcc.dg/vect/pr67790.c: Likewise.
> * gcc.dg/vect/slp-reduc-1.c: Likewise.
> * gcc.dg/vect/slp-reduc-2.c: Likewise.
> * gcc.dg/vect/slp-reduc-3.c: Likewise.
> * gcc.dg/vect/slp-reduc-5.c: Likewise.
> * gcc.target/aarch64/sve_slp_5.c: New test.
> * gcc.target/aarch64/sve_slp_5_run.c: Likewise.
> * gcc.target/aarch64/sve_slp_6.c: Likewise.
> * gcc.target/aarch64/sve_slp_6_run.c: Likewise.
> * gcc.target/aarch64/sve_slp_7.c: Likewise.
> * gcc.target/aarch64/sve_slp_7_run.c: Likewise.
OK
jeff