On 19 December 2017 at 00:36, Jeff Law <l...@redhat.com> wrote: > On 12/11/2017 08:44 AM, James Greenhalgh wrote: >> Hi, >> >> In the testcase in this patch we create an SLP vector with only two >> elements. Our current vector initialisation code will first duplicate >> the first element to both lanes, then overwrite the top lane with a new >> value. >> >> This duplication can be clunky and wasteful. >> >> Better would be to simply use the fact that we will always be overwriting >> the remaining bits, and simply move the first element to the corrcet place >> (implicitly zeroing all other bits). >> >> This reduces the code generation for this case, and can allow more >> efficient addressing modes, and other second order benefits for AArch64 >> code which has been vectorized to V2DI mode. >> >> Note that the change is generic enough to catch the case for any vector >> mode, but is expected to be most useful for 2x64-bit vectorization. >> >> Unfortunately, on its own, this would cause failures in >> gcc.target/aarch64/load_v2vec_lanes_1.c and >> gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more >> vec_merge and vec_duplicate for their simplifications to apply. To fix this, >> add a special case to the AArch64 code if we are loading from two memory >> addresses, and use the load_pair_lanes patterns directly. >> >> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation , to >> catch: >> >> (vec_merge:OUTER >> (vec_duplicate:OUTER x:INNER) >> (subreg:OUTER y:INNER 0) >> (const_int N)) >> >> And simplify it to: >> >> (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x) >> >> This is similar to the existing patterns which are tested in this function, >> without requiring the second operand to also be a vec_duplicate. >> >> Bootstrapped and tested on aarch64-none-linux-gnu and tested on >> aarch64-none-elf. >> >> Note that this requires >> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html >> if we don't want to ICE creating broken vector zero extends. >> >> Are the non-AArch64 parts OK? >> >> Thanks, >> James >> >> --- >> 2017-12-11 James Greenhalgh <james.greenha...@arm.com> >> >> * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code >> generation for cases where splatting a value is not useful. >> * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge >> across a vec_duplicate and a paradoxical subreg forming a vector >> mode to a vec_concat. >> >> 2017-12-11 James Greenhalgh <james.greenha...@arm.com> >> >> * gcc.target/aarch64/vect-slp-dup.c: New. >> >> >> 0001-patch-AArch64-Do-not-perform-a-vector-splat-for-vect.patch >> >> > >> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c >> index 806c309..ed16f70 100644 >> --- a/gcc/simplify-rtx.c >> +++ b/gcc/simplify-rtx.c >> @@ -5785,6 +5785,36 @@ simplify_ternary_operation (enum rtx_code code, >> machine_mode mode, >> return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1); >> } >> >> + /* Replace: >> + >> + (vec_merge:outer (vec_duplicate:outer x:inner) >> + (subreg:outer y:inner 0) >> + (const_int N)) >> + >> + with (vec_concat:outer x:inner y:inner) if N == 1, >> + or (vec_concat:outer y:inner x:inner) if N == 2. >> + >> + Implicitly, this means we have a paradoxical subreg, but such >> + a check is cheap, so make it anyway. > I'm going to assume that N == 1 and N == 3 are handled elsewhere and do > not show up here in practice. > > > So is it advisable to handle the case where the VEC_DUPLICATE and SUBREG > show up in the opposite order? Or is there some canonicalization that > prevents that? > > simplify-rtx bits are OK as-is if we're certain we're not going to get > the alternate ordering of the VEC_MERGE operands. ALso OK if you either > generalize this chunk of code or duplicate & twiddle it to handle the > alternate order. > > I didn't look at the aarch64 specific bits. >
Hi James, Your patch (r255946) introduced regressions on aarch64_be: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255946/report-build-info.html I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83663 to track this. Thanks, Christophe > jeff >