Victor Do Nascimento via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > The backend pattern for storing a pair of identical values in 32 and 64-bit > modes with the machine instruction STP was missing, and multiple instructions > were needed to reproduce this behavior as a result of failed RTL pattern > match in combine pass. > > For the test case : > > typedef long long v2di __attribute__((vector_size (16))); > typedef int v2si __attribute__((vector_size (8))); > > void > foo (v2di *x, long long a) > { > v2di tmp = {a, a}; > *x = tmp; > } > > void > foo2 (v2si *x, int a) > { > v2si tmp = {a, a}; > *x = tmp; > } > > at -O2 on aarch64 gives: > > foo: > stp x1, x1, [x0] > ret > foo2: > stp w1, w1, [x0] > ret > > instead of: > > foo: > dup v0.2d, x1 > str q0, [x0] > ret > foo2: > dup v0.2s, w1 > str d0, [x0] > ret > > In preparation for the next stage 1 phase of development, added new RTL > template, unittest and checked for regressions on bootstrapped > aarch64-none-linux-gnu. > > gcc/ChangeLog > > 2021-02-04 victor Do Nascimento <victor.donascime...@arm.com> > > * config/aarch64/aarch64-simd.md: Implement RTX pattern for > mapping 'vec_duplicate' RTX onto 'STP' ASM insn. > * config/aarch64/iterators.md: Implement ldpstp_vel_sz iterator > to map STP/LDP vector element mode to correct suffix in > attribute type definition of aarch64_simd_stp<mode> pattern.
A more typical changelog entry would be: * config/aarch64/iterators.md (ldpstp_vel_sz): New mode attribute. * config/aarch64/aarch64-simd.md (aarch64_simd_stp<mode>): New pattern. > gcc/testsuite/ChangeLog > > 2021-02-04 Victor Do Nascimento <victor.donascime...@arm.com> > > * gcc.target/stp_vec-dup_32_64-1.c: Added test. > > Regards, > Victor > > --- > gcc/config/aarch64/aarch64-simd.md | 10 +++++++++ > gcc/config/aarch64/iterators.md | 3 +++ > .../gcc.target/aarch64/stp_vec_dup_32_64-1.c | 22 +++++++++++++++++++ > 3 files changed, 35 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index 71aa77dd010..3d53bab0018 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -205,6 +205,16 @@ > [(set_attr "type" "neon_stp")] > ) > > +(define_insn "aarch64_simd_stp<mode>" > + [(set (match_operand:VP_2E 0 "aarch64_mem_pair_operand" "=Ump,Ump") > + (vec_duplicate:VP_2E (match_operand:<VEL> 1 "register_operand" > "w,r")))] Formatting nit: should just be one tab here. I would have just made that change locally and committed, but I think there's a problem: aarch64_mem_pair_operand and Ump are geared for pairs of full-vector stores, rather than for pairs of elements. This means that (for example) the V2SI range will be [-256,255] * 8 rather than the expected [-256,255] * 4. I think we need to use aarch64_mem_pair_lanes_operand and Umn instead, as for store_pair_lanes<mode>. In addition: /* If we are dealing with ADDR_QUERY_LDP_STP_N that means the incoming mode corresponds to the actual size of the memory being loaded/stored and the mode of the corresponding addressing mode is half of that. */ if (type == ADDR_QUERY_LDP_STP_N && known_eq (GET_MODE_SIZE (mode), 16)) mode = DFmode; only handles 128-bit vectors, whereas here we need it to handle 64-bit vectors too. It would be good to test the limits, e.g.: > diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c > b/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c > new file mode 100644 > index 00000000000..a37c903dfd4 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c > @@ -0,1 +1,22 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +typedef long long v2di __attribute__((vector_size (16))); > +typedef int v2si __attribute__((vector_size (8))); > + > +void > +foo (v2di *x, long long a) > +{ > + v2di tmp = {a, a}; > + *x = tmp; > +} > + > +void > +foo2 (v2si *x, int a) > +{ > + v2si tmp = {a, a}; > + *x = tmp; > +} We could have additional tests for: x[-129] = tmp; // out of range x[-128] = tmp; // in range x[127] = tmp; // in range x[128] = tmp; // out of range Thanks, Richard > + > +/* { dg-final { scan-assembler-times "stp\t" 2 } } */ > +/* { dg-final { scan-assembler-not "dup\t" } } */