On 13/07/16 17:14, Kyrill Tkachov wrote: > Hi all, > > The most common way to load and store TImode value in aarch64 is to > perform an LDP/STP of two X-registers. > This is the *movti_aarch64 pattern in aarch64.md. > There is a bug in the logic in aarch64_classify_address where it > validates the offset in the address used > to load a TImode value. It passes down TImode to the > aarch64_offset_7bit_signed_scaled_p check which rejects > offsets that are not a multiple of the mode size of TImode (16). > However, this is too conservative as X-reg LDP/STP > instructions accept immediate offsets that are a multiple of 8. > > Also, considering that the definition of > aarch64_offset_7bit_signed_scaled_p is: > return (offset >= -64 * GET_MODE_SIZE (mode) > && offset < 64 * GET_MODE_SIZE (mode) > && offset % GET_MODE_SIZE (mode) == 0); > > I think the range check may even be wrong for TImode as this will accept > offsets in the range [-1024, 1024) > (as long as they are a multiple of 16) > whereas X-reg LDP/STP instructions only accept offsets in the range > [-512, 512). > So since the check is for an X-reg LDP/STP address we should be passing > down DImode. > > This patch does that and enables more aggressive generation of REG+IMM > addressing modes for 64-bit aligned > TImode values, eliminating many address calculation instructions. > For the testcase in the patch we currently generate: > bar: > add x1, x1, 8 > add x0, x0, 8 > ldp x2, x3, [x1] > stp x2, x3, [x0] > ret > > whereas with this patch we generate: > bar: > ldp x2, x3, [x1, 8] > stp x2, x3, [x0, 8] > ret > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Ok for trunk?
OK. R. > > Thanks, > Kyrill > > 2016-07-13 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > * config/aarch64/aarch64.c (aarch64_classify_address): Use DImode when > performing aarch64_offset_7bit_signed_scaled_p check for TImode LDP/STP > addresses. > > 2016-07-13 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > * gcc.target/aarch64/ldp_stp_unaligned_1.c: New test. > > aarch64-timode-addr.patch > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > bea67f88b900be39b6f1ae002353b44c5a4a9f7d..8fd93a54c54ab86c6e600afba48fa441101b57c7 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -4033,9 +4033,11 @@ aarch64_classify_address (struct aarch64_address_info > *info, > X,X: 7-bit signed scaled offset > Q: 9-bit signed offset > We conservatively require an offset representable in either mode. > - */ > + When performing the check for pairs of X registers i.e. LDP/STP > + pass down DImode since that is the natural size of the LDP/STP > + instruction memory accesses. */ > if (mode == TImode || mode == TFmode) > - return (aarch64_offset_7bit_signed_scaled_p (mode, offset) > + return (aarch64_offset_7bit_signed_scaled_p (DImode, offset) > && offset_9bit_signed_unscaled_p (mode, offset)); > > /* A 7bit offset check because OImode will emit a ldp/stp > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c > b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..a70f92100fb91bcfdcfd4af1cab6f58915038568 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c > @@ -0,0 +1,20 @@ > +/* { dg-options "-O2" } */ > + > +/* Check that we can use a REG + IMM addressing mode when moving an unaligned > + TImode value to and from memory. */ > + > +struct foo > +{ > + long long b; > + __int128 a; > +} __attribute__ ((packed)); > + > +void > +bar (struct foo *p, struct foo *q) > +{ > + p->a = q->a; > +} > + > +/* { dg-final { scan-assembler-not "add\tx\[0-9\]+, x\[0-9\]+" } } */ > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */ > +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */ >