Re: [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP

Richard Earnshaw (lists) Mon, 01 Aug 2016 03:11:01 -0700

On 13/07/16 17:14, Kyrill Tkachov wrote:
> Hi all,
> 
> The most common way to load and store TImode value in aarch64 is to
> perform an LDP/STP of two X-registers.
> This is the *movti_aarch64 pattern in aarch64.md.
> There is a bug in the logic in aarch64_classify_address where it
> validates the offset in the address used
> to load a TImode value. It passes down TImode to the
> aarch64_offset_7bit_signed_scaled_p check which rejects
> offsets that are not a multiple of the mode size of TImode (16).
> However, this is too conservative as X-reg LDP/STP
> instructions accept immediate offsets that are a multiple of 8.
> 
> Also, considering that the definition of
> aarch64_offset_7bit_signed_scaled_p is:
>   return (offset >= -64 * GET_MODE_SIZE (mode)
>       && offset < 64 * GET_MODE_SIZE (mode)
>       && offset % GET_MODE_SIZE (mode) == 0);
> 
> I think the range check may even be wrong for TImode as this will accept
> offsets in the range [-1024, 1024)
> (as long as they are a multiple of 16)
> whereas X-reg LDP/STP instructions only accept offsets in the range
> [-512, 512).
> So since the check is for an X-reg LDP/STP address we should be passing
> down DImode.
> 
> This patch does that and enables more aggressive generation of REG+IMM
> addressing modes for 64-bit aligned
> TImode values, eliminating many address calculation instructions.
> For the testcase in the patch we currently generate:
> bar:
>         add     x1, x1, 8
>         add     x0, x0, 8
>         ldp     x2, x3, [x1]
>         stp     x2, x3, [x0]
>         ret
> 
> whereas with this patch we generate:
> bar:
>         ldp     x2, x3, [x1, 8]
>         stp     x2, x3, [x0, 8]
>         ret
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?


OK.

R.

> 
> Thanks,
> Kyrill
> 
> 2016-07-13  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> 
>     * config/aarch64/aarch64.c (aarch64_classify_address): Use DImode when
>     performing aarch64_offset_7bit_signed_scaled_p check for TImode LDP/STP
>     addresses.
> 
> 2016-07-13  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> 
>     * gcc.target/aarch64/ldp_stp_unaligned_1.c: New test.
> 
> aarch64-timode-addr.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> bea67f88b900be39b6f1ae002353b44c5a4a9f7d..8fd93a54c54ab86c6e600afba48fa441101b57c7
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4033,9 +4033,11 @@ aarch64_classify_address (struct aarch64_address_info 
> *info,
>            X,X: 7-bit signed scaled offset
>            Q:   9-bit signed offset
>            We conservatively require an offset representable in either mode.
> -        */
> +          When performing the check for pairs of X registers i.e.  LDP/STP
> +          pass down DImode since that is the natural size of the LDP/STP
> +          instruction memory accesses.  */
>         if (mode == TImode || mode == TFmode)
> -         return (aarch64_offset_7bit_signed_scaled_p (mode, offset)
> +         return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
>                   && offset_9bit_signed_unscaled_p (mode, offset));
>  
>         /* A 7bit offset check because OImode will emit a ldp/stp
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c 
> b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..a70f92100fb91bcfdcfd4af1cab6f58915038568
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
> @@ -0,0 +1,20 @@
> +/* { dg-options "-O2" } */
> +
> +/* Check that we can use a REG + IMM addressing mode when moving an unaligned
> +   TImode value to and from memory.  */
> +
> +struct foo
> +{
> +  long long b;
> +  __int128 a;
> +} __attribute__ ((packed));
> +
> +void
> +bar (struct foo *p, struct foo *q)
> +{
> +  p->a = q->a;
> +}
> +
> +/* { dg-final { scan-assembler-not "add\tx\[0-9\]+, x\[0-9\]+" } } */
> +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
> +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
>

Re: [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP

Reply via email to