Re: [PATCH] aarch64: Use SVE SUBR instruction with Neon modes

Andrew Pinski Fri, 15 Nov 2024 01:20:44 -0800

On Thu, Nov 14, 2024 at 7:50 PM Soumya AR <soum...@nvidia.com> wrote:
>
> The SVE SUBR instruction performs a reversed subtract from an immediate.
>
> This patch enables the emission of SUBR for Neon modes and avoids the need to
> materialise an explicit constant.
>
> For example, the below test case:
>
> typedef long long __attribute__ ((vector_size (16))) v2di;
>
> v2di subr_v2di (v2di x)
> {
>         return 15 - x;
> }
>
> compiles to:
>
> subr_v2di:
>         mov     z31.d, #15
>         sub     v0.2d, v31.2d, v0.2d
>         ret
>
> but can just be:
>
> subr_v2di:
>         subr    z0.d, z0.d, #15
>         ret
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Soumya AR <soum...@nvidia.com>
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64-simd.md:
>         (sub<mode>3<vczle><vczbe>): Extended the pattern to emit SUBR for SVE
>         targets if operand 1 is an immediate.
>         * config/aarch64/predicates.md (aarch64_sve_arith_imm_or_reg_operand):
>         New predicate that accepts aarch64_sve_arith_immediate in operand 1 but
>         only for TARGET_SVE.


I think this might cause wrong code with:
```
#include <arm_neon.h>
uint32x4_t foo_sub_u32 (uint32x2_t a, uint32x2_t b)
{
  uint32x2_t zeros = vcreate_u32 (0);
  b = vdup_n_u32 (15);
  return vcombine_u32 (vsub_u32 (b, a), zeros);
}
```
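With the patch, the elements that are supposed to be zero would instead be `15 - x`.
This is due to the `<vczle><vczbe>` part of the pattern name: those variants
match the 64-bit subtract combined with zeros in the upper half, which holds
for the Neon instruction but not for SUBR, since SUBR writes every element of
the SVE register.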
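
To make the concern concrete, here is a rough scalar model of the two
behaviours. It is only a sketch: the `vreg128` type and the `model_*` functions
are hypothetical names made up for illustration, and it assumes a 128-bit SVE
vector length with 32-bit elements.
```
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit vector register as four 32-bit lanes.  */
typedef struct { uint32_t lane[4]; } vreg128;

/* Models a 64-bit Neon subtract such as "sub v0.2s, v1.2s, v0.2s" with v1
   holding IMM in every lane: the two low lanes are computed and the upper
   64 bits of the destination are written as zero.  */
vreg128 model_neon_sub_64 (uint32_t imm, vreg128 x)
{
  vreg128 r;
  memset (&r, 0, sizeof r);
  for (int i = 0; i < 2; i++)
    r.lane[i] = imm - x.lane[i];
  return r;
}

/* Models an unpredicated "subr z0.s, z0.s, #imm" (assuming VL = 128):
   every lane is computed, including the two upper ones.  */
vreg128 model_sve_subr (uint32_t imm, vreg128 x)
{
  vreg128 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = imm - x.lane[i];
  return r;
}
```
If the input register holds {1, 2, 3, 4}, the Neon model leaves the two upper
lanes zero, while the SUBR model puts 15 - 3 and 15 - 4 there, which is not
what vcombine with zeros expects.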

Thanks,
Andrew


>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/sve/subr-sve.c: New test.
>
