RE: [PATCH]AArch64: Add NEON, SVE and SVE2 RTL patterns for Complex Addition, Multiply and FMA.

Kyrylo Tkachov via Gcc-patches Mon, 14 Dec 2020 03:02:09 -0800


> -----Original Message-----
> From: Tamar Christina <tamar.christ...@arm.com>
> Sent: 10 December 2020 17:00
> To: gcc-patches@gcc.gnu.org
> Cc: nd <n...@arm.com>; Richard Earnshaw <richard.earns...@arm.com>;
> Marcus Shawcroft <marcus.shawcr...@arm.com>; Kyrylo Tkachov
> <kyrylo.tkac...@arm.com>; Richard Sandiford
> <richard.sandif...@arm.com>
> Subject: [PATCH]AArch64: Add NEON, SVE and SVE2 RTL patterns for
> Complex Addition, Multiply and FMA.
> 
> Hi All,
> 
> This adds implementations of the optabs for complex operations.  With this
> patch, the following C code:
> 
>   void f90 (float complex a[restrict N], float complex b[restrict N],
>           float complex c[restrict N])
>   {
>     for (int i=0; i < N; i++)
>       c[i] = a[i] + (b[i] * I);
>   }
> 
> generates
> 
>   f90:
>         mov     x3, 0
>         .p2align 3,,7
>   .L2:
>         ldr     q0, [x0, x3]
>         ldr     q1, [x1, x3]
>         fcadd   v0.4s, v0.4s, v1.4s, #90
>         str     q0, [x2, x3]
>         add     x3, x3, 16
>         cmp     x3, 1600
>         bne     .L2
>         ret
> 
> instead of
> 
>   f90:
>         add     x3, x1, 1600
>         .p2align 3,,7
>   .L2:
>         ld2     {v4.4s - v5.4s}, [x0], 32
>         ld2     {v2.4s - v3.4s}, [x1], 32
>         fsub    v0.4s, v4.4s, v3.4s
>         fadd    v1.4s, v5.4s, v2.4s
>         st2     {v0.4s - v1.4s}, [x2], 32
>         cmp     x3, x1
>         bne     .L2
>         ret
> 
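For reference, the two listings compute the same thing. The fsub/fadd pair in the second listing shows that adding b[i] * I amounts to subtracting b's imaginary parts from a's real parts and adding b's real parts to a's imaginary parts, which is what a single fcadd with #90 does per complex element. A minimal C sketch of that equivalence follows. The helper names cadd90_lanes and check_f90_matches_lanes are hypothetical, and N = 200 is only inferred from the 1600-byte loop bound at 8 bytes per float complex.

```c
#include <complex.h>

/* Assumption: 1600-byte loop bound / 8 bytes per float complex.  */
#define N 200

/* The loop from the patch description: c = a + b * i.  */
static void
f90 (float complex a[restrict N], float complex b[restrict N],
     float complex c[restrict N])
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] + (b[i] * I);
}

/* Hypothetical helper: the same computation written on interleaved
   (re, im) floats, mirroring the fsub/fadd pair of the ld2/st2 version.  */
static void
cadd90_lanes (const float *a, const float *b, float *c, int n)
{
  for (int i = 0; i < n; i++)
    {
      c[2 * i] = a[2 * i] - b[2 * i + 1];      /* re = a.re - b.im */
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];  /* im = a.im + b.re */
    }
}

/* Hypothetical self-check: returns the number of elements on which the
   two formulations disagree.  Inputs are small integers, so every
   operation is exact in single precision.  */
static int
check_f90_matches_lanes (void)
{
  static float complex a[N], b[N], c[N];
  static float fa[2 * N], fb[2 * N], fc[2 * N];
  int mismatches = 0;

  for (int i = 0; i < N; i++)
    {
      fa[2 * i] = (float) i;
      fa[2 * i + 1] = (float) (7 - i);
      fb[2 * i] = (float) (2 * i + 1);
      fb[2 * i + 1] = (float) (3 - i);
      a[i] = fa[2 * i] + fa[2 * i + 1] * I;
      b[i] = fb[2 * i] + fb[2 * i + 1] * I;
    }

  f90 (a, b, c);
  cadd90_lanes (fa, fb, fc, N);

  for (int i = 0; i < N; i++)
    if (crealf (c[i]) != fc[2 * i] || cimagf (c[i]) != fc[2 * i + 1])
      mismatches++;

  return mismatches;
}
```

Calling check_f90_matches_lanes () from a test harness should report zero mismatches. This sketch only illustrates the arithmetic and is not the RTL pattern added by the patch.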
> It defines a new iterator, VALL_ARITH, which contains the types for which we
> can do general arithmetic (it excludes bfloat16).
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> Checked with armv8-a+sve2+fp16 and no issues.  Note that due to a mid-end
> limitation, SLP for SVE currently fails for some permutes.  The tests have
> these marked as XFAIL.  I do intend to fix this soon.
> 
> Matching tests for these are in the mid-end patches.
> Note that the mid-end patches are still being respun and I may need to
> change the order of some parameters, but no other change is expected and
> I would like to decrease the size of future patches.  As such...
> 
> Ok for master?


Ok. The patterns look clean to me from a back-end perspective.
Thanks,
Kyrill

> 
> Thanks,
> Tamar
> 
> 
> gcc/ChangeLog:
> 
>       * config/aarch64/aarch64-simd.md (cadd<rot><mode>3,
>       cml<fcmac1><rot_op><mode>4, cmul<rot_op><mode>3): New.
>       * config/aarch64/iterators.md (VALL_ARITH, UNSPEC_FCMUL,
>       UNSPEC_FCMUL180, UNSPEC_FCMLS, UNSPEC_FCMLS180, UNSPEC_CMLS,
>       UNSPEC_CMLS180, UNSPEC_CMUL, UNSPEC_CMUL180, FCMLA_OP, FCMUL_OP, rot_op,
>       rotsplit1, rotsplit2, fcmac1, sve_rot1, sve_rot2, SVE2_INT_CMLA_OP,
>       SVE2_INT_CMUL_OP, SVE2_INT_CADD_OP): New.
>       (rot): Add UNSPEC_FCMLS, UNSPEC_FCMUL, UNSPEC_FCMUL180.
>       * config/aarch64/aarch64-sve.md (cadd<rot><mode>3,
>       cml<fcmac1><rot_op><mode>4, cmul<rot_op><mode>3): New.
>       * config/aarch64/aarch64-sve2.md (cadd<rot><mode>3,
>       cml<fcmac1><rot_op><mode>4, cmul<rot_op><mode>3): New.
> 
> --
