On Mon, Nov 12, 2018 at 06:31:45AM -0600, Tamar Christina wrote: > Hi Kyrill, > > > Hi Tamar, > > > > On 11/11/18 10:26, Tamar Christina wrote: > > > Hi All, > > > > > > This patch adds the expander support for supporting autovectorization of > > > complex number operations > > > such as Complex addition with a rotation along the Argand plane. This > > > also adds support for complex > > > FMA. > > > > > > The instructions are described in the ArmARM [1] and are available from > > > Armv8.3-a onwards. > > > > > > Concretely, this generates > > > > > > f90: > > > mov x3, 0 > > > .p2align 3,,7 > > > .L2: > > > ldr q0, [x0, x3] > > > ldr q1, [x1, x3] > > > fcadd v0.2d, v0.2d, v1.2d, #90 > > > str q0, [x2, x3] > > > add x3, x3, 16 > > > cmp x3, 3200 > > > bne .L2 > > > ret > > > > > > now instead of > > > > > > f90: > > > mov x4, x1 > > > mov x1, x2 > > > add x3, x4, 31 > > > add x2, x0, 31 > > > sub x3, x3, x1 > > > sub x2, x2, x1 > > > cmp x3, 62 > > > mov x3, 62 > > > ccmp x2, x3, 0, hi > > > bls .L5 > > > mov x2, x4 > > > add x3, x0, 3200 > > > .p2align 3,,7 > > > .L3: > > > ld2 {v2.2d - v3.2d}, [x0], 32 > > > ld2 {v4.2d - v5.2d}, [x2], 32 > > > cmp x0, x3 > > > fsub v0.2d, v2.2d, v5.2d > > > fadd v1.2d, v4.2d, v3.2d > > > st2 {v0.2d - v1.2d}, [x1], 32 > > > bne .L3 > > > ret > > > .L5: > > > add x6, x0, 8 > > > add x5, x4, 8 > > > add x2, x1, 8 > > > mov x3, 0 > > > .p2align 3,,7 > > > .L2: > > > ldr d1, [x0, x3] > > > ldr d3, [x5, x3] > > > ldr d0, [x6, x3] > > > ldr d2, [x4, x3] > > > fsub d1, d1, d3 > > > fadd d0, d0, d2 > > > str d1, [x1, x3] > > > str d0, [x2, x3] > > > add x3, x3, 16 > > > cmp x3, 3200 > > > bne .L2 > > > ret > > > > > > For complex additions with a 90* rotation along the Argand plane. > > > > > > [1] > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile > > > > > > Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and > > > x86_64-pc-linux-gnu > > > are still on going but previous patch showed no regressions. > > > > > > The instructions have also been tested on aarch64-none-elf and > > > arm-none-eabi on a Armv8.3-a model > > > and -march=Armv8.3-a+fp16 and all tests pass. > > > > > > Ok for trunk?
OK with the comment typos fixed. > > > gcc/ChangeLog: > > > > > > 2018-11-11 Tamar Christina <tamar.christ...@arm.com> > > > > > > * config/aarch64/aarch64-simd.md (aarch64_fcadd<rot><mode>, > > > fcadd<rot><mode>3, aarch64_fcmla<rot><mode>, > > > fcmla<rot><mode>4): New. > > > * config/aarch64/aarch64.h (TARGET_COMPLEX): New. > > > * config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270, > > > UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): > > > New. > > > (FCADD, FCMLA): New. > > > (rot, rotsplit1, rotsplit2): New. > > > * config/arm/types.md (neon_fcadd, neon_fcmla): New. Should we push these to an existing class for now, and split them later when someone provides a scheduling model which makes use of them? > > diff --git a/gcc/config/aarch64/aarch64-simd.md > > b/gcc/config/aarch64/aarch64-simd.md > > index > > c4be3101fdec930707918106cd7c53cf7584553e..12a91183a98ea23015860c77a97955cb1b30bfbb > > 100644 > > --- a/gcc/config/aarch64/aarch64-simd.md > > +++ b/gcc/config/aarch64/aarch64-simd.md > > @@ -419,6 +419,63 @@ > > } > > ) > > > > +;; The fcadd and fcmla patterns are made UNSPEC for the explicitly due to > > the s/for the explicitly/explicitly > > +;; fact that their usage need to guarantee that the source vectors are s/need/needs > > +;; contiguous. It would be wrong to describe the operation without being > > able > > +;; to describe the permute that is also required, but even if that is done > > +;; the permute would have been created as a LOAD_LANES which means the > > values > > +;; in the registers are in the wrong order. > > +(define_insn "aarch64_fcadd<rot><mode>" > > + [(set (match_operand:VHSDF 0 "register_operand" "=w") > > + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w") > > + (match_operand:VHSDF 2 "register_operand" "w")] > > + FCADD))] > > + "TARGET_COMPLEX" > > + "fcadd\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>, #<rot>" > > + [(set_attr "type" "neon_fcadd")] > > +)