Hi Richard,

> > Ok for trunk?
>
> +;; The complex mla operations always need to expand to two instructions.
> +;; The first operation does half the computation and the second does the
> +;; remainder.  Because of this, expand early.
> +(define_expand "fcmla<rot><mode>4"
> +  [(set (match_operand:VF 0 "register_operand")
> +        (plus:VF (match_operand:VF 1 "register_operand")
> +                 (unspec:VF [(match_operand:VF 2 "register_operand")
> +                             (match_operand:VF 3 "register_operand")]
> +                            VCMLA)))]
> +  "TARGET_COMPLEX"
> +{
> +  emit_insn (gen_neon_vcmla<rotsplit1><mode> (operands[0], operands[1],
> +                                              operands[2], operands[3]));
> +  emit_insn (gen_neon_vcmla<rotsplit2><mode> (operands[0], operands[0],
> +                                              operands[2], operands[3]));
> +  DONE;
> +})
>
> What's the two halves?  Why hide this from the vectorizer if you go down
> all to the detail and expose the rotation to it?
The two halves are an implementation detail of the instruction in Armv8.3-a. As far as the vectorizer is concerned, all you want to do is an FMA, rotating one of the operands by 0 or 180 degrees.

Also note that the "rotations" in these instructions aren't exactly the same as what would fall under rotation of a complex number, as each instruction can only do half of the final computation you want.

In the ISA these instructions have to be used in pairs, where the rotations determine the operation you want to perform. E.g. a rotation of #0 followed by #90 makes it a multiply and accumulate. A rotation of #180 followed by #90 makes this a vector complex subtract, which is intended to be used with the first call operating on a register cleared to 0 (it essentially becomes an "FMS" if you don't clear the register).

Each "rotation" determines what operation is done and which parts of the complex number are used. You change the "rotations" and the grouping of the instructions to get different operations.

I did not expose this to the vectorizer as it seems very ISA specific.

> +;; The vcadd and vcmla patterns are made UNSPEC explicitly due to the
> +;; fact that their usage needs to guarantee that the source vectors are
> +;; contiguous.  It would be wrong to describe the operation without being
> +;; able to describe the permute that is also required, but even if that is
> +;; done the permute would have been created as a LOAD_LANES which means the
> +;; values in the registers are in the wrong order.
>
> Hmm, it's totally non-obvious to me how this relates to loads or what a
> "non-contiguous" register would be?  That is, once you make this an unspec,
> combine will never be able to synthesize this from intrinsics code that
> doesn't use this form.
> > +(define_insn "neon_vcadd<rot><mode>"
> > +  [(set (match_operand:VF 0 "register_operand" "=w")
> > +        (unspec:VF [(match_operand:VF 1 "register_operand" "w")
> > +                    (match_operand:VF 2 "register_operand" "w")]
> > +                   VCADD))]

Yes, that's my goal: if operand 1 and operand 2 are loaded by instructions that would have permuted the values in the register, then the instruction doesn't work. The instruction does the permute itself, so it expects the values to have been loaded using a simple load and not a LOAD_LANES. So I intend to prevent combine from recognizing the operation for that reason.

For the ADD, combine could be used, but then you'd have to match the load and store as well, since those have to change too; for the rest you'd run far afoul of combine's 5-instruction limit.

Kind Regards,
Tamar

> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 2018-11-11  Tamar Christina  <tamar.christ...@arm.com>
> >
> > 	* config/arm/arm.c (arm_arch8_3, arm_arch8_4): New.
> > 	* config/arm/arm.h (TARGET_COMPLEX, arm_arch8_3, arm_arch8_4): New.
> > 	(arm_option_reconfigure_globals): Use them.
> > 	* config/arm/iterators.md (VDF, VQ_HSF): New.
> > 	(VCADD, VCMLA): New.
> > 	(VF_constraint, rot, rotsplit1, rotsplit2): Add V4HF and V8HF.
> > 	* config/arm/neon.md (neon_vcadd<rot><mode>, fcadd<rot><mode>3,
> > 	neon_vcmla<rot><mode>, fcmla<rot><mode>4): New.
> > 	* config/arm/unspecs.md (UNSPEC_VCADD90, UNSPEC_VCADD270,
> > 	UNSPEC_VCMLA, UNSPEC_VCMLA90, UNSPEC_VCMLA180, UNSPEC_VCMLA270): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2018-11-11  Tamar Christina  <tamar.christ...@arm.com>
> >
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c: Add Arm support.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_3.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_4.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_5.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_6.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_1.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_3.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_4.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_5.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_6.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_1.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_1.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_3.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_1.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_3.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_3.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_1.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_3.c: Likewise.
> >
> > --