Hi Richard,

> > Ok for trunk?
>
> +;; The complex mla operations always need to expand to two instructions.
> +;; The first operation does half the computation and the second does the
> +;; remainder.  Because of this, expand early.
> +(define_expand "fcmla<rot><mode>4"
> +  [(set (match_operand:VF 0 "register_operand")
> +        (plus:VF (match_operand:VF 1 "register_operand")
> +                 (unspec:VF [(match_operand:VF 2 "register_operand")
> +                             (match_operand:VF 3 "register_operand")]
> +                            VCMLA)))]
> +  "TARGET_COMPLEX"
> +{
> +  emit_insn (gen_neon_vcmla<rotsplit1><mode> (operands[0], operands[1],
> +                                              operands[2], operands[3]));
> +  emit_insn (gen_neon_vcmla<rotsplit2><mode> (operands[0], operands[0],
> +                                              operands[2], operands[3]));
> +  DONE;
> +})
>
> What's the two halves?  Why hide this from the vectorizer if you go down
> all to the detail and expose the rotation to it?
The two halves are an implementation detail of the instruction in Armv8.3-a. As far as the vectorizer is concerned, all you want to do is an FMA, rotating one of the operands by 0 or 180 degrees.

Also note that the "rotations" in these instructions aren't exactly the same as what would fall under rotation of a complex number, as each instruction can only do half of the final computation you want.

In the ISA these instructions have to be used in pairs, where the rotations determine the operation you want to perform. E.g. a rotation of #0 followed by #90 makes it a multiply and accumulate. A rotation of #180 followed by #90 makes this a vector complex subtract, which is intended to be used with the first call operating on a register cleared to 0 (it essentially becomes an "FMS" if you don't clear the register).

Each "rotation" determines what operation is done and which parts of the complex number are used. You change the "rotations" and the grouping of the instructions to get different operations.

I did not expose this to the vectorizer as it seems very ISA specific.

> +;; The vcadd and vcmla patterns are made UNSPEC explicitly due to the
> +;; fact that their usage needs to guarantee that the source vectors are
> +;; contiguous.  It would be wrong to describe the operation without being
> +;; able to describe the permute that is also required, but even if that is
> +;; done the permute would have been created as a LOAD_LANES which means the
> +;; values in the registers are in the wrong order.
>
> Hmm, it's totally non-obvious to me how this relates to loads or what a
> "non-contiguous" register would be?  That is, once you make this an unspec,
> combine will never be able to synthesize this from intrinsics code that
> doesn't use this form.
> > +(define_insn "neon_vcadd<rot><mode>"
> > +  [(set (match_operand:VF 0 "register_operand" "=w")
> > +        (unspec:VF [(match_operand:VF 1 "register_operand" "w")
> > +                    (match_operand:VF 2 "register_operand" "w")]
> > +                   VCADD))]

Yes, that's my goal: if operand 1 and operand 2 are loaded by instructions that would have permuted the values in the register, then the instruction doesn't work. The instruction does the permute itself, so it expects the values to have been loaded using a simple load and not a LOAD_LANES. So I intend to prevent combine from recognizing the operation for that reason.

For the ADD, combine could be used, but then you'd have to match the load and store as well, since those have to change too; for the rest you'd run far afoul of combine's 5-instruction limit.

Kind Regards,
Tamar

> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 2018-11-11  Tamar Christina  <tamar.christ...@arm.com>
> >
> > 	* config/arm/arm.c (arm_arch8_3, arm_arch8_4): New.
> > 	* config/arm/arm.h (TARGET_COMPLEX, arm_arch8_3, arm_arch8_4): New.
> > 	(arm_option_reconfigure_globals): Use them.
> > 	* config/arm/iterators.md (VDF, VQ_HSF): New.
> > 	(VCADD, VCMLA): New.
> > 	(VF_constraint, rot, rotsplit1, rotsplit2): Add V4HF and V8HF.
> > 	* config/arm/neon.md (neon_vcadd<rot><mode>, fcadd<rot><mode>3,
> > 	neon_vcmla<rot><mode>, fcmla<rot><mode>4): New.
> > 	* config/arm/unspecs.md (UNSPEC_VCADD90, UNSPEC_VCADD270,
> > 	UNSPEC_VCMLA, UNSPEC_VCMLA90, UNSPEC_VCMLA180, UNSPEC_VCMLA270): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2018-11-11  Tamar Christina  <tamar.christ...@arm.com>
> >
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c: Add Arm support.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_3.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_4.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_5.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_6.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_1.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_3.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_4.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_5.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_6.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_1.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_1.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_3.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_1.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_3.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_3.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_1.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_2.c: Likewise.
> > 	* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_3.c: Likewise.
> >
> > --