Hi Tamar,
On 11/12/18 15:46, Tamar Christina wrote:
Hi All,
This patch adds NEON intrinsics and tests for the Armv8.3-a complex
multiplication and add instructions with a rotate along the Argand plane.
The instructions are documented in the ArmARM[1] and the intrinsics
specification
will be published on the Arm website [2].
The Lane versions of these instructions are special in that they always select
a pair.
using index 0 means selecting lane 0 and 1. Because of this the range check
for the
intrinsics require special handling.
On Arm, in order to implement some of the lane intrinsics we're using the
structure of the
register file. The lane variant of these instructions always select a D
register, but the data
itself can be stored in Q registers. This means that for single precision
complex numbers you are
only allowed to select D[0] but using the register file layout you can get the
range 0-1 for lane indices
by selecting between Dn[0] and Dn+1[0].
Same reasoning applies for half float complex numbers, except there your D
register indexes can be 0 or 1, so you have
a total range of 4 elements (for a V8HF).
[1]
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
[2] https://developer.arm.com/docs/101028/latest
Bootstrapped Regtested on arm-none-gnueabihf and no issues.
Ok for trunk?
Ok.
Thanks,
Kyrill
Thanks,
Tamar
gcc/ChangeLog:
2018-12-11 Tamar Christina <tamar.christ...@arm.com>
* config/arm/arm-builtins.c
(enum arm_type_qualifiers): Add qualifier_lane_pair_index.
(MAC_LANE_PAIR_QUALIFIERS): New.
(arm_expand_builtin_args): Use it.
(arm_expand_builtin_1): Likewise.
* config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
* config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
* config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
* config/arm/arm_neon.h:
(vcadd_rot90_f16): New.
(vcaddq_rot90_f16): New.
(vcadd_rot270_f16): New.
(vcaddq_rot270_f16): New.
(vcmla_f16): New.
(vcmlaq_f16): New.
(vcmla_lane_f16): New.
(vcmla_laneq_f16): New.
(vcmlaq_lane_f16): New.
(vcmlaq_laneq_f16): New.
(vcmla_rot90_f16): New.
(vcmlaq_rot90_f16): New.
(vcmla_rot90_lane_f16): New.
(vcmla_rot90_laneq_f16): New.
(vcmlaq_rot90_lane_f16): New.
(vcmlaq_rot90_laneq_f16): New.
(vcmla_rot180_f16): New.
(vcmlaq_rot180_f16): New.
(vcmla_rot180_lane_f16): New.
(vcmla_rot180_laneq_f16): New.
(vcmlaq_rot180_lane_f16): New.
(vcmlaq_rot180_laneq_f16): New.
(vcmla_rot270_f16): New.
(vcmlaq_rot270_f16): New.
(vcmla_rot270_lane_f16): New.
(vcmla_rot270_laneq_f16): New.
(vcmlaq_rot270_lane_f16): New.
(vcmlaq_rot270_laneq_f16): New.
(vcadd_rot90_f32): New.
(vcaddq_rot90_f32): New.
(vcadd_rot270_f32): New.
(vcaddq_rot270_f32): New.
(vcmla_f32): New.
(vcmlaq_f32): New.
(vcmla_lane_f32): New.
(vcmla_laneq_f32): New.
(vcmlaq_lane_f32): New.
(vcmlaq_laneq_f32): New.
(vcmla_rot90_f32): New.
(vcmlaq_rot90_f32): New.
(vcmla_rot90_lane_f32): New.
(vcmla_rot90_laneq_f32): New.
(vcmlaq_rot90_lane_f32): New.
(vcmlaq_rot90_laneq_f32): New.
(vcmla_rot180_f32): New.
(vcmlaq_rot180_f32): New.
(vcmla_rot180_lane_f32): New.
(vcmla_rot180_laneq_f32): New.
(vcmlaq_rot180_lane_f32): New.
(vcmlaq_rot180_laneq_f32): New.
(vcmla_rot270_f32): New.
(vcmlaq_rot270_f32): New.
(vcmla_rot270_lane_f32): New.
(vcmla_rot270_laneq_f32): New.
(vcmlaq_rot270_lane_f32): New.
(vcmlaq_rot270_laneq_f32): New.
* config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0, vcmla90,
vcmla180, vcmla270, vcmla_lane0, vcmla_lane90, vcmla_lane180,
vcmla_lane270,
vcmla_laneq0, vcmla_laneq90, vcmla_laneq180, vcmla_laneq270,
vcmlaq_lane0, vcmlaq_lane90, vcmlaq_lane180, vcmlaq_lane270): New.
* config/arm/neon.md (neon_vcmla_lane<rot><mode>,
neon_vcmla_laneq<rot><mode>, neon_vcmlaq_lane<rot><mode>): New.
gcc/testsuite/ChangeLog:
2018-12-11 Tamar Christina <tamar.christ...@arm.com>
* gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: Add AArch32
regexpr.
* gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: Likewise.
--