[PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

Matthew Wahab Tue, 17 May 2016 07:43:19 -0700

The ARMv8.2-A FP16 extension adds a number of arithmetic instrutctions
to the VFP instruction set. This patch adds support for these
instructions to the ARM backend.


In most cases the instructions are added using non-standard pattern
names. This is to force operations on __fp16 values to be done, by
conversion, using the single-precision instructions. The exceptions are
the precision preserving operations ABS and NEG.

The instruction patterns can be used by the compiler to optimize
half-precision operations. Since the patterns names are non-standard,
the only way for half-precision operations to be generated is by using
the intrinsics added by this patch series meaning that existing code
will not be affected.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wa...@arm.com>

        * config/arm/iterators.md (Code iterators): Fix some white-space
        in the comments.
        (GLTE): New.
        (ABSNEG): New
        (FCVT): Moved from vfp.md.
        (VCVT_HF_US_N): New.
        (VCVT_SI_US_N): New.
        (VCVT_HF_US): New.
        (VCVTH_US): New.
        (FP16_RND): New.
        (absneg_str): New.
        (FCVTI32typename): Moved from vfp.md.
        (sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
        UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
        UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
        UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UUNSPEC_VCVTH_S and
        UNSPEC_VCVTH_U.
        (vcvth_op): New.
        (fp16_rnd_str): New.
        (fp16_rnd_insn): New.
        * config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
        (UNSPEC_VCVT_HF_U_N): New.
        (UNSPEC_VCVT_SI_S_N): New.
        (UNSPEC_VCVT_SI_U_N): New.
        (UNSPEC_VCVTH_S): New.
        (UNSPEC_VCVTH_U): New.
        (UNSPEC_VCVTA_S): New.
        (UNSPEC_VCVTA_U): New.
        (UNSPEC_VCVTM_S): New.
        (UNSPEC_VCVTM_U): New.
        (UNSPEC_VCVTN_S): New.
        (UNSPEC_VCVTN_U): New.
        (UNSPEC_VCVTP_S): New.
        (UNSPEC_VCVTP_U): New.
        (UNSPEC_VCVTP_S): New.
        (UNSPEC_VCVTP_U): New.
        (UNSPEC_VRND): New.
        (UNSPEC_VRNDA): New.
        (UNSPEC_VRNDI): New.
        (UNSPEC_VRNDM): New.
        (UNSPEC_VRNDN): New.
        (UNSPEC_VRNDP): New.
        (UNSPEC_VRNDX): New.
        * config/arm/vfp.md (<absneg_str>hf2): New.
        (neon_v<absneg_str>hf): New.
        (neon_v<fp16_rnd_str>hf): New.
        (neon_vrndihf): New.
        (addhf3_fp16): New.
        (neon_vaddhf): New.
        (subhf3_fp16): New.
        (neon_vsubhf): New.
        (divhf3_fp16): New.
        (neon_vdivhf): New.
        (mulhf3_fp16): New.
        (neon_vmulhf): New.
        (*mulsf3neghf_vfp): New.
        (*negmulhf3_vfp): New.
        (*mulsf3addhf_vfp): New.
        (*mulhf3subhf_vfp): New.
        (*mulhf3neghfaddhf_vfp): New.
        (*mulhf3neghfsubhf_vfp): New.
        (fmahf4_fp16): New.
        (neon_vfmahf): New.
        (fmsubhf4_fp16): New.
        (neon_vfmshf): New.
        (*fnmsubhf4): New.
        (*fnmaddhf4): New.
        (neon_vsqrthf): New.
        (neon_vrsqrtshf): New.
        (FCVT): Move to iterators.md.
        (FCVTI32typename): Likewise.
        (neon_vcvth<sup>hf): New.
        (neon_vcvth<sup>si): New.
        (neon_vcvth<sup>_nhf_unspec): New.
        (neon_vcvth<sup>_nhf): New.
        (neon_vcvth<sup>_nsi_unspec): New.
        (neon_vcvth<sup>_nsi): New.
        (neon_vcvt<vcvth_op>h<sup>si): New.
        (neon_<fmaxmin_op>hf): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wa...@arm.com>

        * gcc.target/arm/armv8_2-fp16-arith-1.c: New.
        * gcc.target/arm/armv8_2-fp16-conv-1.c: New.

>From 3e773f2ec85ea66d0be0e3a97ea52826156c00f2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wa...@arm.com>
Date: Thu, 7 Apr 2016 14:49:17 +0100
Subject: [PATCH 08/17] [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-17  Matthew Wahab  <matthew.wa...@arm.com>

	* config/arm/iterators.md (Code iterators): Fix some white-space
	in the comments.
	(GLTE): New.
	(ABSNEG): New
	(FCVT): Moved from vfp.md.
	(VCVT_HF_US_N): New.
	(VCVT_SI_US_N): New.
	(VCVT_HF_US): New.
	(VCVTH_US): New.
	(FP16_RND): New.
	(absneg_str): New.
	(FCVTI32typename): Moved from vfp.md.
	(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
	UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
	UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
	UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N,  UNSPEC_VCVTH_S_N,
	UNSPEC_VCVTH_U_N, UNSPEC_VCVTH_S and UNSPEC_VCVTH_U.
	(vcvth_op): New.
	(fp16_rnd_str): New.
	(fp16_rnd_insn): New.
	* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
	(UNSPEC_VCVT_HF_U_N): New.
	(UNSPEC_VCVT_SI_S_N): New.
	(UNSPEC_VCVT_SI_U_N): New.
	(UNSPEC_VCVTH_S): New.
	(UNSPEC_VCVTH_U): New.
	(UNSPEC_VCVTA_S): New.
	(UNSPEC_VCVTA_U): New.
	(UNSPEC_VCVTM_S): New.
	(UNSPEC_VCVTM_U): New.
	(UNSPEC_VCVTN_S): New.
	(UNSPEC_VCVTN_U): New.
	(UNSPEC_VCVTP_S): New.
	(UNSPEC_VCVTP_U): New.
	(UNSPEC_VCVTP_S): New.
	(UNSPEC_VCVTP_U): New.
	(UNSPEC_VRND): New.
	(UNSPEC_VRNDA): New.
	(UNSPEC_VRNDI): New.
	(UNSPEC_VRNDM): New.
	(UNSPEC_VRNDN): New.
	(UNSPEC_VRNDP): New.
	(UNSPEC_VRNDX): New.
	* config/arm/vfp.md (<absneg_str>hf2): New.
	(neon_v<absneg_str>hf): New.
	(neon_v<fp16_rnd_str>hf): New.
	(neon_vrndihf): New.
	(addhf3_fp16): New.
	(neon_vaddhf): New.
	(subhf3_fp16): New.
	(neon_vsubhf): New.
	(divhf3_fp16): New.
	(neon_vdivhf): New.
	(mulhf3_fp16): New.
	(neon_vmulhf): New.
	(*mulsf3neghf_vfp): New.
	(*negmulhf3_vfp): New.
	(*mulsf3addhf_vfp): New.
	(*mulhf3subhf_vfp): New.
	(*mulhf3neghfaddhf_vfp): New.
	(*mulhf3neghfsubhf_vfp): New.
	(fmahf4_fp16): New.
	(neon_vfmahf): New.
	(fmsubhf4_fp16): New.
	(neon_vfmshf): New.
	(*fnmsubhf4): New.
	(*fnmaddhf4): New.
	(neon_vsqrthf): New.
	(neon_vrsqrtshf): New.
	(FCVT): Move to iterators.md.
	(FCVTI32typename): Likewise.
	(neon_vcvth<sup>hf): New.
	(neon_vcvth<sup>si): New.
	(neon_vcvth<sup>_nhf_unspec): New.
	(neon_vcvth<sup>_nhf): New.
	(neon_vcvth<sup>_nsi_unspec): New.
	(neon_vcvth<sup>_nsi): New.
	(neon_vcvt<vcvth_op>h<sup>si): New.
	(neon_<fmaxmin_op>hf): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wa...@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: New.
	* gcc.target/arm/armv8_2-fp16-conv-1.c: New.
---
 gcc/config/arm/iterators.md                        |  59 ++-
 gcc/config/arm/unspecs.md                          |  21 +
 gcc/config/arm/vfp.md                              | 423 ++++++++++++++++++++-
 .../gcc.target/arm/armv8_2-fp16-arith-1.c          |  68 ++++
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c | 101 +++++
 5 files changed, 666 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 3f9d9e4..9371b6a 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -199,14 +199,17 @@
 ;; Code iterators
 ;;----------------------------------------------------------------------------
 
-;; A list of condition codes used in compare instructions where 
-;; the carry flag from the addition is used instead of doing the 
+;; A list of condition codes used in compare instructions where
+;; the carry flag from the addition is used instead of doing the
 ;; compare a second time.
 (define_code_iterator LTUGEU [ltu geu])
 
 ;; The signed gt, ge comparisons
 (define_code_iterator GTGE [gt ge])
 
+;; The signed gt, ge, lt, le comparisons
+(define_code_iterator GLTE [gt ge lt le])
+
 ;; The unsigned gt, ge comparisons
 (define_code_iterator GTUGEU [gtu geu])
 
@@ -235,6 +238,12 @@
 ;; Binary operators whose second operand can be shifted.
 (define_code_iterator SHIFTABLE_OPS [plus minus ior xor and])
 
+;; Operations on the sign of a number.
+(define_code_iterator ABSNEG [abs neg])
+
+;; Conversions.
+(define_code_iterator FCVT [unsigned_float float])
+
 ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
 ;; a stack pointer opoerand.  The minus operation is a candidate for an rsub
 ;; and hence only plus is supported.
@@ -330,6 +339,22 @@
 
 (define_int_iterator VCVT_US_N [UNSPEC_VCVT_S_N UNSPEC_VCVT_U_N])
 
+(define_int_iterator VCVT_HF_US_N [UNSPEC_VCVT_HF_S_N UNSPEC_VCVT_HF_U_N])
+
+(define_int_iterator VCVT_SI_US_N [UNSPEC_VCVT_SI_S_N UNSPEC_VCVT_SI_U_N])
+
+(define_int_iterator VCVT_HF_US [UNSPEC_VCVTA_S UNSPEC_VCVTA_U
+				 UNSPEC_VCVTM_S UNSPEC_VCVTM_U
+				 UNSPEC_VCVTN_S UNSPEC_VCVTN_U
+				 UNSPEC_VCVTP_S UNSPEC_VCVTP_U])
+
+(define_int_iterator VCVTH_US [UNSPEC_VCVTH_S UNSPEC_VCVTH_U])
+
+;; Operators for FP16 instructions.
+(define_int_iterator FP16_RND [UNSPEC_VRND UNSPEC_VRNDA
+			       UNSPEC_VRNDM UNSPEC_VRNDN
+			       UNSPEC_VRNDP UNSPEC_VRNDX])
+
 (define_int_iterator VQMOVN [UNSPEC_VQMOVN_S UNSPEC_VQMOVN_U])
 
 (define_int_iterator VMOVL [UNSPEC_VMOVL_S UNSPEC_VMOVL_U])
@@ -687,6 +712,12 @@
 (define_code_attr shift [(ashiftrt "ashr") (lshiftrt "lshr")])
 (define_code_attr shifttype [(ashiftrt "signed") (lshiftrt "unsigned")])
 
+;; String reprentations of operations on the sign of a number.
+(define_code_attr absneg_str [(abs "abs") (neg "neg")])
+
+;; Conversions.
+(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
+
 ;;----------------------------------------------------------------------------
 ;; Int attributes
 ;;----------------------------------------------------------------------------
@@ -718,7 +749,13 @@
   (UNSPEC_VPMAX "s") (UNSPEC_VPMAX_U "u")
   (UNSPEC_VPMIN "s") (UNSPEC_VPMIN_U "u")
   (UNSPEC_VCVT_S "s") (UNSPEC_VCVT_U "u")
+  (UNSPEC_VCVTA_S "s") (UNSPEC_VCVTA_U "u")
+  (UNSPEC_VCVTM_S "s") (UNSPEC_VCVTM_U "u")
+  (UNSPEC_VCVTN_S "s") (UNSPEC_VCVTN_U "u")
+  (UNSPEC_VCVTP_S "s") (UNSPEC_VCVTP_U "u")
   (UNSPEC_VCVT_S_N "s") (UNSPEC_VCVT_U_N "u")
+  (UNSPEC_VCVT_HF_S_N "s") (UNSPEC_VCVT_HF_U_N "u")
+  (UNSPEC_VCVT_SI_S_N "s") (UNSPEC_VCVT_SI_U_N "u")
   (UNSPEC_VQMOVN_S "s") (UNSPEC_VQMOVN_U "u")
   (UNSPEC_VMOVL_S "s") (UNSPEC_VMOVL_U "u")
   (UNSPEC_VSHL_S "s") (UNSPEC_VSHL_U "u")
@@ -733,9 +770,25 @@
   (UNSPEC_VSHLL_S_N "s") (UNSPEC_VSHLL_U_N "u")
   (UNSPEC_VSRA_S_N "s") (UNSPEC_VSRA_U_N "u")
   (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
-
+  (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
 ])
 
+(define_int_attr vcvth_op
+ [(UNSPEC_VCVTA_S "a") (UNSPEC_VCVTA_U "a")
+  (UNSPEC_VCVTM_S "m") (UNSPEC_VCVTM_U "m")
+  (UNSPEC_VCVTN_S "n") (UNSPEC_VCVTN_U "n")
+  (UNSPEC_VCVTP_S "p") (UNSPEC_VCVTP_U "p")])
+
+(define_int_attr fp16_rnd_str
+  [(UNSPEC_VRND "rnd") (UNSPEC_VRNDA "rnda")
+   (UNSPEC_VRNDM "rndm") (UNSPEC_VRNDN "rndn")
+   (UNSPEC_VRNDP "rndp") (UNSPEC_VRNDX "rndx")])
+
+(define_int_attr fp16_rnd_insn
+  [(UNSPEC_VRND "vrintz") (UNSPEC_VRNDA "vrinta")
+   (UNSPEC_VRNDM "vrintm") (UNSPEC_VRNDN "vrintn")
+   (UNSPEC_VRNDP "vrintp") (UNSPEC_VRNDX "vrintx")])
+
 (define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
                               (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
                               (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 5744c62..57a47ff 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -203,6 +203,20 @@
   UNSPEC_VCVT_U
   UNSPEC_VCVT_S_N
   UNSPEC_VCVT_U_N
+  UNSPEC_VCVT_HF_S_N
+  UNSPEC_VCVT_HF_U_N
+  UNSPEC_VCVT_SI_S_N
+  UNSPEC_VCVT_SI_U_N
+  UNSPEC_VCVTH_S
+  UNSPEC_VCVTH_U
+  UNSPEC_VCVTA_S
+  UNSPEC_VCVTA_U
+  UNSPEC_VCVTM_S
+  UNSPEC_VCVTM_U
+  UNSPEC_VCVTN_S
+  UNSPEC_VCVTN_U
+  UNSPEC_VCVTP_S
+  UNSPEC_VCVTP_U
   UNSPEC_VEXT
   UNSPEC_VHADD_S
   UNSPEC_VHADD_U
@@ -365,5 +379,12 @@
   UNSPEC_NVRINTN
   UNSPEC_VQRDMLAH
   UNSPEC_VQRDMLSH
+  UNSPEC_VRND
+  UNSPEC_VRNDA
+  UNSPEC_VRNDI
+  UNSPEC_VRNDM
+  UNSPEC_VRNDN
+  UNSPEC_VRNDP
+  UNSPEC_VRNDX
 ])
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index b1c13fa..6202fc3 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -937,9 +937,73 @@
    (set_attr "type" "ffarithd")]
 )
 
+;; ABS and NEG for FP16.
+(define_insn "<absneg_str>hf2"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (ABSNEG:HF (match_operand:HF 1 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "v<absneg_str>.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffariths")]
+)
+
+(define_expand "neon_v<absneg_str>hf"
+ [(set
+   (match_operand:HF 0 "s_register_operand")
+   (ABSNEG:HF (match_operand:HF 1 "s_register_operand")))]
+ "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_<absneg_str>hf2 (operands[0], operands[1]));
+  DONE;
+})
+
+;; VRND for FP16.
+(define_insn "neon_v<fp16_rnd_str>hf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF
+     [(match_operand:HF 1 "s_register_operand" "w")]
+     FP16_RND))]
+ "TARGET_VFP_FP16INST"
+ "<fp16_rnd_insn>.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "neon_fp_round_s")]
+)
+
+(define_insn "neon_vrndihf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF
+     [(match_operand:HF 1 "s_register_operand" "w")]
+     UNSPEC_VRNDI))]
+  "TARGET_VFP_FP16INST"
+  "vrintr.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "neon_fp_round_s")]
+)
 
 ;; Arithmetic insns
 
+(define_insn "addhf3_fp16"
+  [(set
+    (match_operand:HF 0 "s_register_operand" "=w")
+    (plus:HF
+     (match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "vadd.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fadds")]
+)
+
+(define_expand "neon_vaddhf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_addhf3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "*addsf3_vfp"
   [(set (match_operand:SF	   0 "s_register_operand" "=t")
 	(plus:SF (match_operand:SF 1 "s_register_operand" "t")
@@ -962,6 +1026,28 @@
    (set_attr "type" "faddd")]
 )
 
+(define_insn "subhf3_fp16"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (minus:HF
+    (match_operand:HF 1 "s_register_operand" "w")
+    (match_operand:HF 2 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "vsub.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fadds")]
+)
+
+(define_expand "neon_vsubhf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_subhf3_fp16 (operands[0], operands[1],
+			      operands[2]));
+  DONE;
+})
 
 (define_insn "*subsf3_vfp"
   [(set (match_operand:SF	    0 "s_register_operand" "=t")
@@ -988,6 +1074,29 @@
 
 ;; Division insns
 
+;; FP16 Division.
+(define_insn "divhf3_fp16"
+  [(set
+    (match_operand:HF	   0 "s_register_operand" "=w")
+    (div:HF
+     (match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vdiv.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fdivs")]
+)
+
+(define_expand "neon_vdivhf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")]
+ "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_divhf3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 ; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input
 ; operands, so mark the output as early clobber for VFPv2 on ARMv5 or
 ; earlier.
@@ -1018,6 +1127,27 @@
 
 ;; Multiplication insns
 
+(define_insn "mulhf3_fp16"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (mult:HF (match_operand:HF 1 "s_register_operand" "w")
+	    (match_operand:HF 2 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vmul.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
+(define_expand "neon_vmulfhf"
+ [(match_operand:HF 0 "s_register_operand")
+  (match_operand:HF 1 "s_register_operand")
+  (match_operand:HF 2 "s_register_operand")]
+ "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_mulhf3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "*mulsf3_vfp"
   [(set (match_operand:SF	   0 "s_register_operand" "=t")
 	(mult:SF (match_operand:SF 1 "s_register_operand" "t")
@@ -1040,6 +1170,26 @@
    (set_attr "type" "fmuld")]
 )
 
+(define_insn "*mulsf3neghf_vfp"
+  [(set (match_operand:HF		   0 "s_register_operand" "=t")
+	(mult:HF (neg:HF (match_operand:HF 1 "s_register_operand" "t"))
+		 (match_operand:HF	   2 "s_register_operand" "t")))]
+  "TARGET_VFP_FP16INST && !flag_rounding_math"
+  "vnmul.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
+(define_insn "*negmulhf3_vfp"
+  [(set (match_operand:HF		   0 "s_register_operand" "=t")
+	(neg:HF (mult:HF (match_operand:HF 1 "s_register_operand" "t")
+		 (match_operand:HF	   2 "s_register_operand" "t"))))]
+  "TARGET_VFP_FP16INST"
+  "vnmul.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
 (define_insn "*mulsf3negsf_vfp"
   [(set (match_operand:SF		   0 "s_register_operand" "=t")
 	(mult:SF (neg:SF (match_operand:SF 1 "s_register_operand" "t"))
@@ -1089,6 +1239,18 @@
 ;; Multiply-accumulate insns
 
 ;; 0 = 1 * 2 + 0
+(define_insn "*mulsf3addhf_vfp"
+ [(set (match_operand:HF 0 "s_register_operand" "=t")
+       (plus:HF
+	(mult:HF (match_operand:HF 2 "s_register_operand" "t")
+		 (match_operand:HF 3 "s_register_operand" "t"))
+	(match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vmla.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3addsf_vfp"
   [(set (match_operand:SF		    0 "s_register_operand" "=t")
 	(plus:SF (mult:SF (match_operand:SF 2 "s_register_operand" "t")
@@ -1114,6 +1276,17 @@
 )
 
 ;; 0 = 1 * 2 - 0
+(define_insn "*mulhf3subhf_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(minus:HF (mult:HF (match_operand:HF 2 "s_register_operand" "t")
+			   (match_operand:HF 3 "s_register_operand" "t"))
+		  (match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vnmls.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3subsf_vfp"
   [(set (match_operand:SF		     0 "s_register_operand" "=t")
 	(minus:SF (mult:SF (match_operand:SF 2 "s_register_operand" "t")
@@ -1139,6 +1312,17 @@
 )
 
 ;; 0 = -(1 * 2) + 0
+(define_insn "*mulhf3neghfaddhf_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(minus:HF (match_operand:HF 1 "s_register_operand" "0")
+		  (mult:HF (match_operand:HF 2 "s_register_operand" "t")
+			   (match_operand:HF 3 "s_register_operand" "t"))))]
+  "TARGET_VFP_FP16INST"
+  "vmls.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3negsfaddsf_vfp"
   [(set (match_operand:SF		     0 "s_register_operand" "=t")
 	(minus:SF (match_operand:SF	     1 "s_register_operand" "0")
@@ -1165,6 +1349,18 @@
 
 
 ;; 0 = -(1 * 2) - 0
+(define_insn "*mulhf3neghfsubhf_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+	(minus:HF (mult:HF
+		   (neg:HF (match_operand:HF 2 "s_register_operand" "t"))
+		   (match_operand:HF 3 "s_register_operand" "t"))
+		  (match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vnmla.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3negsfsubsf_vfp"
   [(set (match_operand:SF		      0 "s_register_operand" "=t")
 	(minus:SF (mult:SF
@@ -1193,6 +1389,30 @@
 
 ;; Fused-multiply-accumulate
 
+(define_insn "fmahf4_fp16"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+    (fma:HF
+     (match_operand:HF 1 "register_operand" "w")
+     (match_operand:HF 2 "register_operand" "w")
+     (match_operand:HF 3 "register_operand" "0")))]
+ "TARGET_VFP_FP16INST"
+ "vfma.f16\\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "ffmas")]
+)
+
+(define_expand "neon_vfmahf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")
+   (match_operand:HF 3 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_fmahf4_fp16 (operands[0], operands[2], operands[3],
+			      operands[1]));
+  DONE;
+})
+
 (define_insn "fma<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
         (fma:SDF (match_operand:SDF 1 "register_operand" "<F_constraint>")
@@ -1205,6 +1425,30 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "fmsubhf4_fp16"
+ [(set (match_operand:HF 0 "register_operand" "=w")
+   (fma:HF
+    (neg:HF (match_operand:HF 1 "register_operand" "w"))
+    (match_operand:HF 2 "register_operand" "w")
+    (match_operand:HF 3 "register_operand" "0")))]
+ "TARGET_VFP_FP16INST"
+ "vfms.f16\\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "ffmas")]
+)
+
+(define_expand "neon_vfmshf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")
+   (match_operand:HF 3 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_fmsubhf4_fp16 (operands[0], operands[2], operands[3],
+				operands[1]));
+  DONE;
+})
+
 (define_insn "*fmsub<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
 	(fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand"
@@ -1218,6 +1462,17 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "*fnmsubhf4"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(fma:HF (match_operand:HF 1 "register_operand" "w")
+		 (match_operand:HF 2 "register_operand" "w")
+		 (neg:HF (match_operand:HF 3 "register_operand" "0"))))]
+  "TARGET_VFP_FP16INST"
+  "vfnms.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffmas")]
+)
+
 (define_insn "*fnmsub<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
 	(fma:SDF (match_operand:SDF 1 "register_operand" "<F_constraint>")
@@ -1230,6 +1485,17 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "*fnmaddhf4"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+	(fma:HF (neg:HF (match_operand:HF 1 "register_operand" "w"))
+		 (match_operand:HF 2 "register_operand" "w")
+		 (neg:HF (match_operand:HF 3 "register_operand" "0"))))]
+  "TARGET_VFP_FP16INST"
+  "vfnma.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffmas")]
+)
+
 (define_insn "*fnmadd<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
 	(fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand"
@@ -1372,6 +1638,27 @@
 
 ;; Sqrt insns.
 
+(define_insn "neon_vsqrthf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+	(sqrt:HF (match_operand:HF 1 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vsqrt.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fsqrts")]
+)
+
+(define_insn "neon_vrsqrtshf"
+  [(set
+    (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF [(match_operand:HF 1 "s_register_operand" "w")
+		(match_operand:HF 2 "s_register_operand" "w")]
+     UNSPEC_VRSQRTS))]
+ "TARGET_VFP_FP16INST"
+ "vrsqrts.f16\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "fsqrts")]
+)
+
 ; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input
 ; operands, so mark the output as early clobber for VFPv2 on ARMv5 or
 ; earlier.
@@ -1528,9 +1815,6 @@
 )
 
 ;; Fixed point to floating point conversions.
-(define_code_iterator FCVT [unsigned_float float])
-(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
-
 (define_insn "*combine_vcvt_f32_<FCVTI32typename>"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
 	(mult:SF (FCVT:SF (match_operand:SI 1 "s_register_operand" "0"))
@@ -1575,6 +1859,125 @@
    (set_attr "type" "f_cvtf2i")]
  )
 
+;; FP16 conversions.
+(define_insn "neon_vcvth<sup>hf"
+ [(set (match_operand:HF 0 "s_register_operand" "=w")
+   (unspec:HF
+    [(match_operand:SI 1 "s_register_operand" "w")]
+    VCVTH_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt.f16.<sup>%#32\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_cvti2f")]
+)
+
+(define_insn "neon_vcvth<sup>si"
+ [(set (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:HF 1 "s_register_operand" "w")]
+    VCVTH_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt.<sup>%#32.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_cvtf2i")]
+)
+
+;; The neon_vcvth<sup>_nhf patterns are used to generate the instruction for the
+;; vcvth_n_f16_<sup>32 arm_fp16 intrinsics.  They are complicated by the
+;; hardware requirement that the source and destination registers are the same
+;; despite having different machine modes.  The approach is to use a temporary
+;; register for the conversion and move that to the correct destination.
+
+;; Generate an unspec pattern for the intrinsic.
+(define_insn "neon_vcvth<sup>_nhf_unspec"
+ [(set
+   (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:SI 1 "s_register_operand" "0")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_HF_US_N))
+ (set
+  (match_operand:HF 3 "s_register_operand" "=w")
+  (float_truncate:HF (float:SF (match_dup 0))))]
+ "TARGET_VFP_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vcvt.f16.<sup>32\t%0, %0, %2\;vmov.f32\t%3, %0";
+}
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvti2f")]
+)
+
+;; Generate the instruction patterns needed for vcvth_n_f16_s32 neon intrinsics.
+(define_expand "neon_vcvth<sup>_nhf"
+ [(match_operand:HF 0 "s_register_operand")
+  (unspec:HF [(match_operand:SI 1 "s_register_operand")
+	      (match_operand:SI 2 "immediate_operand")]
+   VCVT_HF_US_N)]
+"TARGET_VFP_FP16INST"
+{
+  rtx op1 = gen_reg_rtx (SImode);
+
+  neon_const_bounds (operands[2], 1, 33);
+
+  emit_move_insn (op1, operands[1]);
+  emit_insn (gen_neon_vcvth<sup>_nhf_unspec (op1, op1, operands[2],
+					     operands[0]));
+  DONE;
+})
+
+;; The neon_vcvth<sup>_nsi patterns are used to generate the instruction for the
+;; vcvth_n_<sup>32_f16 arm_fp16 intrinsics.  They have the same restrictions and
+;; are implemented in the same way as the neon_vcvth<sup>_nhf patterns.
+
+;; Generate an unspec pattern, constraining the registers.
+(define_insn "neon_vcvth<sup>_nsi_unspec"
+ [(set (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(fix:SI
+      (fix:SF
+       (float_extend:SF
+	(match_operand:HF 1 "s_register_operand" "w"))))
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_SI_US_N))]
+ "TARGET_VFP_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vmov.f32\t%0, %1\;vcvt.<sup>%#32.f16\t%0, %0, %2";
+}
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvtf2i")]
+)
+
+;; Generate the instruction patterns needed for vcvth_n_f16_s32 neon intrinsics.
+(define_expand "neon_vcvth<sup>_nsi"
+ [(match_operand:SI 0 "s_register_operand")
+  (unspec:SI
+   [(match_operand:HF 1 "s_register_operand")
+    (match_operand:SI 2 "immediate_operand")]
+   VCVT_SI_US_N)]
+ "TARGET_VFP_FP16INST"
+{
+  rtx op1 = gen_reg_rtx (SImode);
+
+  neon_const_bounds (operands[2], 1, 33);
+  emit_insn (gen_neon_vcvth<sup>_nsi_unspec (op1, operands[1], operands[2]));
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
+
+(define_insn "neon_vcvt<vcvth_op>h<sup>si"
+ [(set
+   (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:HF 1 "s_register_operand" "w")]
+    VCVT_HF_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt<vcvth_op>.<sup>%#32.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvtf2i")]
+)
+
 ;; Store multiple insn used in function prologue.
 (define_insn "*push_multi_vfp"
   [(match_parallel 2 "multi_register_push"
@@ -1644,6 +2047,20 @@
 )
 
 ;; Scalar forms for the IEEE-754 fmax()/fmin() functions
+
+(define_insn "neon_<fmaxmin_op>hf"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (unspec:HF
+    [(match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")]
+    VMAXMINFNM))]
+ "TARGET_VFP_FP16INST"
+ "<fmaxmin_op>.f16\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_minmaxs")]
+)
+
 (define_insn "<fmaxmin><mode>3"
   [(set (match_operand:SDF 0 "s_register_operand" "=<F_constraint>")
 	(unspec:SDF [(match_operand:SDF 1 "s_register_operand" "<F_constraint>")
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
new file mode 100644
index 0000000..8399288
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
@@ -0,0 +1,68 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2 -ffast-math" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+/* Test instructions generated for half-precision arithmetic.  */
+
+typedef __fp16 float16_t;
+typedef __simd64_float16_t float16x4_t;
+typedef __simd128_float16_t float16x8_t;
+
+float16_t
+fp16_abs (float16_t a)
+{
+  return (a < 0) ? -a : a;
+}
+
+#define TEST_UNOP(NAME, OPERATOR, TY)		\
+  TY test_##NAME##_##TY (TY a)			\
+  {						\
+    return OPERATOR (a);			\
+  }
+
+#define TEST_BINOP(NAME, OPERATOR, TY)		\
+  TY test_##NAME##_##TY (TY a, TY b)		\
+  {						\
+    return a OPERATOR b;			\
+  }
+
+#define TEST_CMP(NAME, OPERATOR, RTY, TY)	\
+  RTY test_##NAME##_##TY (TY a, TY b)		\
+  {						\
+    return a OPERATOR b;			\
+  }
+
+/* Scalars.  */
+
+TEST_UNOP (neg, -, float16_t)
+TEST_UNOP (abs, fp16_abs, float16_t)
+
+TEST_BINOP (add, +, float16_t)
+TEST_BINOP (sub, -, float16_t)
+TEST_BINOP (mult, *, float16_t)
+TEST_BINOP (div, /, float16_t)
+
+TEST_CMP (equal, ==, int, float16_t)
+TEST_CMP (unequal, !=, int, float16_t)
+TEST_CMP (lessthan, <, int, float16_t)
+TEST_CMP (greaterthan, >, int, float16_t)
+TEST_CMP (lessthanequal, <=, int, float16_t)
+TEST_CMP (greaterthanqual, >=, int, float16_t)
+
+/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
+
+/* { dg-final { scan-assembler-times {vadd\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vsub\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vmul\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vdiv\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
+
+/* { dg-final { scan-assembler-not {vadd\.f16} } }  */
+/* { dg-final { scan-assembler-not {vsub\.f16} } }  */
+/* { dg-final { scan-assembler-not {vmul\.f16} } }  */
+/* { dg-final { scan-assembler-not {vdiv\.f16} } }  */
+/* { dg-final { scan-assembler-not {vcmp\.f16} } }  */
+/* { dg-final { scan-assembler-not {vcmpe\.f16} } }  */
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
new file mode 100644
index 0000000..c9639a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
@@ -0,0 +1,101 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+/* Test ARMv8.2 FP16 conversions.  */
+#include <arm_fp16.h>
+
+float
+f16_to_f32 (__fp16 a)
+{
+  return (float)a;
+}
+
+float
+f16_to_pf32 (__fp16* a)
+{
+  return (float)*a;
+}
+
+short
+f16_to_s16 (__fp16 a)
+{
+  return (short)a;
+}
+
+short
+pf16_to_s16 (__fp16* a)
+{
+  return (short)*a;
+}
+
+/* { dg-final { scan-assembler-times {vcvtb\.f32\.f16\ts[0-9]+, s[0-9]+} 4 } }  */
+
+__fp16
+f32_to_f16 (float a)
+{
+  return (__fp16)a;
+}
+
+void
+f32_to_pf16 (__fp16* x, float a)
+{
+  *x = (__fp16)a;
+}
+
+__fp16
+s16_to_f16 (short a)
+{
+  return (__fp16)a;
+}
+
+void
+s16_to_pf16 (__fp16* x, short a)
+{
+  *x = (__fp16)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvtb\.f16\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
+
+float
+s16_to_f32 (short a)
+{
+  return (float)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+, s[0-9]+} 3 } }  */
+
+short
+f32_to_s16 (float a)
+{
+  return (short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+, s[0-9]+} 3 } }  */
+
+unsigned short
+f32_to_u16 (float a)
+{
+  return (unsigned short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.u32\.f32\ts[0-9]+, s[0-9]+} 1 } }  */
+
+short
+f64_to_s16 (double a)
+{
+  return (short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } }  */
+
+unsigned short
+f64_to_u16 (double a)
+{
+  return (unsigned short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } }  */
+
+
-- 
2.1.4

[PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

Reply via email to