Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

Dennis Zhang via Gcc-patches Tue, 06 Oct 2020 09:47:25 -0700

Hi all,

On 8/17/20 6:41 PM, Dennis Zhang wrote:
> 
> Hi all,
> 
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
> 
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
> 
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
> 
> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  <dennis.zh...@arm.com>
> 
>       * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
>       * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
>       (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
>       (TARGET_NEON_MVE_HFP): Likewise.
>       * config/arm/iterators.md (VSEL): New mode iterator to select modes
>       for corresponding targets.
>       * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
>       using expression 'minus'.
>       (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
>       * config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
>       sub<mode>3 in vec-common.md
>       * config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
>       to select available modes. Exclude TARGET_NEON_FP16INST from
>       TARGET_NEON statement. Intergrate TARGET_NEON_FP16INST which is
>       originally in neon.md.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  <dennis.zh...@arm.com>
> 
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>       option -fno-ipa-icf and change the instruction count from 8 to 16.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
>       * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
>       * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
>       * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
>


This patch is updated based on Richard Sandiford's patch adding new 
vector mode macros: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
The old version of this patch is at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
And a less related part in the old version is separated into another 
patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

This patch enables MVE vsub instructions for auto-vectorization.
It adds insns for MVE vsub instructions using 'minus' instead of unspec 
expression to make the instructions recognizable for auto-vectorization.
The sub<mode>3 in mve.md is modified to use new mode macros which make 
the expander available when certain modes are supported. Then various 
targets can share this expander for vectorization. The redundant 
sub<mode>3 insns in neon.md are then removed.

Regression tested on arm-none-eabi and bootstraped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  <dennis.zh...@arm.com>

        * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
        using expression 'minus'.
        (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
        * config/arm/neon.md (*sub<mode>3_neon): Use the new mode macros
        ARM_HAVE_<MODE>_ARITH.
        (sub<mode>3, sub<mode>3_fp16): Removed.
        (neon_vsub<mode>): Use gen_sub<mode>3 instead of gen_sub<mode>3_fp16.
        * config/arm/vec-common.md (sub<mode>3): Use the new mode macros
        ARM_HAVE_<MODE>_ARITH.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  <dennis.zh...@arm.com>

        * gcc.target/arm/simd/mve-vsub_1.c: New test.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..7853b642262 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2574,6 +2574,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vsubq<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(minus:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		     (match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vsub.i%#<V_sz_elem>\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vabdq_f])
 ;;
@@ -3480,9 +3491,8 @@
 (define_insn "mve_vsubq_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VSUBQ_F))
+	(minus:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		     (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vsub.f%#<V_sz_elem>\t%q0, %q1, %q2"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 96bf277f501..9799c130875 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -513,7 +513,7 @@
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
                    (match_operand:VDQ 2 "s_register_operand" "w")))]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
@@ -521,28 +521,6 @@
                     (const_string "neon_sub<q>")))]
 )
 
-(define_insn "sub<mode>3"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (minus:VH
-    (match_operand:VH 1 "s_register_operand" "w")
-    (match_operand:VH 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_sub<q>")]
-)
-
-(define_insn "sub<mode>3_fp16"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (minus:VH
-    (match_operand:VH 1 "s_register_operand" "w")
-    (match_operand:VH 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST"
- "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_sub<q>")]
-)
-
 (define_insn "*mul<mode>3_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
         (mult:VDQW (match_operand:VDQW 1 "s_register_operand" "w")
@@ -1804,7 +1782,7 @@
    (match_operand:VH 2 "s_register_operand")]
   "TARGET_NEON_FP16INST"
 {
-  emit_insn (gen_sub<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_sub<mode>3 (operands[0], operands[1], operands[2]));
   DONE;
 })
 
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..5f5668bcf9b 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -87,18 +87,12 @@
   "ARM_HAVE_<MODE>_ARITH"
 )
 
-;; Vector arithmetic. Expanders are blank, then unnamed insns implement
-;; patterns separately for IWMMXT and Neon.
-
 (define_expand "sub<mode>3"
-  [(set (match_operand:VALL 0 "s_register_operand")
-        (minus:VALL (match_operand:VALL 1 "s_register_operand")
-                    (match_operand:VALL 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
+  [(set (match_operand:VDQ 0 "s_register_operand")
+        (minus:VDQ (match_operand:VDQ 1 "s_register_operand")
+                   (match_operand:VDQ 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 (define_expand "mul<mode>3"
   [(set (match_operand:VALLW 0 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
new file mode 100644
index 00000000000..cb3ef3a14e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg -additional-options "-O3 -funsafe-math-optimizations" } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vsub_i16 (int16_t * dest, int16_t * a, int16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vsub_i8 (int8_t * dest, int8_t * a, int8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vsub_f32 (float * dest, float * a, float * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+

Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

Reply via email to