Re: [ARM] Use vector wide add for mixed-mode adds

Kyrill Tkachov Thu, 10 Dec 2015 07:10:27 -0800

Hi Michael,

A few comments while I look deeper into this patch...


On 30/11/15 01:18, Michael Collison wrote:


This is a modified version of my previous patch that supports vector wide add. 
I added support for vaddw on big endian when generating the parallel operand 
for the vector select.

There are four failing test cases on arm big endian with similar code. They are:

gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test


The failures occur without my patch and are related to a bug with vector loads 
using VUZP operations.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532

Validated on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf, 
and armeb-none-linux-gnueabihf.

2015-11-29  Michael Collison  <michael.colli...@linaro.org>

    * config/arm/neon.md (widen_<us>sum<mode>): New patterns where
    mode is VQI to improve mixed mode vectorization.
    * config/arm/neon.md (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): New
    define_insn to match low half of signed vaddw.
    * config/arm/neon.md (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): New
    define_insn to match high half of signed vaddw.
    * config/arm/neon.md (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): New
    define_insn to match low half of unsigned vaddw.
    * config/arm/neon.md (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): New
    define_insn to match high half of unsigned vaddw.
    * config/arm/arm.c (aarch32_simd_vect_par_cnst_half): New function.
    (aarch32_simd_check_vect_par_cnst_half): Likewise.
    * config/arm/arm-protos.h (aarch32_simd_vect_par_cnst_half): Prototype
    for new function.
    (aarch32_simd_check_vect_par_cnst_half): Likewise.
    * config/arm/predicates.md (vect_par_constant_high): Support
    big endian and simplify by calling
    aarch32_simd_check_vect_par_cnst_half
    (vect_par_constant_low): Likewise.
    * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
    * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
    * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
    * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
    * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
    * testsuite/lib/target-supports.exp
    (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
    that arm neon support vector widen sum of HImode TO SImode.

Okay for trunk?


--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code, bool initialize_p
                              ATTRIBUTE_UNUSED);
 extern void arm_init_builtins (void);
 extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree 
*update);
-
+extern rtx aarch32_simd_vect_par_cnst_half (machine_mode mode, bool high);
+extern bool aarch32_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
+                                                  bool high);


Please use arm instead of aarch32 in the name to be consistent with the rest of 
the
backend. Also, for functions that return a bool without side-effects it's 
preferable
to finish their name with '_p'. So for the second one I'd drop the 'check' and 
call
it something like "arm_vector_of_lane_nums_p ", is that a more descriptive name?

+/* Check OP for validity as a PARALLEL RTX vector with elements
+   numbering the lanes of either the high (HIGH == TRUE) or low lanes,
+   from the perspective of the architecture.  See the diagram above
+   aarch64_simd_vect_par_cnst_half for more details.  */
+

aarch64?

--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,51 @@

;; Widening operations+(define_expand "widen_ssum<mode>3"

+  [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+       (plus:<V_double_width> (sign_extend:<V_double_width> (match_operand:VQI 1 
"s_register_operand" ""))
+                              (match_operand:<V_double_width> 2 "s_register_operand" 
"")))]
+  "TARGET_NEON"
+  {
+    machine_mode mode = GET_MODE (operands[1]);
+    rtx p1, p2;
+
+    p1  = aarch32_simd_vect_par_cnst_half (mode, false);
+    p2  = aarch32_simd_vect_par_cnst_half (mode, true);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0], 
operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0], 
operands[1], p2, operands[0]));
+    DONE;
+  }

Please format these properly to avoid long lines.
Thanks,
Kyrill

Re: [ARM] Use vector wide add for mixed-mode adds

Reply via email to