Hi Michael,
A few comments while I look deeper into this patch...
On 30/11/15 01:18, Michael Collison wrote:
This is a modified version of my previous patch that supports vector wide add.
I added support for vaddw on big endian when generating the parallel operand
for the vector select.
There are four failing test cases on arm big endian with similar code. They are:
gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test
The failures occur without my patch and are related to a bug with vector loads
using VUZP operations.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532
Validated on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf,
and armeb-none-linux-gnueabihf.
2015-11-29 Michael Collison <michael.colli...@linaro.org>
* config/arm/neon.md (widen_<us>sum<mode>): New patterns where
mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): New
define_insn to match high half of unsigned vaddw.
* config/arm/arm.c (aarch32_simd_vect_par_cnst_half): New function.
(aarch32_simd_check_vect_par_cnst_half): Likewise.
* config/arm/arm-protos.h (aarch32_simd_vect_par_cnst_half): Prototype
for new function.
(aarch32_simd_check_vect_par_cnst_half): Likewise.
* config/arm/predicates.md (vect_par_constant_high): Support
big endian and simplify by calling
aarch32_simd_check_vect_par_cnst_half
(vect_par_constant_low): Likewise.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon support vector widen sum of HImode TO SImode.
Okay for trunk?
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code, bool initialize_p
ATTRIBUTE_UNUSED);
extern void arm_init_builtins (void);
extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree
*update);
-
+extern rtx aarch32_simd_vect_par_cnst_half (machine_mode mode, bool high);
+extern bool aarch32_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
+ bool high);
Please use arm instead of aarch32 in the name to be consistent with the rest of
the
backend. Also, for functions that return a bool without side-effects it's
preferable
to finish their name with '_p'. So for the second one I'd drop the 'check' and
call
it something like "arm_vector_of_lane_nums_p ", is that a more descriptive name?
+/* Check OP for validity as a PARALLEL RTX vector with elements
+ numbering the lanes of either the high (HIGH == TRUE) or low lanes,
+ from the perspective of the architecture. See the diagram above
+ aarch64_simd_vect_par_cnst_half for more details. */
+
aarch64?
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,51 @@
;; Widening operations
+(define_expand "widen_ssum<mode>3"
+ [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+ (plus:<V_double_width> (sign_extend:<V_double_width> (match_operand:VQI 1
"s_register_operand" ""))
+ (match_operand:<V_double_width> 2 "s_register_operand"
"")))]
+ "TARGET_NEON"
+ {
+ machine_mode mode = GET_MODE (operands[1]);
+ rtx p1, p2;
+
+ p1 = aarch32_simd_vect_par_cnst_half (mode, false);
+ p2 = aarch32_simd_vect_par_cnst_half (mode, true);
+
+ if (operands[0] != operands[2])
+ emit_move_insn (operands[0], operands[2]);
+
+ emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0],
operands[1], p1, operands[0]));
+ emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0],
operands[1], p2, operands[0]));
+ DONE;
+ }
Please format these properly to avoid long lines.
Thanks,
Kyrill