Re: [ARM] Use vector wide add for mixed-mode adds

Michael Collison Tue, 20 Oct 2015 00:54:54 -0700

Hi Kyrill,

Since your email I have done the following:

1. Added the ENDIAN_LANE_N to the define_expand patterns for big endiantargets. The big endian patches produced no change in the test results.I still have several execution failures with targeting big endian withlto enabled.

2. I diff'd the rtl dumps from a big endian compiler with lto enabledand disabled. I also examined the assembly language and there nodifferences except for the .ascii directives.

I want to ask a question about existing patterns in neon.md that utilizethe vec_select and all the lanes as my example does: Why are thefollowing pattern not matched if the target is big endian?



(define_insn "neon_vec_unpack<US>_lo_<mode>"
  [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
        (SE:<V_unpack> (vec_select:<V_HALF>
              (match_operand:VU 1 "register_operand" "w")
              (match_operand:VU 2 "vect_par_constant_low" ""))))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovl.<US><V_sz_elem> %q0, %e1"
  [(set_attr "type" "neon_shift_imm_long")]
)

(define_insn "neon_vec_unpack<US>_hi_<mode>"
  [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
        (SE:<V_unpack> (vec_select:<V_HALF>
              (match_operand:VU 1 "register_operand" "w")
              (match_operand:VU 2 "vect_par_constant_high" ""))))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovl.<US><V_sz_elem> %q0, %f1"
  [(set_attr "type" "neon_shift_imm_long")]

These patterns are similar to the new patterns I am adding and I amwondering if my patterns should exclude BYTES_BIG_ENDIAN?


On 10/08/2015 04:02 AM, Kyrill Tkachov wrote:

Hi Michael,

On 01/10/15 11:05, Michael Collison wrote:

Kyrill,

I have modified the patch to address your comments. I also modified
check_effective_target_vect_widen_sum_hi_to_si_pattern in
target-supports.exp to
indicate that arm neon supports vector widen sum of HImode to SImode.
This resolved
several test suite failures.

Successfully tested on arm-none-eabi, arm-none-linux-gnueabihf. I have
four related execution failure
tests on armeb-non-linux-gnueabihf with -flto only.

gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test


We'd want to get to the bottom of these before committing.
Does codegen before and after the patch show anything?
When it comes to big-endian and NEON, the fiddly parts are
usually lane numbers. Do you need to select the proper lanes with
ENDIAN_LANE_N like Charles in his patch at:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00656.html?

Thanks,
Kyrill


I am debugging but have not tracked down the root cause yet. Feedback?

2015-07-22  Michael Collison <michael.colli...@linaro.org>

      * config/arm/neon.md (widen_<us>sum<mode>): New patterns
      where mode is VQI to improve mixed mode vectorization.

* config/arm/neon.md(vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): New

      define_insn to match low half of signed vaddw.

* config/arm/neon.md(vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): New

      define_insn to match high half of signed vaddw.

* config/arm/neon.md(vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): New

      define_insn to match low half of unsigned vaddw.

* config/arm/neon.md(vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): New

      define_insn to match high half of unsigned vaddw.
      * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
      * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
      * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
      * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
      * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
      * testsuite/lib/target-supports.exp
      (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
      that arm neon support vector widen sum of HImode TO SImode.


Note that the testsuite changes should have their own ChangeLog entry
with the paths there starting relative to gcc/testsuite/


On 09/23/2015 01:49 AM, Kyrill Tkachov wrote:

Hi Michael,

On 23/09/15 00:52, Michael Collison wrote:

This is a modified version of the previous patch that removes the
documentation and read-md.c fixes. These patches have been submitted
separately and approved.

This patch is designed to address code that was not beingvectorized due

to missing widening patterns in the ARM backend. Code such as:

int t6(int len, void * dummy, short * __restrict x)
{
     len = len & ~31;
     int result = 0;
     __asm volatile ("");
     for (int i = 0; i < len; i++)
       result += x[i];
     return result;
}

Validated on arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.

2015-09-22  Michael Collison <michael.colli...@linaro.org>

       * config/arm/neon.md (widen_<us>sum<mode>): New patterns
       where mode is VQI to improve mixed mode add vectorization.

Please list all the new define_expands and define_insns
in the changelog. Also, please add an ChangeLog entry for
the testsuite additions.

The approach looks ok to me with a few comments on some
parts of the patch itself.


+(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+    (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW
(match_operand:VQI 1 "s_register_operand" "%w")
+                           (match_operand:VQI 2
"vect_par_constant_high" "")))
+                (match_operand:<VW:V_widen> 3 "s_register_operand"
"0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)


This is a single instruction, and it has a length of 4, so no need to
override the length attribute.
Same with the other define_insns in this patch.


diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 0000000..ed10669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */

The arm_neon_hw check is usually used when you want to run the tests.
Since this is a compile-only tests you just need arm_neon_ok.

  +/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
+
+
+

Stray trailing newlines. Similar comments for the other testcases.

Thanks,
Kyrill


--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

Re: [ARM] Use vector wide add for mixed-mode adds

Reply via email to