Hi Christophe,

On 26/11/2020 15:31, Christophe Lyon wrote:
Hi Andre,

Thanks for the quick feedback.

On Wed, 25 Nov 2020 at 18:17, Andre Simoes Dias Vieira
<andre.simoesdiasvie...@arm.com> wrote:
Hi Christophe,

Thanks for these! See some inline comments.

On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote:
This patch enables MVE vandq instructions for auto-vectorization.  MVE
vandq insns in mve.md are modified to use 'and' instead of unspec
expression to support and<mode>3.  The and<mode>3 expander is added to
vec-common.md

2020-11-12  Christophe Lyon  <christophe.l...@linaro.org>

       gcc/
       * gcc/config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
       (VANQ): Remove.
       * config/arm/mve.md (mve_vandq_u<mode>): New entry for vand
       instruction using expression and.
       (mve_vandq_s<mode>): New expander.
       * config/arm/neon.md (and<mode>3): Renamed into and<mode>3_neon.
       * config/arm/unspecs.md (VANDQ_S, VANDQ_U): Remove.
       * config/arm/vec-common.md (and<mode>3): New expander.

       gcc/testsuite/
       * gcc.target/arm/simd/mve-vand.c: New test.
---
   gcc/config/arm/iterators.md                  |  4 +---
   gcc/config/arm/mve.md                        | 20 ++++++++++++----
   gcc/config/arm/neon.md                       |  2 +-
   gcc/config/arm/unspecs.md                    |  2 --
   gcc/config/arm/vec-common.md                 | 15 ++++++++++++
   gcc/testsuite/gcc.target/arm/simd/mve-vand.c | 34 
++++++++++++++++++++++++++++
   6 files changed, 66 insertions(+), 11 deletions(-)
   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vand.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 592af35..72039e4 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1232,8 +1232,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") 
(VREV16Q_S "s")
                      (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")
                      (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s")
                      (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u")
-                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VANDQ_S "s")
-                    (VANDQ_U "u") (VBICQ_S "s") (VBICQ_U "u")
+                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBICQ_S "s") (VBICQ_U 
"u")
                      (VBRSRQ_N_S "s") (VBRSRQ_N_U "u") (VCADDQ_ROT270_S "s")
                      (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
                      (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
@@ -1501,7 +1500,6 @@ (define_int_iterator VABDQ [VABDQ_S VABDQ_U])
   (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U])
   (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U])
   (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S])
-(define_int_iterator VANDQ [VANDQ_U VANDQ_S])
   (define_int_iterator VBICQ [VBICQ_S VBICQ_U])
   (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S])
   (define_int_iterator VCADDQ_ROT270 [VCADDQ_ROT270_S VCADDQ_ROT270_U])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index ecbaaa9..975eb7d 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -894,17 +894,27 @@ (define_insn "mve_vaddvq_p_<supf><mode>"
   ;;
   ;; [vandq_u, vandq_s])
   ;;
-(define_insn "mve_vandq_<supf><mode>"
+;; signed and unsigned versions are the same: define the unsigned
+;; insn, and use an expander for the signed one as we still reference
+;; both names from arm_mve.h.
+(define_insn "mve_vandq_u<mode>"
     [
      (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-     (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-                    (match_operand:MVE_2 2 "s_register_operand" "w")]
-      VANDQ))
+     (and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+                    (match_operand:MVE_2 2 "s_register_operand" "w")))
The predicate on the second operand is more restrictive than the one in
expand 'neon_inv_logic_op2'. This should still work with immediates, or
well I checked for integers, it generates a loop as such:

Right, thanks for catching it.

          vldrw.32        q3, [r0]
          vldr.64 d4, .L8
          vldr.64 d5, .L8+8
          vand    q3, q3, q2
          vstrw.32        q3, [r2]

MVE does support vand's with immediates, just like NEON, I suspect you
could just copy the way Neon handles these, possibly worth the little
extra effort. You can use dest[i] = a[i] & ~1 as a testcase.
If you don't it might still be worth expanding the test to make sure
other immediates-types combinations don't trigger an ICE?

I'm not sure I understand why it loads it in two 64-bit chunks and not
do a single load or not just do something like a vmov or vbic immediate.
Anyhow that's a worry for another day I guess..
Do you mean something like the attached (on top of this patch)?
I dislike the code duplication in mve_vandq_u<mode> which would
become a copy of and<mode>3_neon.
Hi Christophe,

Yeah that's what I meant. I agree with the code duplication. The reason we still use separate ones is because of the difference in supported modes. Maybe the right way around that would be to redefine VDQ as:

(define_mode_iterator VDQ [(V8QI "TARGET_HAVE_NEON") V16QI
                                                     (V4HI "TARGET_HAVE_NEON") V8HI                                                      (V2SI "TARGET_HAVE_NEON") V4SI                                                      (V4HF "TARGET_HAVE_NEON") V8HF                                                      (V2SF "TARGET_HAVE_NEON") V4SF V2DI])

And have a single define_expand and insn for both vector extensions. Though we would also need to do something about the type attribut in case we want to have different scheduling types for both. Though right now we don't do any scheduling for MVE, I don't know whether these can be conditionalized on target features though, something to look at.


The other concern is that it's not exercised by testcase: as you noted
the compiler uses a pair of loads to prepare the second operand.

But indeed that's probably a separate problem.

I guess your comments apply to patch 2 (vorr)?

Yeah, forgot to reply to that one, but yes! I still need to have a look at 4-7.

Kind regards,
Andre

Reply via email to