On Fri, 27 Nov 2020 at 15:13, Andre Vieira (lists)
<andre.simoesdiasvie...@arm.com> wrote:
>
> Hi Christophe,
>
> On 26/11/2020 15:31, Christophe Lyon wrote:
> > Hi Andre,
> >
> > Thanks for the quick feedback.
> >
> > On Wed, 25 Nov 2020 at 18:17, Andre Simoes Dias Vieira
> > <andre.simoesdiasvie...@arm.com> wrote:
> >> Hi Christophe,
> >>
> >> Thanks for these! See some inline comments.
> >>
> >> On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote:
> >>> This patch enables MVE vandq instructions for auto-vectorization.  MVE
> >>> vandq insns in mve.md are modified to use 'and' instead of unspec
> >>> expression to support and<mode>3.  The and<mode>3 expander is added to
> >>> vec-common.md
> >>>
> >>> 2020-11-12  Christophe Lyon  <christophe.l...@linaro.org>
> >>>
> >>>        gcc/
> >>>        * gcc/config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
> >>>        (VANQ): Remove.
> >>>        * config/arm/mve.md (mve_vandq_u<mode>): New entry for vand
> >>>        instruction using expression and.
> >>>        (mve_vandq_s<mode>): New expander.
> >>>        * config/arm/neon.md (and<mode>3): Renamed into and<mode>3_neon.
> >>>        * config/arm/unspecs.md (VANDQ_S, VANDQ_U): Remove.
> >>>        * config/arm/vec-common.md (and<mode>3): New expander.
> >>>
> >>>        gcc/testsuite/
> >>>        * gcc.target/arm/simd/mve-vand.c: New test.
> >>> ---
> >>>    gcc/config/arm/iterators.md                  |  4 +---
> >>>    gcc/config/arm/mve.md                        | 20 ++++++++++++----
> >>>    gcc/config/arm/neon.md                       |  2 +-
> >>>    gcc/config/arm/unspecs.md                    |  2 --
> >>>    gcc/config/arm/vec-common.md                 | 15 ++++++++++++
> >>>    gcc/testsuite/gcc.target/arm/simd/mve-vand.c | 34 
> >>> ++++++++++++++++++++++++++++
> >>>    6 files changed, 66 insertions(+), 11 deletions(-)
> >>>    create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vand.c
> >>>
> >>> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> >>> index 592af35..72039e4 100644
> >>> --- a/gcc/config/arm/iterators.md
> >>> +++ b/gcc/config/arm/iterators.md
> >>> @@ -1232,8 +1232,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") 
> >>> (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
> >>>                       (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")
> >>>                       (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s")
> >>>                       (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u")
> >>> -                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VANDQ_S "s")
> >>> -                    (VANDQ_U "u") (VBICQ_S "s") (VBICQ_U "u")
> >>> +                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBICQ_S "s") 
> >>> (VBICQ_U "u")
> >>>                       (VBRSRQ_N_S "s") (VBRSRQ_N_U "u") (VCADDQ_ROT270_S 
> >>> "s")
> >>>                       (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
> >>>                       (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
> >>> @@ -1501,7 +1500,6 @@ (define_int_iterator VABDQ [VABDQ_S VABDQ_U])
> >>>    (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U])
> >>>    (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U])
> >>>    (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S])
> >>> -(define_int_iterator VANDQ [VANDQ_U VANDQ_S])
> >>>    (define_int_iterator VBICQ [VBICQ_S VBICQ_U])
> >>>    (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S])
> >>>    (define_int_iterator VCADDQ_ROT270 [VCADDQ_ROT270_S VCADDQ_ROT270_U])
> >>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> >>> index ecbaaa9..975eb7d 100644
> >>> --- a/gcc/config/arm/mve.md
> >>> +++ b/gcc/config/arm/mve.md
> >>> @@ -894,17 +894,27 @@ (define_insn "mve_vaddvq_p_<supf><mode>"
> >>>    ;;
> >>>    ;; [vandq_u, vandq_s])
> >>>    ;;
> >>> -(define_insn "mve_vandq_<supf><mode>"
> >>> +;; signed and unsigned versions are the same: define the unsigned
> >>> +;; insn, and use an expander for the signed one as we still reference
> >>> +;; both names from arm_mve.h.
> >>> +(define_insn "mve_vandq_u<mode>"
> >>>      [
> >>>       (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> >>> -     (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> >>> -                    (match_operand:MVE_2 2 "s_register_operand" "w")]
> >>> -      VANDQ))
> >>> +     (and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> >>> +                    (match_operand:MVE_2 2 "s_register_operand" "w")))
> >> The predicate on the second operand is more restrictive than the one in
> >> expand 'neon_inv_logic_op2'. This should still work with immediates, or
> >> well I checked for integers, it generates a loop as such:
> >>
> > Right, thanks for catching it.
> >
> >>           vldrw.32        q3, [r0]
> >>           vldr.64 d4, .L8
> >>           vldr.64 d5, .L8+8
> >>           vand    q3, q3, q2
> >>           vstrw.32        q3, [r2]
> >>
> >> MVE does support vand's with immediates, just like NEON, I suspect you
> >> could just copy the way Neon handles these, possibly worth the little
> >> extra effort. You can use dest[i] = a[i] & ~1 as a testcase.
> >> If you don't it might still be worth expanding the test to make sure
> >> other immediates-types combinations don't trigger an ICE?
> >>
> >> I'm not sure I understand why it loads it in two 64-bit chunks and not
> >> do a single load or not just do something like a vmov or vbic immediate.
> >> Anyhow that's a worry for another day I guess..
> > Do you mean something like the attached (on top of this patch)?
> > I dislike the code duplication in mve_vandq_u<mode> which would
> > become a copy of and<mode>3_neon.
> Hi Christophe,
>
> Yeah that's what I meant. I agree with the code duplication. The reason
> we still use separate ones is because of the difference in supported
> modes. Maybe the right way around that would be to redefine VDQ as:
>
> (define_mode_iterator VDQ [(V8QI "TARGET_HAVE_NEON") V16QI
>                                                       (V4HI
> "TARGET_HAVE_NEON") V8HI
>                                                       (V2SI
> "TARGET_HAVE_NEON") V4SI
>                                                       (V4HF
> "TARGET_HAVE_NEON") V8HF
>                                                       (V2SF
> "TARGET_HAVE_NEON") V4SF V2DI])
>
> And have a single define_expand and insn for both vector extensions.

Indeed, I can try that.
I have also noticed the VNIM1 / VNINOTM1 pair.

> Though we would also need to do something about the type attribut in
> case we want to have different scheduling types for both. Though right
> now we don't do any scheduling for MVE, I don't know whether these can
> be conditionalized on target features though, something to look at.
>
> >
> > The other concern is that it's not exercised by testcase: as you noted
> > the compiler uses a pair of loads to prepare the second operand.
> >
> > But indeed that's probably a separate problem.
> >
> > I guess your comments apply to patch 2 (vorr)?
>
> Yeah, forgot to reply to that one, but yes! I still need to have a look
> at 4-7.

Ok thanks, I have a WIP fix for #7 (vmvn) anyway.

> Kind regards,
> Andre
>

Reply via email to