On Fri, 11 Jun 2021 at 11:38, Richard Sandiford <richard.sandif...@arm.com> wrote: > > Christophe Lyon <christophe.l...@linaro.org> writes: > > On Tue, 8 Jun 2021 at 14:10, Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > >> Christophe Lyon <christophe.l...@linaro.org> writes: > >> > This patch adds vec_unpack<US>_hi_<mode>, vec_unpack<US>_lo_<mode>, > >> > vec_pack_trunc_<mode> patterns for MVE. > >> > > >> > It does so by moving the unpack patterns from neon.md to > >> > vec-common.md, while adding them support for MVE. The pack expander is > >> > derived from the Neon one (which in turn is renamed into > >> > neon_quad_vec_pack_trunc_<mode>). > >> > > >> > The patch introduces mve_vec_pack_trunc_<mode> to avoid the need for a > >> > zero-initialized temporary, which is needed if the > >> > vec_pack_trunc_<mode> expander calls @mve_vmovn[bt]q_<supf><mode> > >> > instead. > >> > > >> > With this patch, we can now vectorize the 16 and 8-bit versions of > >> > vclz and vshl, although the generated code could still be improved. > >> > For test_clz_s16, we now generate > >> > vldrh.16 q3, [r1] > >> > vmovlb.s16 q2, q3 > >> > vmovlt.s16 q3, q3 > >> > vclz.i32 q2, q2 > >> > vclz.i32 q3, q3 > >> > vmovnb.i32 q1, q2 > >> > vmovnt.i32 q1, q3 > >> > vstrh.16 q1, [r0] > >> > which could be improved to > >> > vldrh.16 q3, [r1] > >> > vclz.i16 q1, q3 > >> > vstrh.16 q1, [r0] > >> > if we could avoid the need for unpack/pack steps. > >> > >> Yeah, there was a PR about fixing this for popcount. I guess the same > >> approach would apply here too. > >> > >> > For reference, clang-12 generates: > >> > vldrh.s32 q0, [r1] > >> > vldrh.s32 q1, [r1, #8] > >> > vclz.i32 q0, q0 > >> > vstrh.32 q0, [r0] > >> > vclz.i32 q0, q1 > >> > vstrh.32 q0, [r0, #8] > >> > > >> > 2021-06-03 Christophe Lyon <christophe.l...@linaro.org> > >> > > >> > gcc/ > >> > * config/arm/mve.md (mve_vmovltq_<supf><mode>): Prefix with '@'. > >> > (mve_vmovlbq_<supf><mode>): Likewise. > >> > (mve_vmovnbq_<supf><mode>): Likewise. > >> > (mve_vmovntq_<supf><mode>): Likewise. > >> > (@mve_vec_pack_trunc_<mode>): New pattern. > >> > * config/arm/neon.md (vec_unpack<US>_hi_<mode>): Move to > >> > vec-common.md. > >> > (vec_unpack<US>_lo_<mode>): Likewise. > >> > (vec_pack_trunc_<mode>): Rename to > >> > neon_quad_vec_pack_trunc_<mode>. > >> > * config/arm/vec-common.md (vec_unpack<US>_hi_<mode>): New > >> > pattern. > >> > (vec_unpack<US>_lo_<mode>): New. > >> > (vec_pack_trunc_<mode>): New. > >> > > >> > gcc/testsuite/ > >> > * gcc.target/arm/simd/mve-vclz.c: Update expected results. > >> > * gcc.target/arm/simd/mve-vshl.c: Likewise. > >> > --- > >> > gcc/config/arm/mve.md | 20 ++++- > >> > gcc/config/arm/neon.md | 39 +-------- > >> > gcc/config/arm/vec-common.md | 89 ++++++++++++++++++++ > >> > gcc/testsuite/gcc.target/arm/simd/mve-vclz.c | 7 +- > >> > gcc/testsuite/gcc.target/arm/simd/mve-vshl.c | 5 +- > >> > 5 files changed, 114 insertions(+), 46 deletions(-) > >> > > >> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > >> > index 99e46d0bc69..b18292c07d3 100644 > >> > --- a/gcc/config/arm/mve.md > >> > +++ b/gcc/config/arm/mve.md > >> > @@ -510,7 +510,7 @@ (define_insn "mve_vrev32q_<supf><mode>" > >> > ;; > >> > ;; [vmovltq_u, vmovltq_s]) > >> > ;; > >> > -(define_insn "mve_vmovltq_<supf><mode>" > >> > +(define_insn "@mve_vmovltq_<supf><mode>" > >> > [ > >> > (set (match_operand:<V_double_width> 0 "s_register_operand" "=w") > >> > (unspec:<V_double_width> [(match_operand:MVE_3 1 > >> > "s_register_operand" "w")] > >> > @@ -524,7 +524,7 @@ (define_insn "mve_vmovltq_<supf><mode>" > >> > ;; > >> > ;; [vmovlbq_s, vmovlbq_u]) > >> > ;; > >> > -(define_insn "mve_vmovlbq_<supf><mode>" > >> > +(define_insn "@mve_vmovlbq_<supf><mode>" > >> > [ > >> > (set (match_operand:<V_double_width> 0 "s_register_operand" "=w") > >> > (unspec:<V_double_width> [(match_operand:MVE_3 1 > >> > "s_register_operand" "w")] > >> > @@ -2187,7 +2187,7 @@ (define_insn "mve_vmlsldavxq_s<mode>" > >> > ;; > >> > ;; [vmovnbq_u, vmovnbq_s]) > >> > ;; > >> > -(define_insn "mve_vmovnbq_<supf><mode>" > >> > +(define_insn "@mve_vmovnbq_<supf><mode>" > >> > [ > >> > (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w") > >> > (unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 > >> > "s_register_operand" "0") > >> > @@ -2202,7 +2202,7 @@ (define_insn "mve_vmovnbq_<supf><mode>" > >> > ;; > >> > ;; [vmovntq_s, vmovntq_u]) > >> > ;; > >> > -(define_insn "mve_vmovntq_<supf><mode>" > >> > +(define_insn "@mve_vmovntq_<supf><mode>" > >> > [ > >> > (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w") > >> > (unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 > >> > "s_register_operand" "0") > >> > @@ -2214,6 +2214,18 @@ (define_insn "mve_vmovntq_<supf><mode>" > >> > [(set_attr "type" "mve_move") > >> > ]) > >> > > >> > +(define_insn "@mve_vec_pack_trunc_<mode>" > >> > + [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=&w") > >> > + (vec_concat:<V_narrow_pack> > >> > + (truncate:<V_narrow> > >> > + (match_operand:MVE_5 1 "register_operand" "w")) > >> > + (truncate:<V_narrow> > >> > + (match_operand:MVE_5 2 "register_operand" "w"))))] > >> > + "TARGET_HAVE_MVE" > >> > + "vmovnb.i<V_sz_elem> %q0, %q1\;vmovnt.i<V_sz_elem> %q0, %q2" > >> > + [(set_attr "type" "mve_move")] > >> > +) > >> > + > >> > >> I realise this is (like you say) based on the neon.md pattern, > >> but we should use separate vmovnb and vmovnt instructions instead > >> of putting two instructions into a single pattern. > >> > >> One specific advantage to using separate patterns is that it would > >> avoid the imprecision of the earlyclobber: the output only conflicts > >> with operand 1 and can be tied to operand 2. > >> > > I'm not sure to follow: with the current patch, for test_clz_s16 we > > generate: > > vldrh.16 q3, [r1] > > vmovlb.s16 q2, q3 > > vmovlt.s16 q3, q3 > > vclz.i32 q2, q2 > > vclz.i32 q3, q3 > > vmovnb.i32 q1, q2 > > vmovnt.i32 q1, q3 > > vstrh.16 q1, [r0] > > > > If I remove the define_insn "@mve_vec_pack_trunc_<mode>" pattern > > and update define_expand "vec_pack_trunc_<mode>" to call: > > emit_insn (gen_mve_vmovnbq (VMOVNBQ_S, <MODE>mode, operands[0], > > operands[0], operands[1])); > > emit_insn (gen_mve_vmovntq (VMOVNTQ_S, <MODE>mode, operands[0], > > operands[0], operands[2])); > > instead, we now generate: > > vldrh.16 q3, [r1] > > vmovlb.s16 q1, q3 > > vldr.64 d4, .L5 > > vldr.64 d5, .L5+8 > > vmovlt.s16 q3, q3 > > vclz.i32 q1, q1 > > vclz.i32 q3, q3 > > vmovnb.i32 q2, q1 > > vmovnt.i32 q2, q3 > > vstrh.16 q2, [r0] > > bx lr > > .L6: > > .align 3 > > .L5: > > .short 0 > > .short 0 > > .short 0 > > .short 0 > > .short 0 > > .short 0 > > .short 0 > > .short 0 > > > > That's what I tried to explain in the commit message. > > Sorry, I missed that part of the message. If we want the first > instruction to have an in/out operand whose initial value is undefined > then I think the best way would be to have a second pattern for vmovnb > (only) that acts like a unary operation. > > FWIW, in the above, we shouldn't assign to operands[0] twice. > The result of the first operation needs to be a new temporary register.
Thanks. In the meantime, I tried to make some progress, and looked at how things work on aarch64. This led me to define (in mve.md): (define_insn "@mve_vec_pack_trunc_lo_<mode>" [(set (match_operand:<V_narrow> 0 "register_operand" "=w") (truncate:<V_narrow> (match_operand:MVE_5 1 "register_operand" "w")))] "TARGET_HAVE_MVE" "vmovnb.i<V_sz_elem> %q0, %q1\;" [(set_attr "type" "mve_move")] ) (define_insn "@mve_vec_pack_trunc_hi_<mode>" [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=w") (vec_concat:<V_narrow_pack> (match_operand:<V_narrow> 1 "register_operand" "0") (truncate:<V_narrow> (match_operand:MVE_5 2 "register_operand" "w"))))] "TARGET_HAVE_MVE" "vmovnt.i<V_sz_elem> %q0, %q2\;" [(set_attr "type" "mve_move")] ) and in vec-common.md, for (define_expand "vec_pack_trunc_<mode>" [(set (match_operand:<V_narrow_pack> 0 "register_operand") (vec_concat:<V_narrow_pack> (truncate:<V_narrow> (match_operand:VN 1 "register_operand")) (truncate:<V_narrow> (match_operand:VN 2 "register_operand"))))] I expand this for MVE: rtx tmpreg = gen_reg_rtx (<V_narrow>mode); emit_insn (gen_mve_vec_pack_trunc_lo (<MODE>mode, tmpreg, operands[1])); emit_insn (gen_mve_vec_pack_trunc_hi (<MODE>mode, operands[0], tmpreg, operands[2])); I am getting an error in reload: error: unable to generate reloads for: (insn 10 9 11 2 (set (reg:V4HI 122 [ vect__4.16 ]) (truncate:V4HI (reg:V4SI 118 [ vect__4.16 ]))) "/gcc.target/arm/simd/mve-vec-pack.c":17:112 3609 {mve_vec_pack_trunc_lo_v4si} (expr_list:REG_DEAD (reg:V4SI 118 [ vect__4.16 ]) (nil))) The next insn is: (insn 11 10 12 2 (set (reg:V8HI 121 [ vect__7.18 ]) (vec_concat:V8HI (reg:V4HI 122 [ vect__4.16 ]) (truncate:V4HI (reg:V4SI 119 [ vect__4.17 ])))) "/gcc.target/arm/simd/mve-vec-pack.c":17:112 3611 {mve_vec_pack_trunc_hi_v4si} (expr_list:REG_DEAD (reg:V4HI 122 [ vect__4.16 ]) (expr_list:REG_DEAD (reg:V4SI 119 [ vect__4.17 ]) (nil)))) What is causing the reload error? Thanks, Christophe