On Mon, 17 May 2021 at 12:35, Kyrylo Tkachov <kyrylo.tkac...@arm.com> wrote:
>
>
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-boun...@gcc.gnu.org> On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 05 May 2021 15:08
> > To: Andre Simoes Dias Vieira <andre.simoesdiasvie...@arm.com>
> > Cc: gcc Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp
> >
> > On Tue, 4 May 2021 at 15:41, Christophe Lyon <christophe.l...@linaro.org>
> > wrote:
> > >
> > > On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> > > <andre.simoesdiasvie...@arm.com> wrote:
> > > >
> > > > Hi Christophe,
> > > >
> > > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > > Since MVE has a different set of vector comparison operators from
> > > > > Neon, we have to update the expansion to take the new ones into
> > > > > account, for instance 'NE', for which MVE does not require using
> > > > > 'EQ' with the inverted condition.
> > > > >
> > > > > Conversely, Neon supports comparisons with #0, but MVE does not.
> > > > >
> > > > > For:
> > > > > typedef long int vs32 __attribute__((vector_size(16)));
> > > > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > > > >
> > > > > we now generate:
> > > > > cmp_eq_vs32_reg:
> > > > > vldr.64 d4, .L123  @ 8  [c=8 l=4]  *mve_movv4si/8
> > > > > vldr.64 d5, .L123+8
> > > > > vldr.64 d6, .L123+16  @ 9  [c=8 l=4]  *mve_movv4si/8
> > > > > vldr.64 d7, .L123+24
> > > > > vcmp.i32 eq, q0, q1  @ 7  [c=16 l=4]  mve_vcmpeqq_v4si
> > > > > vpsel q0, q3, q2  @ 15  [c=8 l=4]  mve_vpselq_sv4si
> > > > > bx lr  @ 26  [c=8 l=4]  *thumb2_return
> > > > > .L124:
> > > > > .align 3
> > > > > .L123:
> > > > > .word 0
> > > > > .word 0
> > > > > .word 0
> > > > > .word 0
> > > > > .word 1
> > > > > .word 1
> > > > > .word 1
> > > > > .word 1
> > > > >
> > > > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode))
> > > > > produces a pair of vldr instead of vmov.i32 qX, #0.
> > > > I think ideally we would even want:
> > > > vpte eq, q0, q1
> > > > vmovt.i32 q0, #0
> > > > vmove.i32 q0, #1
> > > >
> > > > But we don't have a way to generate VPT blocks with multiple
> > > > instructions yet, unfortunately, so I guess VPSEL will have to do
> > > > for now.
> > >
> > > TBH, I looked at what LLVM generates currently ;-)
> > >
> >
> > Here is an updated version, which adds
> > && (!<Is_float_mode> || flag_unsafe_math_optimizations)
> > to vcond_mask_
> >
> > This condition was not present in the neon.md version I moved to
> > vec-common.md, but since the VDQW iterator includes V2SF and V4SF, it
> > should take the floating-point flags into account.
>
> -  emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
> +    case NE:
> +      if (TARGET_HAVE_MVE) {
> +       rtx vpr_p0;
>
> GNU style wants the '{' on a new line. This appears a few other times in
> the patch.
>
> +       if (vcond_mve)
> +         vpr_p0 = target;
> +       else
> +         vpr_p0 = gen_reg_rtx (HImode);
> +
> +       switch (cmp_mode)
> +         {
> +         case E_V16QImode:
> +         case E_V8HImode:
> +         case E_V4SImode:
> +           emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> +           break;
> +         case E_V8HFmode:
> +         case E_V4SFmode:
> +           if (TARGET_HAVE_MVE_FLOAT)
> +             emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> +           else
> +             gcc_unreachable ();
> +           break;
> +         default:
> +           gcc_unreachable ();
> +         }
>
> Hmm, I think we can just check GET_MODE_CLASS (cmp_mode) for
> MODE_VECTOR_INT or MODE_VECTOR_FLOAT here rather than have this
> switch statement.
>
> +
> +       /* If we are not expanding a vcond, build the result here.  */
> +       if (!vcond_mve) {
> +         rtx zero = gen_reg_rtx (cmp_result_mode);
> +         rtx one = gen_reg_rtx (cmp_result_mode);
> +         emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> +         emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> +         emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
> +       }
> +      }
> +    else
>
> ...
>   bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
> -                                            operands[4], operands[5], true);
> +                                            operands[4], operands[5], true, vcond_mve);
>   if (inverted)
>     std::swap (operands[1], operands[2]);
> + if (TARGET_NEON)
>     emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0],
>                               mask, operands[1], operands[2]));
> + else
> +   {
> +     machine_mode cmp_mode = GET_MODE (operands[4]);
> +     rtx vpr_p0 = mask;
> +     rtx zero = gen_reg_rtx (cmp_mode);
> +     rtx one = gen_reg_rtx (cmp_mode);
> +     emit_move_insn (zero, CONST0_RTX (cmp_mode));
> +     emit_move_insn (one, CONST1_RTX (cmp_mode));
> +     switch (cmp_mode)
> +       {
> +       case E_V16QImode:
> +       case E_V8HImode:
> +       case E_V4SImode:
> +         emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
> +         break;
> +       case E_V8HFmode:
> +       case E_V4SFmode:
> +         if (TARGET_HAVE_MVE_FLOAT)
> +           emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0));
> +         break;
> +       default:
> +         gcc_unreachable ();
> +       }
>
> Similarly here.
> Ok with those changes.
Thanks, committed after testing.

Christophe

> Thanks,
> Kyrill