> -----Original Message-----
> From: Gcc-patches <gcc-patches-boun...@gcc.gnu.org> On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 05 May 2021 15:08
> To: Andre Simoes Dias Vieira <andre.simoesdiasvie...@arm.com>
> Cc: gcc Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp
>
> On Tue, 4 May 2021 at 15:41, Christophe Lyon <christophe.l...@linaro.org>
> wrote:
> >
> > On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> > <andre.simoesdiasvie...@arm.com> wrote:
> > >
> > > Hi Christophe,
> > >
> > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > Since MVE has a different set of vector comparison operators from
> > > > Neon, we have to update the expansion to take the new ones into
> > > > account, for instance 'NE', for which MVE does not require using
> > > > 'EQ' with the inverted condition.
> > > >
> > > > Conversely, Neon supports comparisons with #0, MVE does not.
> > > >
> > > > For:
> > > > typedef long int vs32 __attribute__((vector_size(16)));
> > > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > > >
> > > > we now generate:
> > > > cmp_eq_vs32_reg:
> > > >         vldr.64 d4, .L123        @ 8   [c=8 l=4]   *mve_movv4si/8
> > > >         vldr.64 d5, .L123+8
> > > >         vldr.64 d6, .L123+16     @ 9   [c=8 l=4]   *mve_movv4si/8
> > > >         vldr.64 d7, .L123+24
> > > >         vcmp.i32  eq, q0, q1     @ 7   [c=16 l=4]  mve_vcmpeqq_v4si
> > > >         vpsel q0, q3, q2         @ 15  [c=8 l=4]   mve_vpselq_sv4si
> > > >         bx      lr               @ 26  [c=8 l=4]   *thumb2_return
> > > > .L124:
> > > >         .align  3
> > > > .L123:
> > > >         .word   0
> > > >         .word   0
> > > >         .word   0
> > > >         .word   0
> > > >         .word   1
> > > >         .word   1
> > > >         .word   1
> > > >         .word   1
> > > >
> > > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode))
> > > > produces a pair of vldr instead of vmov.i32 qX, #0.
> > > I think ideally we would even want:
> > > vpte  eq, q0, q1
> > > vmovt.i32  q0, #0
> > > vmove.i32  q0, #1
> > >
> > > But we don't have a way to generate VPT blocks with multiple
> > > instructions yet, unfortunately, so I guess VPSEL will have to do
> > > for now.
> >
> > TBH, I looked at what LLVM generates currently ;-)
> >
>
> Here is an updated version, which adds
> && (!<Is_float_mode> || flag_unsafe_math_optimizations)
> to vcond_mask_
>
> This condition was not present in the neon.md version I moved to
> vec-common.md, but since the VDQW iterator includes V2SF and V4SF, it
> should take floating-point flags into account.
>
-      emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
+    case NE:
+      if (TARGET_HAVE_MVE) {
+	  rtx vpr_p0;

GNU style wants the '{' on the new line. This appears a few other times in
the patch.

+	  if (vcond_mve)
+	    vpr_p0 = target;
+	  else
+	    vpr_p0 = gen_reg_rtx (HImode);
+
+	  switch (cmp_mode)
+	    {
+	    case E_V16QImode:
+	    case E_V8HImode:
+	    case E_V4SImode:
+	      emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+	      break;
+	    case E_V8HFmode:
+	    case E_V4SFmode:
+	      if (TARGET_HAVE_MVE_FLOAT)
+		emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+	      else
+		gcc_unreachable ();
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }

Hmm, I think we can just check GET_MODE_CLASS (cmp_mode) for
MODE_VECTOR_INT or MODE_VECTOR_FLOAT here rather than have this switch
statement.

+
+	  /* If we are not expanding a vcond, build the result here.  */
+	  if (!vcond_mve) {
+	      rtx zero = gen_reg_rtx (cmp_result_mode);
+	      rtx one = gen_reg_rtx (cmp_result_mode);
+	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
+	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
+	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
+	  }
+      }
+      else

...

   bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
-					     operands[4], operands[5], true);
+					     operands[4], operands[5], true, vcond_mve);
   if (inverted)
     std::swap (operands[1], operands[2]);
+  if (TARGET_NEON)
     emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0], mask,
			       operands[1], operands[2]));
+  else
+    {
+      machine_mode cmp_mode = GET_MODE (operands[4]);
+      rtx vpr_p0 = mask;
+      rtx zero = gen_reg_rtx (cmp_mode);
+      rtx one = gen_reg_rtx (cmp_mode);
+      emit_move_insn (zero, CONST0_RTX (cmp_mode));
+      emit_move_insn (one, CONST1_RTX (cmp_mode));
+      switch (cmp_mode)
+	{
+	case E_V16QImode:
+	case E_V8HImode:
+	case E_V4SImode:
+	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
+	  break;
+	case E_V8HFmode:
+	case E_V4SFmode:
+	  if (TARGET_HAVE_MVE_FLOAT)
+	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0));
+	  break;
+	default:
+	  gcc_unreachable ();
+	}

Similarly here.

Ok with those changes.

Thanks,
Kyrill
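
For illustration, the brace placement being asked for above follows the GNU
coding style: the opening brace goes on its own line, indented under the
controlling statement (a hypothetical fragment, not taken from the patch):

  if (TARGET_HAVE_MVE)
    {
      rtx vpr_p0 = gen_reg_rtx (HImode);
      /* ... body ...  */
    }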
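
A minimal sketch of the GET_MODE_CLASS-based simplification suggested above,
reusing the generator calls that already appear in the patch (an illustration
under those assumptions, not the committed code):

  /* Dispatch on the mode class instead of listing each vector mode.  */
  if (GET_MODE_CLASS (cmp_mode) == MODE_VECTOR_FLOAT)
    {
      /* Float comparisons are only reached with the MVE FP extension.  */
      gcc_assert (TARGET_HAVE_MVE_FLOAT);
      emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0,
				  force_reg (cmp_mode, op1)));
    }
  else
    {
      gcc_assert (GET_MODE_CLASS (cmp_mode) == MODE_VECTOR_INT);
      emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0,
				force_reg (cmp_mode, op1)));
    }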