On Mon, 17 May 2021 at 12:35, Kyrylo Tkachov <kyrylo.tkac...@arm.com> wrote:
>
>
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-boun...@gcc.gnu.org> On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 05 May 2021 15:08
> > To: Andre Simoes Dias Vieira <andre.simoesdiasvie...@arm.com>
> > Cc: gcc Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp
> >
> > On Tue, 4 May 2021 at 15:41, Christophe Lyon <christophe.l...@linaro.org>
> > wrote:
> > >
> > > On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> > > <andre.simoesdiasvie...@arm.com> wrote:
> > > >
> > > > Hi Christophe,
> > > >
> > > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > > Since MVE has a different set of vector comparison operators from
> > > > > Neon, we have to update the expansion to take the new ones into
> > > > > account, for instance 'NE', for which MVE does not require using
> > > > > 'EQ' with the inverted condition.
> > > > >
> > > > > Conversely, Neon supports comparisons with #0, but MVE does not.
> > > > >
> > > > > For:
> > > > > typedef long int vs32 __attribute__((vector_size(16)));
> > > > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > > > >
> > > > > we now generate:
> > > > > cmp_eq_vs32_reg:
> > > > > vldr.64 d4, .L123  @ 8  [c=8 l=4]  *mve_movv4si/8
> > > > > vldr.64 d5, .L123+8
> > > > > vldr.64 d6, .L123+16  @ 9  [c=8 l=4]  *mve_movv4si/8
> > > > > vldr.64 d7, .L123+24
> > > > > vcmp.i32 eq, q0, q1  @ 7  [c=16 l=4]  mve_vcmpeqq_v4si
> > > > > vpsel q0, q3, q2  @ 15  [c=8 l=4]  mve_vpselq_sv4si
> > > > > bx lr  @ 26  [c=8 l=4]  *thumb2_return
> > > > > .L124:
> > > > > .align 3
> > > > > .L123:
> > > > > .word 0
> > > > > .word 0
> > > > > .word 0
> > > > > .word 0
> > > > > .word 1
> > > > > .word 1
> > > > > .word 1
> > > > > .word 1
> > > > >
> > > > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode))
> > > > > produces a pair of vldr instead of vmov.i32 qX, #0.
> > > > I think ideally we would even want:
> > > > vpte eq, q0, q1
> > > > vmovt.i32 q0, #0
> > > > vmove.i32 q0, #1
> > > >
> > > > But we don't have a way to generate VPT blocks with multiple
> > > > instructions yet, unfortunately, so I guess VPSEL will have to do
> > > > for now.
> > >
> > > TBH, I looked at what LLVM generates currently ;-)
> > >
> >
> > Here is an updated version, which adds
> > && (!<Is_float_mode> || flag_unsafe_math_optimizations)
> > to vcond_mask_
> >
> > This condition was not present in the neon.md version I moved to
> > vec-common.md, but since the VDQW iterator includes V2SF and V4SF, it
> > should take the floating-point flags into account.
>
> -  emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
> +    case NE:
> +      if (TARGET_HAVE_MVE) {
> +       rtx vpr_p0;
>
> GNU style wants the '{' on a new line. This appears a few other times in
> the patch.
>
> +       if (vcond_mve)
> +         vpr_p0 = target;
> +       else
> +         vpr_p0 = gen_reg_rtx (HImode);
> +
> +       switch (cmp_mode)
> +         {
> +         case E_V16QImode:
> +         case E_V8HImode:
> +         case E_V4SImode:
> +           emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> +           break;
> +         case E_V8HFmode:
> +         case E_V4SFmode:
> +           if (TARGET_HAVE_MVE_FLOAT)
> +             emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
> +           else
> +             gcc_unreachable ();
> +           break;
> +         default:
> +           gcc_unreachable ();
> +         }
>
> Hmm, I think we can just check GET_MODE_CLASS (cmp_mode) for
> MODE_VECTOR_INT or MODE_VECTOR_FLOAT here rather than have this
> switch statement.
>
> +
> +       /* If we are not expanding a vcond, build the result here.  */
> +       if (!vcond_mve) {
> +         rtx zero = gen_reg_rtx (cmp_result_mode);
> +         rtx one = gen_reg_rtx (cmp_result_mode);
> +         emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> +         emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> +         emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
> +       }
> +      }
> +    else
>
> ...
>   bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
> -                                            operands[4], operands[5], true);
> +                                            operands[4], operands[5], true, vcond_mve);
>   if (inverted)
>     std::swap (operands[1], operands[2]);
> + if (TARGET_NEON)
>     emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0],
>                               mask, operands[1], operands[2]));
> + else
> +   {
> +     machine_mode cmp_mode = GET_MODE (operands[4]);
> +     rtx vpr_p0 = mask;
> +     rtx zero = gen_reg_rtx (cmp_mode);
> +     rtx one = gen_reg_rtx (cmp_mode);
> +     emit_move_insn (zero, CONST0_RTX (cmp_mode));
> +     emit_move_insn (one, CONST1_RTX (cmp_mode));
> +     switch (cmp_mode)
> +       {
> +       case E_V16QImode:
> +       case E_V8HImode:
> +       case E_V4SImode:
> +         emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
> +         break;
> +       case E_V8HFmode:
> +       case E_V4SFmode:
> +         if (TARGET_HAVE_MVE_FLOAT)
> +           emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0));
> +         break;
> +       default:
> +         gcc_unreachable ();
> +       }
>
> Similarly here.
> Ok with those changes.
Thanks, committed after testing.

Christophe

> Thanks,
> Kyrill