Re: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics

Christophe Lyon via Gcc-patches Thu, 02 Sep 2021 02:03:11 -0700

On Tue, Aug 24, 2021 at 10:17 AM Kyrylo Tkachov <kyrylo.tkac...@arm.com>
wrote:


>
>
> > -----Original Message-----
> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
> > Sent: 24 August 2021 09:01
> > To: Christophe Lyon <christophe.lyon....@gmail.com>
> > Cc: Kyrylo Tkachov <kyrylo.tkac...@arm.com>; gcc Patches <gcc-
> > patc...@gcc.gnu.org>
> > Subject: Re: [ARM] PR66791: Replace builtins for vdup_n and vmov_n
> > intrinsics
> >
> > On Tue, 17 Aug 2021 at 11:55, Prathamesh Kulkarni
> > <prathamesh.kulka...@linaro.org> wrote:
> > >
> > > On Thu, 12 Aug 2021 at 19:04, Christophe Lyon
> > > <christophe.lyon....@gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > On Thu, Aug 12, 2021 at 1:54 PM Prathamesh Kulkarni
> > <prathamesh.kulka...@linaro.org> wrote:
> > > >>
> > > >> On Wed, 11 Aug 2021 at 22:23, Christophe Lyon
> > > >> <christophe.lyon....@gmail.com> wrote:
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Thu, Jun 24, 2021 at 6:29 PM Kyrylo Tkachov via Gcc-patches
> <gcc-
> > patc...@gcc.gnu.org> wrote:
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> > -----Original Message-----
> > > >> >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
> > > >> >> > Sent: 24 June 2021 12:11
> > > >> >> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> > > >> >> > <kyrylo.tkac...@arm.com>
> > > >> >> > Subject: [ARM] PR66791: Replace builtins for vdup_n and vmov_n
> > intrinsics
> > > >> >> >
> > > >> >> > Hi,
> > > >> >> > This patch replaces builtins for vdup_n and vmov_n.
> > > >> >> > The patch results in regression for pr51534.c.
> > > >> >> > Consider following function:
> > > >> >> >
> > > >> >> > uint8x8_t f1 (uint8x8_t a) {
> > > >> >> >   return vcgt_u8(a, vdup_n_u8(0));
> > > >> >> > }
> > > >> >> >
> > > >> >> > code-gen before patch:
> > > >> >> > f1:
> > > >> >> >         vmov.i32  d16, #0  @ v8qi
> > > >> >> >         vcgt.u8     d0, d0, d16
> > > >> >> >         bx             lr
> > > >> >> >
> > > >> >> > code-gen after patch:
> > > >> >> > f1:
> > > >> >> >         vceq.i8 d0, d0, #0
> > > >> >> >         vmvn    d0, d0
> > > >> >> >         bx         lr
> > > >> >> >
> > > >> >> > I am not sure which one is better tho ?
> > > >> >>
> > > >> >
> > > >> > Hi Prathamesh,
> > > >> >
> > > >> > This patch introduces a regression on non-hardfp configs (eg arm-
> > linux-gnueabi or arm-eabi):
> > > >> > FAIL:  gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c scan-
> > assembler-times vmov.i32[ \t]+[dD][0-9]+, #0xffffffff 3
> > > >> > FAIL:  gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c scan-
> > assembler-times vmov.i32[ \t]+[qQ][0-9]+, #4294967295 3
> > > >> >
> > > >> > Can you fix this?
> > > >> The issue is, for following test:
> > > >>
> > > >> #include <arm_neon.h>
> > > >>
> > > >> uint8x8_t f1 (uint8x8_t a) {
> > > >>   return vcge_u8(a, vdup_n_u8(0));
> > > >> }
> > > >>
> > > >> armhf code-gen:
> > > >> f1:
> > > >>         vmov.i32  d0, #0xffffffff  @ v8qi
> > > >>         bx            lr
> > > >>
> > > >> arm softfp code-gen:
> > > >> f1:
> > > >>         mov     r0, #-1
> > > >>         mov     r1, #-1
> > > >>         bx      lr
> > > >>
> > > >> The code-gen for both is same upto split2 pass:
> > > >>
> > > >> (insn 10 6 11 2 (set (reg/i:V8QI 16 s0)
> > > >>         (const_vector:V8QI [
> > > >>                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > >>             ])) "foo.c":5:1 1052 {*neon_movv8qi}
> > > >>      (expr_list:REG_EQUAL (const_vector:V8QI [
> > > >>                 (const_int -1 [0xffffffffffffffff]) repeated x8
> > > >>             ])
> > > >>         (nil)))
> > > >> (insn 11 10 13 2 (use (reg/i:V8QI 16 s0)) "foo.c":5:1 -1
> > > >>      (nil))
> > > >>
> > > >> and for softfp target, split2 pass splits the assignment to r0 and
> r1:
> > > >>
> > > >> (insn 15 6 16 2 (set (reg:SI 0 r0)
> > > >>         (const_int -1 [0xffffffffffffffff])) "foo.c":5:1 740
> > {*thumb2_movsi_vfp}
> > > >>      (nil))
> > > >> (insn 16 15 11 2 (set (reg:SI 1 r1 [+4 ])
> > > >>         (const_int -1 [0xffffffffffffffff])) "foo.c":5:1 740
> > {*thumb2_movsi_vfp}
> > > >>      (nil))
> > > >> (insn 11 16 13 2 (use (reg/i:V8QI 0 r0)) "foo.c":5:1 -1
> > > >>      (nil))
> > > >>
> > > >> I suppose we could use a dg-scan for r[0-9]+, #-1 for softfp
> targets ?
> > > >>
> > > > Yes, probably, or try with check-function-bodies.
> > > Hi,
> > > Sorry for the late response. Does the attached patch look OK ?
> > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577532.html
>
> Ok.
>


Sorry Prathamesh, this does not quite work. See
https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/r12-3294-g7a6f40d0452ec76e126c2612dcfa32f3c73e2315/report-build-info.html
(red cells in the gcc column)

Can you have a look?

Thanks

Christophe

Thanks,
> Kyrill
>
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > >  Christophe
> > > >
> > > >> Thanks,
> > > >> Prathamesh
> > > >> >
> > > >> > Thanks
> > > >> >
> > > >> > Christophe
> > > >> >
> > > >> >
> > > >> >>
> > > >> >> I think they're equivalent in practice, in any case the patch
> itself is
> > good (move away from RTL builtins).
> > > >> >> Ok.
> > > >> >> Thanks,
> > > >> >> Kyrill
> > > >> >>
> > > >> >> >
> > > >> >> > Also, this patch regressed bf16_dup.c on arm-linux-gnueabi,
> > > >> >> > which is due to a missed opt in lowering. I had filed it as
> > > >> >> > PR98435, and posted a fix for it here:
> > > >> >> > https://gcc.gnu.org/pipermail/gcc-patches/2021-
> > June/572648.html
> > > >> >> >
> > > >> >> > Thanks,
> > > >> >> > Prathamesh
>

Re: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics

Reply via email to