Re: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics

Prathamesh Kulkarni via Gcc-patches Thu, 12 Aug 2021 04:55:11 -0700

On Wed, 11 Aug 2021 at 22:23, Christophe Lyon
<christophe.lyon....@gmail.com> wrote:
>
>
>
> On Thu, Jun 24, 2021 at 6:29 PM Kyrylo Tkachov via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>>
>> > -----Original Message-----
>> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
>> > Sent: 24 June 2021 12:11
>> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
>> > <kyrylo.tkac...@arm.com>
>> > Subject: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics
>> >
>> > Hi,
>> > This patch replaces builtins for vdup_n and vmov_n.
>> > The patch results in regression for pr51534.c.
>> > Consider following function:
>> >
>> > uint8x8_t f1 (uint8x8_t a) {
>> >   return vcgt_u8(a, vdup_n_u8(0));
>> > }
>> >
>> > code-gen before patch:
>> > f1:
>> >         vmov.i32  d16, #0  @ v8qi
>> >         vcgt.u8     d0, d0, d16
>> >         bx             lr
>> >
>> > code-gen after patch:
>> > f1:
>> >         vceq.i8 d0, d0, #0
>> >         vmvn    d0, d0
>> >         bx         lr
>> >
>> > I am not sure which one is better tho ?
>>
>
> Hi Prathamesh,
>
> This patch introduces a regression on non-hardfp configs (eg 
> arm-linux-gnueabi or arm-eabi):
> FAIL:  gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c 
> scan-assembler-times vmov.i32[ \t]+[dD][0-9]+, #0xffffffff 3
> FAIL:  gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c 
> scan-assembler-times vmov.i32[ \t]+[qQ][0-9]+, #4294967295 3
>
> Can you fix this?
The issue is, for following test:


#include <arm_neon.h>

uint8x8_t f1 (uint8x8_t a) {
  return vcge_u8(a, vdup_n_u8(0));
}

armhf code-gen:
f1:
        vmov.i32  d0, #0xffffffff  @ v8qi
        bx            lr

arm softfp code-gen:
f1:
        mov     r0, #-1
        mov     r1, #-1
        bx      lr

The code-gen for both is same upto split2 pass:

(insn 10 6 11 2 (set (reg/i:V8QI 16 s0)
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])) "foo.c":5:1 1052 {*neon_movv8qi}
     (expr_list:REG_EQUAL (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])
        (nil)))
(insn 11 10 13 2 (use (reg/i:V8QI 16 s0)) "foo.c":5:1 -1
     (nil))

and for softfp target, split2 pass splits the assignment to r0 and r1:

(insn 15 6 16 2 (set (reg:SI 0 r0)
        (const_int -1 [0xffffffffffffffff])) "foo.c":5:1 740 {*thumb2_movsi_vfp}
     (nil))
(insn 16 15 11 2 (set (reg:SI 1 r1 [+4 ])
        (const_int -1 [0xffffffffffffffff])) "foo.c":5:1 740 {*thumb2_movsi_vfp}
     (nil))
(insn 11 16 13 2 (use (reg/i:V8QI 0 r0)) "foo.c":5:1 -1
     (nil))

I suppose we could use a dg-scan for r[0-9]+, #-1 for softfp targets ?

Thanks,
Prathamesh
>
> Thanks
>
> Christophe
>
>
>>
>> I think they're equivalent in practice, in any case the patch itself is good 
>> (move away from RTL builtins).
>> Ok.
>> Thanks,
>> Kyrill
>>
>> >
>> > Also, this patch regressed bf16_dup.c on arm-linux-gnueabi,
>> > which is due to a missed opt in lowering. I had filed it as
>> > PR98435, and posted a fix for it here:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572648.html
>> >
>> > Thanks,
>> > Prathamesh

Re: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics

Reply via email to