On Wed, 11 Aug 2021 at 22:23, Christophe Lyon <christophe.lyon....@gmail.com> wrote: > > > > On Thu, Jun 24, 2021 at 6:29 PM Kyrylo Tkachov via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: >> >> >> >> > -----Original Message----- >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> >> > Sent: 24 June 2021 12:11 >> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov >> > <kyrylo.tkac...@arm.com> >> > Subject: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics >> > >> > Hi, >> > This patch replaces builtins for vdup_n and vmov_n. >> > The patch results in regression for pr51534.c. >> > Consider following function: >> > >> > uint8x8_t f1 (uint8x8_t a) { >> > return vcgt_u8(a, vdup_n_u8(0)); >> > } >> > >> > code-gen before patch: >> > f1: >> > vmov.i32 d16, #0 @ v8qi >> > vcgt.u8 d0, d0, d16 >> > bx lr >> > >> > code-gen after patch: >> > f1: >> > vceq.i8 d0, d0, #0 >> > vmvn d0, d0 >> > bx lr >> > >> > I am not sure which one is better tho ? >> > > Hi Prathamesh, > > This patch introduces a regression on non-hardfp configs (eg > arm-linux-gnueabi or arm-eabi): > FAIL: gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c > scan-assembler-times vmov.i32[ \t]+[dD][0-9]+, #0xffffffff 3 > FAIL: gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c > scan-assembler-times vmov.i32[ \t]+[qQ][0-9]+, #4294967295 3 > > Can you fix this? The issue is, for following test:
#include <arm_neon.h> uint8x8_t f1 (uint8x8_t a) { return vcge_u8(a, vdup_n_u8(0)); } armhf code-gen: f1: vmov.i32 d0, #0xffffffff @ v8qi bx lr arm softfp code-gen: f1: mov r0, #-1 mov r1, #-1 bx lr The code-gen for both is same upto split2 pass: (insn 10 6 11 2 (set (reg/i:V8QI 16 s0) (const_vector:V8QI [ (const_int -1 [0xffffffffffffffff]) repeated x8 ])) "foo.c":5:1 1052 {*neon_movv8qi} (expr_list:REG_EQUAL (const_vector:V8QI [ (const_int -1 [0xffffffffffffffff]) repeated x8 ]) (nil))) (insn 11 10 13 2 (use (reg/i:V8QI 16 s0)) "foo.c":5:1 -1 (nil)) and for softfp target, split2 pass splits the assignment to r0 and r1: (insn 15 6 16 2 (set (reg:SI 0 r0) (const_int -1 [0xffffffffffffffff])) "foo.c":5:1 740 {*thumb2_movsi_vfp} (nil)) (insn 16 15 11 2 (set (reg:SI 1 r1 [+4 ]) (const_int -1 [0xffffffffffffffff])) "foo.c":5:1 740 {*thumb2_movsi_vfp} (nil)) (insn 11 16 13 2 (use (reg/i:V8QI 0 r0)) "foo.c":5:1 -1 (nil)) I suppose we could use a dg-scan for r[0-9]+, #-1 for softfp targets ? Thanks, Prathamesh > > Thanks > > Christophe > > >> >> I think they're equivalent in practice, in any case the patch itself is good >> (move away from RTL builtins). >> Ok. >> Thanks, >> Kyrill >> >> > >> > Also, this patch regressed bf16_dup.c on arm-linux-gnueabi, >> > which is due to a missed opt in lowering. I had filed it as >> > PR98435, and posted a fix for it here: >> > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572648.html >> > >> > Thanks, >> > Prathamesh