On Thu, 15 Jul 2021 at 16:46, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
>
> On Thu, 15 Jul 2021 at 14:47, Christophe Lyon
> <christophe.lyon....@gmail.com> wrote:
> >
> > Hi Prathamesh,
> >
> > On Mon, Jul 5, 2021 at 11:25 AM Kyrylo Tkachov via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >>
> >>
> >> > -----Original Message-----
> >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
> >> > Sent: 05 July 2021 10:18
> >> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> >> > <kyrylo.tkac...@arm.com>
> >> > Subject: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n
> >> > intrinsics
> >> >
> >> > Hi Kyrill,
> >> > I assume this patch is OK to commit after bootstrap+testing ?
> >>
> >> Yes.
> >> Thanks,
> >> Kyrill
> >>
> >
> >
> > The updated testcase fails on some configs:
> > gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, r[0-9]+ found 2
> > times
> > FAIL: gcc.target/arm/armv8_2-fp16-neon-2.c scan-assembler-times
> > vdup\\.16\\tq[0-9]+, r[0-9]+ 3
> >
> > For instance on arm-none-eabi with default configuration flags
> > (mode/cpu/fpu)
> > and default runtestflags.
> > The same toolchain config also fails on this test when overriding
> > runtestflags with:
> > -mthumb/-mfloat-abi=soft/-march=armv6s-m
> > -mthumb/-mfloat-abi=soft/-march=armv7-m
> > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
> >
> > Can you fix this please?
> Hi Christophe,
> Sorry for the breakage, I will take a look.

The issue is with the following function:
float16x8_t f2 (float16x8_t __a, float16_t __b)
{
  return __a * __b;
}

With -O2 -ffast-math -mfloat-abi=softfp -march=armv8.2-a+fp16, it generates:

f2:
        ldrh    ip, [sp]        @ __fp16
        vmov    d18, r0, r1     @ v8hf
        vmov    d19, r2, r3
        vdup.16 q8, ip
        vmul.f16        q8, q8, q9
        vmov    r0, r1, d16     @ v8hf
        vmov    r2, r3, d17
        bx      lr

It correctly generates vdup, but IIUC, r0-r3 are used up in loading 'a' into
q9 (d18 / d19), so it uses ip for loading 'b' (ldrh ip, [sp]) and ends up
with vdup.16 q8, ip. The scan for "vdup\\.16\\tq[0-9]+, r[0-9]+" therefore
does not match this instruction and the test fails.

I tried adjusting the scan as follows to accommodate ip:
/* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, (r[0-9]+|ip)} 3 } } */
but that still FAILs because the log shows:
gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, (r[0-9]+|ip) found 6 times

Could you suggest how I should adjust the test so that the second operand can
be either r[0-9]+ or the ip register?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> >
> > Christophe
> >
> >> >
> >> > Thanks,
> >> > Prathamesh
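
A possible explanation for the "found 6 times" result, offered as an
assumption rather than something confirmed in this thread: if
scan-assembler-times takes its count from the length of the list returned by
Tcl's `regexp -inline -all`, then every match contributes the full match plus
one entry per capturing group, so three matches with one group would be
reported as six. The Python sketch below only models that counting rule. The
assembly excerpt and the helper name `tcl_inline_all_count` are hypothetical,
and Python's `re` module stands in for Tcl's regexp engine.

```python
import re

# Hypothetical assembly excerpt: three vdup.16 instructions, one of which
# takes its scalar operand from ip instead of r0-r12.
ASM = (
    "\tvdup.16\tq8, r0\n"
    "\tvdup.16\tq9, r1\n"
    "\tvdup.16\tq8, ip\n"
)


def tcl_inline_all_count(pattern, text):
    """Model the length of the list from Tcl's `regexp -inline -all`.

    Assumption: each match adds one entry for the whole match and one
    entry for every capturing group in the pattern.
    """
    count = 0
    for match in re.finditer(pattern, text):
        count += 1 + len(match.groups())
    return count


original = r"vdup\.16\tq[0-9]+, r[0-9]+"              # pattern from the failing test
capturing = r"vdup\.16\tq[0-9]+, (r[0-9]+|ip)"        # first attempted fix
non_capturing = r"vdup\.16\tq[0-9]+, (?:r[0-9]+|ip)"  # same alternation, no capture

for name, pattern in [("original", original),
                      ("capturing", capturing),
                      ("non-capturing", non_capturing)]:
    print(name, tcl_inline_all_count(pattern, ASM))
```

Under that assumption, writing the alternation as a non-capturing group,
(?:r[0-9]+|ip), would keep the reported count equal to the number of matching
instructions. Whether this is what happens in the actual testsuite run would
need to be checked against the scan-assembler-times implementation.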