On Tue, 3 Aug 2021 at 18:23, Christophe Lyon
<christophe.lyon....@gmail.com> wrote:
>
> On Mon, Jul 19, 2021 at 2:34 PM Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
>>
>> On Thu, 15 Jul 2021 at 16:46, Prathamesh Kulkarni
>> <prathamesh.kulka...@linaro.org> wrote:
>> >
>> > On Thu, 15 Jul 2021 at 14:47, Christophe Lyon
>> > <christophe.lyon....@gmail.com> wrote:
>> > >
>> > > Hi Prathamesh,
>> > >
>> > > On Mon, Jul 5, 2021 at 11:25 AM Kyrylo Tkachov via Gcc-patches
>> > > <gcc-patches@gcc.gnu.org> wrote:
>> > >>
>> > >> > -----Original Message-----
>> > >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
>> > >> > Sent: 05 July 2021 10:18
>> > >> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
>> > >> > <kyrylo.tkac...@arm.com>
>> > >> > Subject: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n
>> > >> > intrinsics
>> > >> >
>> > >> > Hi Kyrill,
>> > >> > I assume this patch is OK to commit after bootstrap+testing ?
>> > >>
>> > >> Yes.
>> > >> Thanks,
>> > >> Kyrill
>> > >>
>> > >
>> > > The updated testcase fails on some configs:
>> > > gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, r[0-9]+ found 2 times
>> > > FAIL: gcc.target/arm/armv8_2-fp16-neon-2.c scan-assembler-times
>> > > vdup\\.16\\tq[0-9]+, r[0-9]+ 3
>> > >
>> > > For instance on arm-none-eabi with default configuration flags
>> > > (mode/cpu/fpu) and default runtestflags.
>> > > The same toolchain config also fails on this test when overriding
>> > > runtestflags with:
>> > > -mthumb/-mfloat-abi=soft/-march=armv6s-m
>> > > -mthumb/-mfloat-abi=soft/-march=armv7-m
>> > > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
>> > >
>> > > Can you fix this please?
>> > Hi Christophe,
>> > Sorry for the breakage, I will take a look.
>> The issue is for the following function;
>>
>> float16x8_t f2 (float16x8_t __a, float16_t __b) {
>>   return __a * __b;
>> }
>>
>> With -O2 -ffast-math -mfloat-abi=softfp -march=armv8.2-a+fp16, it generates:
>> f2:
>>         ldrh    ip, [sp]        @ __fp16
>>         vmov    d18, r0, r1     @ v8hf
>>         vmov    d19, r2, r3
>>         vdup.16 q8, ip
>>         vmul.f16        q8, q8, q9
>>         vmov    r0, r1, d16     @ v8hf
>>         vmov    r2, r3, d17
>>         bx      lr
>>
>> It correctly generates vdup, but IIUC, r0-r3 are used up in loading
>> 'a' into q9 (d18 / d19),
>> and it uses ip for loading 'b' and ends up with vdup q8, ip, and thus
>> the scan for "vdup\\.16\\tq[0-9]+, r[0-9]+" fails.
>> I tried to adjust the scan to following to accommodate ip:
>> /* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, (r[0-9]+|ip)} 3 } } */
>> but that still FAIL's because log shows:
>> gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+,
>> (r[0-9]+|ip) found 6 times
>>
>> Could you suggest how should I adjust the test, so the second operand
>> can be either r[0-9]+ or ip register ?
>>
>
> Sorry for the delay, I was on vacation.
>
> I don't know off-hand how to adjust the test, did you check why it matched 6
> times?

I am not sure.
{vdup\.16\tq[0-9]+, r[0-9]+} matches 2 times as expected.
However if I surround r[0-9]+ with parens:
{vdup\.16\tq[0-9]+, (r[0-9]+)}
the number of matches is 4, which is twice the number of actual matches!
Similarly, {vdup\.16\tq[0-9]+, (r[0-9]+|ip)} gets matched 6 times,
twice the expected 3.
This is the debug log from runtest for the r[0-9]+|ip case:
https://people.linaro.org/~prathamesh.kulkarni/dbg.log
Unfortunately, I couldn't make much out of it. Investigating further.

Thanks,
Prathamesh
>
> Christophe
>
>>
>> Thanks,
>> Prathamesh
>> >
>> > Thanks,
>> > Prathamesh
>> > >
>> > > Thanks,
>> > >
>> > > Christophe
>> > >
>> > >> >
>> > >> > Thanks,
>> > >> > Prathamesh
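
One possible explanation for the doubled counts, offered as an assumption and not something confirmed in this thread: if scan-assembler-times counts the length of the list returned by Tcl's `regexp -inline -all`, then each match contributes one element for the whole match plus one element per capturing group. With a single pair of parens, every match would be counted twice. The Python sketch below only emulates that list-building behaviour; `tcl_inline_all`, `count` and the `ASM` sample text are hypothetical names and data made up for illustration, not part of the GCC testsuite.

```python
import re


def tcl_inline_all(pattern, text):
    """Emulate the flat list Tcl's `regexp -inline -all` returns:
    for each match, the whole match followed by every capture group."""
    flat = []
    for m in re.finditer(pattern, text):
        flat.append(m.group(0))
        # Groups that did not participate are reported as empty strings.
        flat.extend(g if g is not None else "" for g in m.groups())
    return flat


# Hypothetical assembly excerpt: one vdup from ip, two from r registers.
ASM = "\tvdup.16\tq8, ip\n\tvdup.16\tq9, r0\n\tvdup.16\tq10, r1\n"

plain = r"vdup\.16\tq[0-9]+, r[0-9]+"
capturing = r"vdup\.16\tq[0-9]+, (r[0-9]+|ip)"
non_capturing = r"vdup\.16\tq[0-9]+, (?:r[0-9]+|ip)"


def count(pattern, text=ASM):
    # Assumed counting rule: length of the flat list, not number of matches.
    return len(tcl_inline_all(pattern, text))


print(count(plain), count(capturing), count(non_capturing))
```

Under that assumption, writing the alternation as a non-capturing group, (?:r[0-9]+|ip), would keep the count at one per match while still accepting either register form.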