Re: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n intrinsics

Prathamesh Kulkarni via Gcc-patches Mon, 19 Jul 2021 05:34:55 -0700

On Thu, 15 Jul 2021 at 16:46, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
>
> On Thu, 15 Jul 2021 at 14:47, Christophe Lyon
> <christophe.lyon....@gmail.com> wrote:
> >
> > Hi Prathamesh,
> >
> > On Mon, Jul 5, 2021 at 11:25 AM Kyrylo Tkachov via Gcc-patches 
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >>
> >>
> >> > -----Original Message-----
> >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
> >> > Sent: 05 July 2021 10:18
> >> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> >> > <kyrylo.tkac...@arm.com>
> >> > Subject: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n
> >> > intrinsics
> >> >
> >> > Hi Kyrill,
> >> > I assume this patch is OK to commit after bootstrap+testing ?
> >>
> >> Yes.
> >> Thanks,
> >> Kyrill
> >>
> >
> >
> > The updated testcase fails on some configs:
> > gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, r[0-9]+ found 2 
> > times
> > FAIL:  gcc.target/arm/armv8_2-fp16-neon-2.c scan-assembler-times 
> > vdup\\.16\\tq[0-9]+, r[0-9]+ 3
> >
> > For instance on arm-none-eabi with default configuration flags 
> > (mode/cpu/fpu)
> > and default runtestflags.
> > The same toolchain config also fails on this test when overriding 
> > runtestflags with:
> > -mthumb/-mfloat-abi=soft/-march=armv6s-m
> > -mthumb/-mfloat-abi=soft/-march=armv7-m
> > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
> >
> > Can you fix this please?
> Hi Christophe,
> Sorry for the breakage, I will take a look.
The issue is with the following function:


float16x8_t f2 (float16x8_t __a, float16_t __b) {
  return __a * __b;
}

With -O2 -ffast-math -mfloat-abi=softfp -march=armv8.2-a+fp16, it generates:
f2:
        ldrh    ip, [sp]        @ __fp16
        vmov    d18, r0, r1  @ v8hf
        vmov    d19, r2, r3
        vdup.16 q8, ip
        vmul.f16        q8, q8, q9
        vmov    r0, r1, d16  @ v8hf
        vmov    r2, r3, d17
        bx      lr

The vdup is generated correctly, but IIUC r0-r3 are used up passing
'__a', which is moved into q9 (d18 / d19), so '__b' is loaded from
the stack into ip. That yields vdup.16 q8, ip, and thus the scan for
"vdup\\.16\\tq[0-9]+, r[0-9]+" fails to match this instance.
I tried adjusting the scan as follows to accommodate ip:
/* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, (r[0-9]+|ip)} 3 } }  */
but that still FAILs, because the log shows:
gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+,
(r[0-9]+|ip) found 6 times
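
A count of 6 where 3 matches are expected is consistent with each match
being counted twice, once for the whole match and once for the
parenthesised group. The Python sketch below illustrates that reading.
It rests on an assumption not confirmed in this thread: that
scan-assembler-times takes the length of the list returned by Tcl's
`regexp -inline -all`, which holds the whole match plus one element per
capture group. The helper `inline_all_length` and the two vdup lines
using r0 and r1 are hypothetical; only the "ip" line comes from the f2
listing above.

```python
import re

# Three vdup lines as they might appear in the test's assembly output.
# Only the "ip" line is taken from the f2 listing; the r0 and r1 lines
# are hypothetical stand-ins for the other functions in the test.
asm = (
    "\tvdup.16\tq8, r0\n"
    "\tvdup.16\tq9, r1\n"
    "\tvdup.16\tq8, ip\n"
)

ORIGINAL = r"vdup\.16\tq[0-9]+, r[0-9]+"            # pattern in the test
CAPTURING = r"vdup\.16\tq[0-9]+, (r[0-9]+|ip)"      # pattern tried above
NON_CAPTURING = r"vdup\.16\tq[0-9]+, (?:r[0-9]+|ip)"  # same, no capture


def inline_all_length(pattern, text):
    """Emulate the length of Tcl's `regexp -inline -all` result list.

    Assumption: every match contributes the whole match plus one list
    element for each capture group in the pattern.
    """
    return sum(1 + m.re.groups for m in re.finditer(pattern, text))


for name, pattern in [
    ("original", ORIGINAL),
    ("capturing", CAPTURING),
    ("non-capturing", NON_CAPTURING),
]:
    print(name, inline_all_length(pattern, asm))
```

Under that assumption the original pattern counts 2, the capturing
pattern counts 6, and the non-capturing form (?:r[0-9]+|ip) counts 3,
which would match the numbers in the logs above.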

Could you suggest how I should adjust the test so that the second
operand can be either r[0-9]+ or the ip register?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> >
> > Christophe
> >
> >> >
> >> > Thanks,
> >> > Prathamesh
