Re: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n intrinsics

Prathamesh Kulkarni via Gcc-patches Mon, 09 Aug 2021 04:11:48 -0700

On Tue, 3 Aug 2021 at 18:23, Christophe Lyon
<christophe.lyon....@gmail.com> wrote:
>
>
>
> On Mon, Jul 19, 2021 at 2:34 PM Prathamesh Kulkarni 
> <prathamesh.kulka...@linaro.org> wrote:
>>
>> On Thu, 15 Jul 2021 at 16:46, Prathamesh Kulkarni
>> <prathamesh.kulka...@linaro.org> wrote:
>> >
>> > On Thu, 15 Jul 2021 at 14:47, Christophe Lyon
>> > <christophe.lyon....@gmail.com> wrote:
>> > >
>> > > Hi Prathamesh,
>> > >
>> > > On Mon, Jul 5, 2021 at 11:25 AM Kyrylo Tkachov via Gcc-patches 
>> > > <gcc-patches@gcc.gnu.org> wrote:
>> > >>
>> > >>
>> > >>
>> > >> > -----Original Message-----
>> > >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
>> > >> > Sent: 05 July 2021 10:18
>> > >> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
>> > >> > <kyrylo.tkac...@arm.com>
>> > >> > Subject: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n
>> > >> > intrinsics
>> > >> >
>> > >> > Hi Kyrill,
>> > >> > I assume this patch is OK to commit after bootstrap+testing ?
>> > >>
>> > >> Yes.
>> > >> Thanks,
>> > >> Kyrill
>> > >>
>> > >
>> > >
>> > > The updated testcase fails on some configs:
>> > > gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, r[0-9]+ found 2 times
>> > > FAIL:  gcc.target/arm/armv8_2-fp16-neon-2.c scan-assembler-times vdup\\.16\\tq[0-9]+, r[0-9]+ 3
>> > >
>> > > For instance on arm-none-eabi with default configuration flags 
>> > > (mode/cpu/fpu)
>> > > and default runtestflags.
>> > > The same toolchain config also fails on this test when overriding 
>> > > runtestflags with:
>> > > -mthumb/-mfloat-abi=soft/-march=armv6s-m
>> > > -mthumb/-mfloat-abi=soft/-march=armv7-m
>> > > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
>> > >
>> > > Can you fix this please?
>> > Hi Christophe,
>> > Sorry for the breakage, I will take a look.
>> The issue is with the following function:
>>
>> float16x8_t f2 (float16x8_t __a, float16_t __b) {
>>   return __a * __b;
>> }
>>
>> With -O2 -ffast-math -mfloat-abi=softfp -march=armv8.2-a+fp16, it generates:
>> f2:
>>         ldrh    ip, [sp]        @ __fp16
>>         vmov    d18, r0, r1  @ v8hf
>>         vmov    d19, r2, r3
>>         vdup.16 q8, ip
>>         vmul.f16        q8, q8, q9
>>         vmov    r0, r1, d16  @ v8hf
>>         vmov    r2, r3, d17
>>         bx      lr
>>
>> It correctly generates vdup, but IIUC, r0-r3 are used up in loading
>> 'a' into q9 (d18 / d19), so it uses ip for loading 'b' and ends up
>> with "vdup.16 q8, ip". Since ip is not of the form r[0-9]+, the scan
>> for "vdup\\.16\\tq[0-9]+, r[0-9]+" finds only 2 matches instead of 3.
>> I tried adjusting the scan as follows to accommodate ip:
>> /* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, (r[0-9]+|ip)} 3 } } */
>> but that still FAILs, because the log shows:
>> gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, (r[0-9]+|ip) found 6 times
>>
>> Could you suggest how I should adjust the test so that the second operand
>> can be either r[0-9]+ or the ip register?
>>
>
> Sorry for the delay, I was on vacation.
>
> I don't know offhand how to adjust the test. Did you check why it matched 6
> times?
I am not sure.
{vdup\.16\tq[0-9]+, r[0-9]+} matches 2 times as expected.
However, if I surround r[0-9]+ with parens:
{vdup\.16\tq[0-9]+, (r[0-9]+)}
the number of matches is 4, which is twice the number of actual matches!

Similarly, {vdup\.16\tq[0-9]+, (r[0-9]+|ip)} gets matched 6 times,
twice the expected value of 3.
This is the debug log from runtest for the (r[0-9]+|ip) case:
https://people.linaro.org/~prathamesh.kulkarni/dbg.log
Unfortunately, I couldn't make much out of it.
Investigating further.
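
One possible explanation for the doubling, stated as an assumption and not
confirmed from the log above: if scan-assembler-times counts matches by
taking the length of the list returned by Tcl's "regexp -inline -all", then
every match contributes the full match plus one extra element per capturing
group, so a single pair of parens doubles the count. The Python sketch below
only emulates that list-flattening behaviour. The helper tcl_inline_all and
the sample assembly text are hypothetical and are not taken from the
testsuite.

```python
import re


def tcl_inline_all(pattern, text):
    """Emulate the flat list shape of Tcl's `regexp -inline -all`:
    for each match, the full match followed by every capture group."""
    flat = []
    for m in re.finditer(pattern, text):
        flat.append(m.group(0))
        flat.extend(g if g is not None else "" for g in m.groups())
    return flat


# Hypothetical assembly output: two vdup from rN registers, one from ip.
asm = "\tvdup.16\tq8, r0\n\tvdup.16\tq9, ip\n\tvdup.16\tq10, r1\n"

patterns = {
    "plain": r"vdup\.16\tq[0-9]+, r[0-9]+",
    "capturing": r"vdup\.16\tq[0-9]+, (r[0-9]+|ip)",
    "non_capturing": r"vdup\.16\tq[0-9]+, (?:r[0-9]+|ip)",
}

# Count list elements, which is what an llength-based counter would report.
counts = {name: len(tcl_inline_all(p, asm)) for name, p in patterns.items()}

for name, n in counts.items():
    print(f"{name}: {n}")
```

If that assumption holds, writing the directive with a non-capturing group,
(?:r[0-9]+|ip), instead of (r[0-9]+|ip) should bring the count back to 3.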

Thanks,
Prathamesh
>
> Christophe
>
>>
>> Thanks,
>> Prathamesh
>> >
>> > Thanks,
>> > Prathamesh
>> > >
>> > > Thanks,
>> > >
>> > > Christophe
>> > >
>> > >> >
>> > >> > Thanks,
>> > >> > Prathamesh
