On 18/11/2024 12:00, Christophe Lyon wrote:
> Hi Torbjörn,
> 
> 
> On 11/18/24 10:37, Torbjorn SVENSSON wrote:
>>
>>
>> On 2024-11-08 20:37, Torbjorn SVENSSON wrote:
>>>
>>>
>>> On 2024-11-08 12:24, Richard Earnshaw (lists) wrote:
>>>> On 05/11/2024 20:06, Torbjörn SVENSSON wrote:
>>>>> Based on how these functions are used in test cases, I think it's correct
>>>>> to require 16-bit float support in both functions.
>>>>>
>>>>> Without this change, the checks passes for armv8-m and armv8.1-m, but the
>>>>> test cases that uses them fails due to the incorrect -mfpu option.
>>>>>
>>>>> Ok for trunk and releases/gcc-14?
>>>>
>>>> Can you expand on the issue you're trying to address with this change?
>>>
>>> If dejagnu is started with a specified FPU, the function 
>>> arm_v8_2a_fp16_scalar_ok will check if __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 
>>> is defined, but it will not ensure that the FPU supports 16-bit floats.
>>> The result is that with the given FPU, GCC might report that 
>>> __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is supported, but 16-bit floats are 
>>> not.
>>>
>>> With -march and -mfpu:
>>> .../bin/arm-none-eabi-gcc -E -dM - -mthumb -march=armv8-m.main+fp - 
>>> mfloat-abi=hard -mfpu=fpv5-sp-d16 -fdiagnostics-plain-output -O2 - 
>>> mcpu=unset -march=armv8.2-a+fp16  </dev/null | grep -e '__ARM_FP ' -e 
>>> __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
>>> #define __ARM_FP 4
>>> #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
>>>
>>>
>>> Same as above, but with -mfpu=auto appended:
>>> .../bin/arm-none-eabi-gcc -E -dM - -mthumb -march=armv8-m.main+fp - 
>>> mfloat-abi=hard -mfpu=fpv5-sp-d16 -fdiagnostics-plain-output -O2 - 
>>> mcpu=unset -march=armv8.2-a+fp16 -mfpu=auto  </dev/null | grep -e '__ARM_FP 
>>> ' -e __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
>>> #define __ARM_FP 14
>>> #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
>>>
>>>
>>> So, adding the __ARM_FP validation ensures that the empty set of flags is 
>>> never accepted for this scenario.
>>>
>>>
>>> For check_effective_target_arm_v8_2a_fp16_neon_ok_nocache, it's the same 
>>> thing but here we also assume that neon is available without checking it.
>>>
>>>
>>> Looking though other failing tests, I also notices that
>>> check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache is essential 
>>> a copy of check_effective_target_arm_v8_2a_fp16_neon_ok_nocache, but with a 
>>> different architecture and define, so I'll add a fix for that too.
>>>
>>>
>>> With all this said, I see that there is an error in this patch, so a v2 
>>> will be sent as soon as my current test run completes and there is no 
>>> regression.
>>
>>
>>
>> I've tried to dig a bit deeper into this topic.
>>
>> In gcc.target/arm/armv8_2-fp16-scalar-1.c, we do:
>>
>> /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
>>
>> this currently checks if __ARM_FEATURE_FP16_SCALAR_ARITHMETIC is defined.
>>
>>
>> The symbol __ARM_FEATURE_FP16_SCALAR_ARITHMETIC is defined in arm-c.cc:
>>
>>    def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC",
>>                TARGET_VFP_FP16INST);
>>
>>
>> The symbol TARGET_VFP_FP16INST is defined in arm.h:
>>
>> /* FPU supports the floating point FP16 instructions for ARMv8.2-A
>>     and later.  */
>> #define TARGET_VFP_FP16INST \
>>    (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP5 && arm_fp16_inst)
>>
>>
>> And arm_fp16_inst is defined in arm.cc:
>>
>> /* Nonzero if this chip supports the FP16 instructions extension of ARM
>>     Architecture 8.2.  */
>> int arm_fp16_inst = 0;
>>
>> and a bit further down:
>>
>>    arm_fp16_inst = bitmap_bit_p (arm_active_target.isa, isa_bit_fp16);
>>
>>
>> All this tends to that __ARM_FEATURE_FP16_SCALAR_ARITHMETIC  should be an 
>> armv8.2+ feature, but isa_bit_fp16 is also set for Cortex-M52, Cortex-M55 
>> and Cortex-M85 among the Cortex-M cpus defined in arm-cpu-cdata.h (generated 
>> from arm-cpus.in).
>>
>> Now to the reason for the failure.
>> In the test case, it includes arm_fp16.h, but arm_fp16.h contains a hole 
>> bunch of functions called __builtin_neon_*, so is this header file really 
>> applicable for Cortex-M?
>> Or should arm_fp16.h be updated to also contain an alternative list of 
>> functions applicable for Cortex-M?
>>
>>
>> The failure I see in my tests are:
>>
>> Executing on host: .../bin/arm-none-eabi-gcc 
>> .../gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c  -mthumb 
>> -march=armv7e-m+fp.dp -mcpu=cortex-m7 -mfloat-abi=hard -mfpu=fpv5-d16 
>> -fdiagnostics-plain-output  -O2 -mcpu=unset -march=armv8.2-a+fp16 
>> -ffat-lto-objects -fno-ident -S     -o armv8_2-fp16-scalar-1.s (timeout = 
>> 800)
>> spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc 
>> .../gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c -mthumb 
>> -march=armv7e-m+fp.dp -mcpu=cortex-m7 -mfloat-abi=hard -mfpu=fpv5-d16 
>> -fdiagnostics-plain-output -O2 -mcpu=unset -march=armv8.2-a+fp16 
>> -ffat-lto-objects -fno-ident -S -o armv8_2-fp16-scalar-1.s
>> pid is 17266 -17266
>> In file included from 
>> .../gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c:7:
>> .../lib/gcc/arm-none-eabi/15.0.0/include/arm_fp16.h: In function 
>> 'test_vcvth_f16_s32':
>> .../lib/gcc/arm-none-eabi/15.0.0/include/arm_fp16.h:69:1: error: inlining 
>> failed in call to 'always_inline' 'vcvth_f16_s32': target specific option 
>> mismatch
>> .../gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c:35:10: note: called 
>> from here
>>
>>
>> Any ideas on how to get around this error?
>> Adding the __ARM_FP checks in arm_v8_2a_fp16_scalar_ok was not enough. It 
>> removes the test from Cortex-M33 (-mfpu=fpv5-sp-d16), but it still fails on 
>> Cortex-M7, Cortex-M55 and Cortex-M85 (all of them are tested with 
>> -mfpu=fpv5-d16).
>>
>> I suppose the main question is if arm_fp16.h should be a neon only include 
>> file.
>>

arm_fp16.h is defined by ACLE.  It says: 

<quote>

<arm_fp16.h> is provided to define the scalar 16-bit floating point arithmetic 
intrinsics. As these intrinsics are in the user namespace, an implementation 
would not normally define them until the header is included. When 
__ARM_FEATURE_FP16_SCALAR_ARITHMETIC is defined to 1, the header file is 
available regardless of the context in which the macro is evaluated.

</quote>

So no mention of this header being neon-specific.  I don't see any reason why 
this is a bug in ACLE either.

> 
> Thanks for digging into this...
> 
> I looked at it, and in the testcase above arm_can_inline_p complains because 
> the callee's FPU is fp-armv8, when the caller's is fpv5-d16.
> 
> The builtin_neon_* in vfp.md are enabled by TARGET_VFP_FP16INST, so
> that's consistent with your other observations.
> 
> So indeed I'm not sure how to make arm_fp16.h accepted by non-NEON targets.
> 

Hmm, I think this is a real bug in the compiler.

I suspect the way to address this is similar to the way Christophe has been 
reworking the MVE intrinsics.  That is, to replace the contents of this header 
with a pragama such that we can decide when the intrinsic is invoked whether we 
want a Neon or an MVE compatible variant.

This isn't gcc-15 material at this point though, so perhaps we should create a 
ticket in bugzilla for this.


R.

> Thanks,
> 
> Christophe
> 
>> Kind regards,
>> Torbjörn
>>
>>
>>
>>
>>>
>>>
>>>>
>>>> R.
>>>>
>>>>>
>>>>> -- 
>>>>>
>>>>> In both functions, it's assumed that 16-bit float support is available,
>>>>> but it's not checked.
>>>>> In addition, check_effective_target_arm_v8_2a_fp16_neon_ok also assumes
>>>>> that neon is used, but it's not checked.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>     * lib/target-supports.exp
>>>>>     (check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): Check
>>>>>     that 16-bit float is supported.
>>>>>     (check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): Check
>>>>>     that neon is used and that 16-bit float is supported.
>>>>>
>>>>> Signed-off-by: Torbjörn SVENSSON <torbjorn.svens...@foss.st.com>
>>>>> ---
>>>>>   gcc/testsuite/lib/target-supports.exp | 15 +++++++++++++++
>>>>>   1 file changed, 15 insertions(+)
>>>>>
>>>>> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/ 
>>>>> lib/target-supports.exp
>>>>> index 75703ddca60..19a9981d9cd 100644
>>>>> --- a/gcc/testsuite/lib/target-supports.exp
>>>>> +++ b/gcc/testsuite/lib/target-supports.exp
>>>>> @@ -6360,6 +6360,12 @@ proc 
>>>>> check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache { } {
>>>>>                  "-mfpu=fp-armv8 -mfloat-abi=softfp"} {
>>>>>       if { [check_no_compiler_messages_nocache \
>>>>>             arm_v8_2a_fp16_scalar_ok object {
>>>>> +        #if !defined (__ARM_FP)
>>>>> +        #error "__ARM_FP not defined"
>>>>> +        #endif
>>>>> +        #if ((__ARM_FP & 1) == 0)
>>>>> +        #error "__ARM_FP indicates that 16-bit is not supported"
>>>>> +        #endif
>>>>>           #if !defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
>>>>>           #error "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC not defined"
>>>>>           #endif
>>>>> @@ -6395,6 +6401,15 @@ proc 
>>>>> check_effective_target_arm_v8_2a_fp16_neon_ok_nocache { } {
>>>>>                  "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
>>>>>       if { [check_no_compiler_messages_nocache \
>>>>>             arm_v8_2a_fp16_neon_ok object {
>>>>> +        #if !defined (__ARM_FP)
>>>>> +        #error "__ARM_FP not defined"
>>>>> +        #endif
>>>>> +        #if ((__ARM_FP & 1) == 0)
>>>>> +        #error "__ARM_FP indicates that 16-bit is not supported"
>>>>> +        #endif
>>>>> +        #if !defined (__ARM_NEON__)
>>>>> +        #error "__ARM_NEON__ not defined"
>>>>> +        #endif
>>>>>           #if !defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
>>>>>           #error "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC not defined"
>>>>>           #endif
>>>>
>>>
>>

Reply via email to