2024-08-13 15:53  Jakub Jelinek <ja...@redhat.com> wrote:
>
>On Tue, Aug 13, 2024 at 11:14:47AM +0800, Xiao Zeng wrote:
>> Thank you very much for the in-depth discussion between Jakub Jelinek and
>> Jeff.
>> My knowledge is narrow, and I am not familiar with architectures other than
>> RISCV.
>> At the same time, my understanding of libraries such as libc and libm is
>> also shallow.
>>
>> I spent some time sorting out my thoughts, which resulted in slow email 
>> replies. I am very sorry.
>
>The important thing is that the current state of BF16 support on other
>architectures is what we want there, not more.  So any changes done for
>RISCV shouldn't affect the other architectures, which wasn't the case with
>the patch you've posted.
>E.g. on x86_64, for FP16 we have:
>__divhc3@@GCC_12.0.0
>__eqhf2@@GCC_12.0.0
>__extendhfdf2@@GCC_12.0.0
>__extendhfsf2@@GCC_12.0.0
>__extendhftf2@@GCC_12.0.0
>__extendhfxf2@@GCC_12.0.0
>__fixhfti@@GCC_12.0.0
>__fixunshfti@@GCC_12.0.0
>__floatbitinthf@@GCC_14.0.0
>__floattihf@@GCC_12.0.0
>__floatuntihf@@GCC_12.0.0
>__mulhc3@@GCC_12.0.0
>__nehf2@@GCC_12.0.0
>__truncdfhf2@@GCC_12.0.0
>__trunchfbf2@@GCC_13.0.0
>__truncsfhf2@@GCC_12.0.0
>__trunctfhf2@@GCC_12.0.0
>__truncxfhf2@@GCC_12.0.0
>exported from libgcc, while for BF16 just:
>__extendbfsf2@@GCC_13.0.0
>__floatbitintbf@@GCC_14.0.0
>__floattibf@@GCC_13.0.0
>__floatuntibf@@GCC_13.0.0
>__truncdfbf2@@GCC_13.0.0
>__trunchfbf2@@GCC_13.0.0
>__truncsfbf2@@GCC_13.0.0
>__trunctfbf2@@GCC_13.0.0
>__truncxfbf2@@GCC_13.0.0
>More attention has been paid to what we actually need there, which is
>primarily conversions to/from other types (though not even to all of them),
>with some changes on the RTL expression lowering side to make sure we use
>SFmode arithmetic as much as possible and only have the really required
>stuff on the libgcc side.
>We don't want to change that; if you really need __mulbc3/__divbc3 on RISCV,
>then it should be added for that arch only.  And similarly for the choice
>of the builtins on the compiler side: the two builtins we have right now are
>all we want on the other arches.  So, further builtins would be either a
>matter of RISCV-specific builtins, or in generic code but guarded by some
>target hook so that they aren't enabled on arches which don't want them.
>On the libstdc++ side, the current headers provide for std::bfloat16_t and
>std::float16_t an implementation which uses SFmode calculations where
>possible, so stuff like:
>  constexpr _Float16
>  acos(_Float16 __x)
>  { return _Float16(__builtin_acosf(__x)); }
>or
>  constexpr __gnu_cxx::__bfloat16_t
>  acos(__gnu_cxx::__bfloat16_t __x)
>  { return __gnu_cxx::__bfloat16_t(__builtin_acosf(__x)); }
>And for printing, note there is
>_ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>which input and output _Float16 and __bf16, but in the parameter passing
>they expect those types to be promoted to float, so that the ABIs aren't
>dependent on when a particular arch enables those types.
>
>For RISCV, the things to consider are, what is the _Float16 and __bf16
>function argument passing/returning ABI?  Is the type enabled on all
>variants of RISCV, or just some (e.g. regarding _Float16 and __bf16
>on i686-linux, there is support for it only if the SSE2 ISA is available,
>so e.g. the *[hb][fc]* functions in libgcc need to be compiled with
>-msse2 extra flag)?  If it can be passed/returned the same in all ABIs,
>what excess precision mode do you want to use on them?  I mean e.g. the
>TARGET_C_EXCESS_PRECISION target hook.  On e.g. x86_64, the default
>is to promote all _Float16 and __bf16 calculations to float, so if you have
>__bf16 a, b, c, d, e;
>...
>a = b * c + d - e + c * d;
>all variables are converted to SFmode temporaries, all the arithmetic
>is done in SFmode, and only at the end is the result converted to HFmode
>or BFmode.  One can request a different mode, -fexcess-precision=16,
>in which such promotion isn't done, but as there is no hw support for
>most of the operations, the actual multiplication, addition or subtraction
>is still done in SFmode, just with a conversion to BFmode after each
>operation (so slower, but more precise).
>If you still want to export __divbc3 and __mulbc3, do you want to export
>those just on some RISCV ABI variants or all of them?  Depending on that,
>arrange for those to be compiled just for those; and, if it is exported
>from libgcc_s.so.1, you also need to add a symbol version for those, likely
>GCC_15.0.0.
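
To make the difference between the two excess precision modes described
above concrete, here is a minimal, portable C sketch.  It does not use the
real __bf16 type or any compiler flag: it emulates BFmode rounding in
software on top of float.  The names round_to_bf16, eval_promoted and
eval_per_op are hypothetical helpers made up for this illustration, not
anything provided by GCC or libgcc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a float to the nearest bfloat16 value
   (round to nearest, ties to even) and return it widened back to float.
   NaN inputs are not handled by this sketch. */
float round_to_bf16(float x)
{
    uint32_t bits;

    assert(x == x);                     /* reject NaN, see comment above */
    memcpy(&bits, &x, sizeof bits);
    uint32_t lsb = (bits >> 16) & 1u;   /* lowest bit that survives */
    bits += 0x7FFFu + lsb;              /* round to nearest, ties to even */
    bits &= 0xFFFF0000u;                /* drop the low 16 bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Promotion style: the whole expression b * c + d - e + c * d is
   evaluated in float and rounded to bfloat16 once, at the end. */
float eval_promoted(float b, float c, float d, float e)
{
    return round_to_bf16(b * c + d - e + c * d);
}

/* Per-operation style: each operation is still computed in float,
   but its result is rounded to bfloat16 before the next one. */
float eval_per_op(float b, float c, float d, float e)
{
    float t1 = round_to_bf16(b * c);
    float t2 = round_to_bf16(t1 + d);
    float t3 = round_to_bf16(t2 - e);
    float t4 = round_to_bf16(c * d);
    return round_to_bf16(t3 + t4);
}
```

With b = 1, c = 1, d = 0.00390625 (2^-8) and e = 0, eval_promoted gives
1.0078125 while eval_per_op gives 1.0, because 1 + 2^-8 is a tie that
rounds back to 1.0 when the rounding is applied after each step.  So the
two modes can produce different results for the same source expression.
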
>
>For enabling just those 2 functions, I don't think you need any changes on
>the builtins.def etc. side, those aren't builtins but libcalls.
>
>If you need other libgcc calls, similar questions to above apply, but please
>don't add them just because you can, but only if you really need them (they
>can't be handled in hw instructions and promotion to SFmode and conversion
>afterwards is undesirable and you actually have code that proves it emits
>those calls).  Again, they should only be enabled on arches (and/or
>sub-ABIs) which ask for them, and they need the symbol version stuff resolved.
>
>       Jakub 
Thank you, Jakub, for the detailed analysis of this issue.

It raised issues that I had not considered before, such as symbol versions,
the impact on all architectures, RISCV architecture variants, and so on.

Your analysis has expanded my knowledge, and I will look for a better
solution to this problem in my free time.

Thank you again, Jakub.

Thanks
Xiao Zeng
