2024-08-13 15:53 Jakub Jelinek <ja...@redhat.com> wrote:
>
>On Tue, Aug 13, 2024 at 11:14:47AM +0800, Xiao Zeng wrote:
>> Thank you very much for the in-depth discussion between Jakub Jelinek and
>> jeff.
>> My knowledge is narrow, and I am not familiar with architectures other than
>> RISCV.
>> At the same time, my understanding of libraries such as libc and libm is
>> also shallow.
>>
>> I spent some time sorting out my thoughts, which resulted in slow email
>> replies. I am very sorry.
>
>The important thing is that the current state of BF16 support on other
>architectures is what we want there, not more. So any changes done for
>RISCV shouldn't affect the other architectures, that wasn't the case of
>the patch you've posted.
>E.g. on x86_64, for FP16 we have:
>__divhc3@@GCC_12.0.0
>__eqhf2@@GCC_12.0.0
>__extendhfdf2@@GCC_12.0.0
>__extendhfsf2@@GCC_12.0.0
>__extendhftf2@@GCC_12.0.0
>__extendhfxf2@@GCC_12.0.0
>__fixhfti@@GCC_12.0.0
>__fixunshfti@@GCC_12.0.0
>__floatbitinthf@@GCC_14.0.0
>__floattihf@@GCC_12.0.0
>__floatuntihf@@GCC_12.0.0
>__mulhc3@@GCC_12.0.0
>__nehf2@@GCC_12.0.0
>__truncdfhf2@@GCC_12.0.0
>__trunchfbf2@@GCC_13.0.0
>__truncsfhf2@@GCC_12.0.0
>__trunctfhf2@@GCC_12.0.0
>__truncxfhf2@@GCC_12.0.0
>exported from libgcc, while for BF16 just:
>__extendbfsf2@@GCC_13.0.0
>__floatbitintbf@@GCC_14.0.0
>__floattibf@@GCC_13.0.0
>__floatuntibf@@GCC_13.0.0
>__truncdfbf2@@GCC_13.0.0
>__trunchfbf2@@GCC_13.0.0
>__truncsfbf2@@GCC_13.0.0
>__trunctfbf2@@GCC_13.0.0
>__truncxfbf2@@GCC_13.0.0
>More attention has been paid to what we actually need there, which is
>primarily conversions to/from other types (but even not to all of them), with
>some changes on the RTL expression lowering side to make sure we use the
>SFmode arithmetics as much as possible and only have the really required
>stuff on the libgcc side.
>We don't want to change that, if you really need __mulbc3/__divbc3 on RISCV,
>then it should be added for that arch only.
>And similarly, the choice
>of the builtins on the compiler side, the two builtins we have right now is
>all we want on the other arches. So, further builtins would be either a
>matter of RISCV specific builtins, or in generic code but guarded by some
>target hook so that they aren't enabled on arches which don't want them.
>On the libstdc++ side, the current headers provide for std::bfloat16_t and
>std::float16_t an implementation which uses SFmode calculations where
>possible, so stuff like:
>  constexpr _Float16
>  acos(_Float16 __x)
>  { return _Float16(__builtin_acosf(__x)); }
>or
>  constexpr __gnu_cxx::__bfloat16_t
>  acos(__gnu_cxx::__bfloat16_t __x)
>  { return __gnu_cxx::__bfloat16_t(__builtin_acosf(__x)); }
>And for printing, note there is
>_ZSt20__to_chars_float16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
>_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>which input and output _Float16 and __bf16, but in the parameter passing
>they expect those types to be promoted to float, so that the ABIs aren't
>dependent on when a particular arch enables those types.
>
>For RISCV, the things to consider are, what is the _Float16 and __bf16
>function argument passing/returning ABI? Is the type enabled on all
>variants of RISCV, or just some (e.g. regarding _Float16 and __bf16
>on i686-linux, there is support for it only if the SSE2 ISA is available,
>so e.g. the *[hb][fc]* functions in libgcc need to be compiled with
>-msse2 extra flag)? If it can be passed/returned the same in all ABIs,
>what excess precision mode do you want to use on them? I mean e.g. the
>TARGET_C_EXCESS_PRECISION target hook. On e.g. x86_64, the default
>is to promote all _Float16 and __bf16 calculations to float, so if you have
>__bf16 a, b, c, d, e;
>...
>a = b * c + d - e + c * d;
>all variables are converted to SFmode temporaries and all the arithmetics
>is done in SFmode and only then at the end finally converted to HFmode
>or BFmode. One can request a different mode, -fexcess-precision=16
>in which such promotion isn't done, but as there is no hw support for
>most of the operations, the actual multiplication, addition or subtraction
>is still done in SFmode, just there is a conversion to BFmode after each
>operation (so slower, but more precise).
>If you still want to export __divbc3 and __mulbc3, do you want to export
>those just on some RISCV ABI variants or all of them? Depending on that,
>arrange for those to be compiled just for those; and, if it is exported
>from libgcc_s.so.1, you also need to add a symbol version for those, likely
>GCC_15.0.0.
>
>For enabling just those 2 functions, I don't think you need any changes on
>the builtins.def etc. side, those aren't builtins but libcalls.
>
>If you need other libgcc calls, similar questions to above apply, but please
>don't add them just because you can, but only if you really need them (they
>can't be handled in hw instructions and promotion to SFmode and conversion
>afterwards is undesirable and you actually have code that proves it emits
>those calls). Again, they should be only enabled on arches which ask for it
>(and/or sub-ABIs) and they need the symbol version stuff resolved.
>
> Jakub

Thank you, Jakub, for the detailed analysis of this issue.
It raised issues that I had not considered before, such as symbol
versions, the impact on all architectures, and RISCV architecture
variants. Your analysis has expanded my knowledge, and I will look for a
better solution to this problem in my free time.

Thank you again, Jakub.

Thanks
Xiao Zeng
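
For anyone following the excess-precision part of the discussion, the
difference between rounding once at the end and rounding after every
operation for a = b * c + d - e + c * d can be sketched as below. This is
only an illustration under stated assumptions: it emulates bfloat16 in
software on top of float so that it does not depend on compiler __bf16
support, and round_to_bf16, eval_promoted and eval_per_op are hypothetical
names, not GCC or libgcc functions.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of GCC or libgcc): round a finite float to
// the nearest bfloat16 value, ties to even, and return it widened back to
// float. NaN inputs are not handled in this sketch.
float round_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    std::uint32_t lsb = (bits >> 16) & 1u;  // lowest bit that bfloat16 keeps
    bits += 0x7FFFu + lsb;                  // round to nearest, ties to even
    bits &= 0xFFFF0000u;                    // clear the 16 discarded bits
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Promoted evaluation: all arithmetic in float, a single rounding to
// bfloat16 at the very end.
float eval_promoted(float b, float c, float d, float e) {
    return round_to_bf16(b * c + d - e + c * d);
}

// Per-operation rounding: each operation is still carried out in float,
// but its result is rounded to bfloat16 before the next operation uses it.
float eval_per_op(float b, float c, float d, float e) {
    float t = round_to_bf16(b * c);
    t = round_to_bf16(t + d);
    t = round_to_bf16(t - e);
    float u = round_to_bf16(c * d);
    return round_to_bf16(t + u);
}
```

With b = 256, c = 1, d = 1, e = 256 (all exactly representable in
bfloat16), eval_promoted returns 2 and eval_per_op returns 1: the
intermediate value 257 needs nine significant bits, so rounding it to
bfloat16 gives 256 and the added 1 is lost before the subtraction. The two
modes can therefore give different results for the same source expression.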