craig.topper added a comment.

In D120395#3358533 <https://reviews.llvm.org/D120395#3358533>, @craig.topper wrote:
> In D120395#3358453 <https://reviews.llvm.org/D120395#3358453>, @andrew.w.kaylor wrote:
>
>> In D120395#3356355 <https://reviews.llvm.org/D120395#3356355>, @pengfei wrote:
>>
>>> Good question! This is actually a question of ABI. Unfortunately, we don't
>>> have a BF16 ABI at present. We can't assume which physical registers
>>> arguments will be passed and returned in before we have such hardware. For
>>> example, ARM has a soft-float ABI that supports FP arithmetic operations
>>> but passes and returns arguments in integer registers. When we enabled ISA
>>> extensions whose types have no ABI representation, e.g. F16C, we borrowed
>>> that concept. As a trade-off, we used integer types rather than introducing
>>> a new IR type, since we don't need to support arithmetic operations.
>>
>> I don't see the point of the ARM soft-float comparison, given that X86
>> doesn't have the strict distinction between integer and floating point
>> registers that ARM has, at least not for the XMM/YMM/ZMM registers.
>> Consider the following code:
>>
>>   __m128bh foo(__m128 x) {
>>     return _mm_cvtneps_pbh(x);
>>   }
>>   __m128 bar(__m128bh x) {
>>     return _mm_cvtpbh_ps(x);
>>   }
>>
>> Currently, both clang and gcc will use XMM0 for the argument and return
>> value in both functions. Is XMM0 an integer register or a floating point
>> register? There is no such distinction. It's true that the x86_64 psABI
>> does talk about the general purpose registers as integer registers, and
>> both clang and gcc will use one of these registers for `__bfloat16` values,
>> but that's an implementation detail (and a dubious one, considering that
>> nearly anything useful you can do with a `__bfloat16` requires moving it
>> into an SSE register).
>>
>> Also, you say we can't assume which registers will be used (in the eventual
>> ABI?), but we are assuming exactly that.
>> If the ABI is ever defined
>> differently than what clang and gcc are currently doing, they will both be
>> wrong.
>>
>> But all of this only applies to backend code generation. It has very little
>> to do with the intrinsic definition in the header file or the IR generated
>> by the front end. If we continue to define `__bfloat16` as an `unsigned
>> short` in the header file, the front end will treat it as an `unsigned
>> short` and will use its rules for `unsigned short` to generate IR. If the
>> ABI is ever defined to treat BF16 differently than `unsigned short`, the
>> front end won't be able to do anything about that, because we've told it
>> the value is an `unsigned short`.
>>
>> On the other hand, if we define the `__bfloat16` type as the built-in
>> `__bf16` type, then the front end can apply whatever rules it has for that
>> type, including whatever ABI handling is needed for BF16 values. If that
>> ends up being the same as the rules for `unsigned short`, that's no
>> problem; the front end can implement it that way. If it ends up being
>> something different, the front end can apply the rules for whatever the
>> alternative is. The point is, by telling the front end that this is a BF16
>> value, we allow the front end to control the semantics for it. The front
>> end would then generate IR using `bfloat` as the type for BF16 values.
>> Again, this is a correct and accurate description of the value, and it
>> allows the optimizer to reason about it correctly in any way it needs to.
>>
>> I don't see why we would treat BF16 values as `unsigned short` and `i16`
>> throughout the compiler just to make the backend implementation easier when
>> we already have types available for BF16.
>
> __m256bh should not have been a new type. It should have been an alias of
> __m256i.
> We don't have load/store intrinsics for __m256bh, so if you can even get the
> __m256bh type in and out of memory using load/store intrinsics, it is only
> because we allow lax vector conversions by default.
> -fno-lax-vector-conversions will probably break any code trying to
> load/store it using a load/store intrinsic. If __m256bh had been made a
> struct, as was at one point proposed, this would have been broken.
>
> If we want __m256bh to be a unique type using __bf16, we must define load,
> store, and cast intrinsics for it. We would probably want insert/extract
> element intrinsics as well.

-fno-lax-vector-conversions does indeed break the load/store intrinsics:
https://godbolt.org/z/b5WPj84Pa

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits