craig.topper added a comment.

In D120395#3358533 <https://reviews.llvm.org/D120395#3358533>, @craig.topper 
wrote:

> In D120395#3358453 <https://reviews.llvm.org/D120395#3358453>, 
> @andrew.w.kaylor wrote:
>
>> In D120395#3356355 <https://reviews.llvm.org/D120395#3356355>, @pengfei 
>> wrote:
>>
>>> Good question! This is really an ABI question. Unfortunately, we don't have 
>>> a BF16 ABI at present. We can't assume which physical registers arguments 
>>> will be passed and returned in before such hardware exists. For example, ARM 
>>> has a soft-FP ABI that supports FP arithmetic operations but passes and 
>>> returns arguments in integer registers. When we enabled ISA extensions whose 
>>> types have no ABI representation, e.g. F16C, we borrowed that concept. As a 
>>> trade-off, we used an integer type rather than introducing a new IR type, 
>>> since we don't need to support arithmetic operations.
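>>>
>>> For a rough illustration of the F16C precedent (signatures as listed in the 
>>> Intel intrinsics guide; in the headers `_cvtss_sh` is typically a macro 
>>> because the rounding immediate must be a constant):
>>>
>>>   // F16C: half-precision values are passed around as plain unsigned short,
>>>   // not as a dedicated floating-point type.
>>>   unsigned short _cvtss_sh(float a, int imm8); // float -> binary16 bit pattern
>>>   float _cvtsh_ss(unsigned short a);           // binary16 bit pattern -> float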
>>
>> I don't see the point of the ARM soft-float comparison, given that X86 
>> doesn't have the strict distinction between integer and floating point 
>> registers that ARM has, at least not for the XMM/YMM/ZMM registers. Consider 
>> the following code:
>>
>>   __m128bh foo(__m128 x) {
>>     return _mm_cvtneps_pbh(x);
>>   }
>>   __m128 bar(__m128bh x) {
>>     return _mm_cvtpbh_ps(x);
>>   }
>>
>> Currently, both clang and gcc will use XMM0 for the argument and return 
>> value in both functions. Is XMM0 an integer register or a floating point 
>> register? There is no such distinction. It's true that the x86_64 psABI does 
>> talk about the general purpose registers as integer registers, and both 
>> clang and gcc will use one of these registers for `__bfloat16` values, but 
>> that's an implementation detail (and a dubious one, considering that nearly 
>> anything useful that you can do with a `__bfloat16` will require moving it 
>> into an SSE register).
>>
>> Also, you say we can't assume which registers will be used (in the eventual 
>> ABI?), but we are assuming exactly that. If the ABI is ever defined 
>> differently from what clang and gcc are currently doing, they will both be 
>> wrong.
>>
>> But all of this only applies to the backend code generation. It has very 
>> little to do with the intrinsic definition in the header file or the IR 
>> generated by the front end. If we continue to define `__bfloat16` as an 
>> `unsigned short` in the header file, the front end will treat it as an 
>> `unsigned short` and it will use its rules for `unsigned short` to generate 
>> IR. If the ABI is ever defined to treat BF16 differently than `unsigned 
>> short`, the front end won't be able to do anything about that because we've 
>> told the front end that the value is an unsigned short.
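>>
>> Concretely (a rough sketch; the exact header contents may differ slightly), 
>> the header today says something like:
>>
>>   typedef unsigned short __bfloat16;  // BF16 scalar spelled as an integer
>>
>> so a function that takes or returns a BF16 scalar ends up with `i16` in its IR 
>> signature, and nothing downstream can tell it was ever a floating-point value.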
>>
>> On the other hand, if we define the `__bfloat16` type as the built-in 
>> `__bf16` type, then the front end can apply whatever rules it has for that 
>> type, including adding whatever ABI handling is needed for BF16 values. If 
>> that ends up being the same as the rules for `unsigned short`, that's no 
>> problem. The front end can implement it that way. If it ends up being 
>> something different, the front end can apply rules for whatever the 
>> alternative is. The point is, by telling the front end that this is a BF16 
>> value, we allow the front end to control the semantics for it. This would 
>> then result in the front end generating IR using `bfloat` as the type for 
>> BF16 values. Again, this is a correct and accurate description of the value. 
>> It allows the optimizer to reason about it correctly in any way it needs to.
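>>
>> For instance (illustrative IR only, not taken from an actual build), the 
>> contrast at the IR level would be roughly:
>>
>>   ; __bfloat16 defined as unsigned short
>>   define i16 @cvt(float %x)
>>   ; __bfloat16 defined as the builtin __bf16 type
>>   define bfloat @cvt(float %x)
>>
>> With the second form, both the optimizer and the backend know the value is a 
>> BF16 float and can apply whatever calling-convention or optimization rules 
>> exist for it.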
>>
>> I don't see why we would treat BF16 values as `unsigned short` and `i16` 
>> throughout the compiler just to make the backend implementation easier when 
>> we already have types available for BF16.
>
> __m256bh should not have been a new type. It should have been an alias of 
> __m256i. We don't have load/store intrinsics for __m256bh, so if you can get 
> __m256bh values in and out of memory with load/store intrinsics at all, it is 
> only because we allow lax vector conversions by default. 
> -fno-lax-vector-conversions will probably break any code trying to load/store 
> it through an __m256i load/store intrinsic. If __m256bh had been made a 
> struct, as was proposed at one point, this would have been broken.
>
> If we want __m256bh to be a unique type using __bf16, we must define load, 
> store, and cast intrinsics for it. We would probably want insert/extract 
> element intrinsics as well.
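>
> As a sketch of the sort of additions that would be needed (names and 
> signatures here are hypothetical, not existing intrinsics; target attributes 
> omitted for brevity):
>
>   // Hypothetical: unaligned load for a __bf16-based __m256bh, mirroring how
>   // _mm256_loadu_si256 is written in the headers.
>   static __inline__ __m256bh
>   _mm256_loadu_pbh(void const *__p) {
>     struct __loadu_pbh {
>       __m256bh __v;
>     } __attribute__((__packed__, __may_alias__));
>     return ((const struct __loadu_pbh *)__p)->__v;
>   }
>
>   // Hypothetical: reinterpret between __m256bh and __m256i. An explicit cast
>   // between same-size vector types is still allowed under
>   // -fno-lax-vector-conversions; only the implicit conversion is rejected.
>   static __inline__ __m256i
>   _mm256_castpbh_si256(__m256bh __a) {
>     return (__m256i)__a;
>   }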

-fno-lax-vector-conversions does indeed break the load/store intrinsics: 
https://godbolt.org/z/b5WPj84Pa
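
For reference, the failing pattern is essentially the following (a minimal 
sketch, not the exact source from the godbolt link). With `-mavx512bf16 
-mavx512vl -fno-lax-vector-conversions` the store is rejected because it needs 
an implicit `__m256bh` -> `__m256i` conversion:

  #include <immintrin.h>

  void store_bf16(void *p, __m256 x, __m256 y) {
    __m256bh v = _mm256_cvtne2ps_pbh(x, y);  // produce a __m256bh value
    _mm256_storeu_si256((__m256i *)p, v);    // relies on a lax vector conversion
  }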


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395
