craig.topper added a comment.

In D120395#3358453 <https://reviews.llvm.org/D120395#3358453>, @andrew.w.kaylor 
wrote:

> In D120395#3356355 <https://reviews.llvm.org/D120395#3356355>, @pengfei wrote:
>
>> Good question! This is actually the scope of ABI. Unfortunately, we don't 
>> have the BF16 ABI at the present. We can't assume what are the physical 
>> registers the arguments been passed and returned before we have such a 
>> hardware. For example, ARM has soft FP ABI that supports FP arithmetic 
>> operations and passes and returns arguments by integer registers. When we 
>> enabling some ISA set whose type doesn't have ABI representation, e.g., 
>> F16C, we borrowed such conception. And as a trade off, we used integer 
>> rather than introducing a new IR type, since we don't need to support the 
>> arithmetic operations.
>
> I don't see the point of the ARM soft-float comparison, given that X86 
> doesn't have the strict distinction between integer and floating point 
> registers that ARM has, at least not for the XMM/YMM/ZMM registers. Consider 
> the following code:
>
>   __m128bh foo(__m128 x) {
>     return _mm_cvtneps_pbh(x);
>   }
>   __m128 bar(__m128bh x) {
>     return _mm_cvtpbh_ps(x);
>   }
>
> Currently, both clang and gcc will use XMM0 for the argument and return value 
> in both functions. Is XMM0 an integer register or a floating point register? 
> There is no such distinction. It's true that the x86_64 psABI does talk about 
> the general purpose registers as integer registers, and both clang and gcc 
> will use one of these registers for `__bfloat16` values, but that's an 
> implementation detail (and a dubious one, considering that nearly anything 
> useful that you can do with a `__bfloat16` will require moving it into an SSE 
> register).
>
> Also, you say we can't assume what registers will be used (in the eventual 
> ABI?) but we are assuming exactly that. If the ABI is ever defined 
> differently than what clang and gcc are currently doing, they will both be 
> wrong.
>
> But all of this only applies to the backend code generation. It has very 
> little to do with the intrinsic definition in the header file or the IR 
> generated by the front end. If we continue to define `__bfloat16` as an 
> `unsigned short` in the header file, the front end will treat it as an 
> `unsigned short` and it will use its rules for `unsigned short` to generate 
> IR. If the ABI is ever defined to treat BF16 differently than `unsigned 
> short`, the front end won't be able to do anything about that because we've 
> told the front end that the value is an unsigned short.
>
> On the other hand, if we define the `__bfloat16` type as the built-in 
> `__bf16` type, then the front end can apply whatever rules it has for that 
> type, including adding whatever ABI handling is needed for BF16 values. If 
> that ends up being the same as the rules for `unsigned short`, that's no 
> problem. The front end can implement it that way. If it ends up being 
> something different, the front end can apply rules for whatever the 
> alternative is. The point is, by telling the front end that this is a BF16 
> value, we allow the front end to control the semantics for it. This would 
> then result in the front end generating IR using `bfloat` as the type for 
> BF16 values. Again, this is a correct and accurate description of the value. 
> It allows the optimizer to reason about it correctly in any way it needs to.
>
> I don't see why we would treat BF16 values as `unsigned short` and `i16` 
> throughout the compiler just to make the backend implementation easier when 
> we already have types available for BF16.

__m256bh should not have been a new type. It should have been an alias of 
__m256i. We don't have load/store intrinsics for __m256bh so if you can even 
get the __m256bh type in and out of memory using load/store intrinsics, it is 
only because we allow lax vector conversion by default. 
-fno-lax-vector-conversions will probably break any code trying to load/store 
it using a load/store intrinsic. If __m256bh was made a struct as at one point 
proposed, this would have been broken.

If we want __m256bh to be a unique type using __bf16, we must define load, 
store, and cast intrinsics for it. We would probably want insert/extract 
element intrinsics as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to