andrew.w.kaylor added a comment.

In D120395#3356355 <https://reviews.llvm.org/D120395#3356355>, @pengfei wrote:

> Good question! This is actually the scope of ABI. Unfortunately, we don't 
> have the BF16 ABI at the present. We can't assume what are the physical 
> registers the arguments been passed and returned before we have such a 
> hardware. For example, ARM has soft FP ABI that supports FP arithmetic 
> operations and passes and returns arguments by integer registers. When we 
> enabling some ISA set whose type doesn't have ABI representation, e.g., F16C, 
> we borrowed such conception. And as a trade off, we used integer rather than 
> introducing a new IR type, since we don't need to support the arithmetic 
> operations.

I don't see the point of the ARM soft-float comparison, given that X86 doesn't 
have the strict distinction between integer and floating point registers that 
ARM has, at least not for the XMM/YMM/ZMM registers. Consider the following 
code:

  __m128bh foo(__m128 x) {
    return _mm_cvtneps_pbh(x);
  }
  __m128 bar(__m128bh x) {
    return _mm_cvtpbh_ps(x);
  }

Currently, both clang and gcc will use XMM0 for the argument and return value 
in both functions. Is XMM0 an integer register or a floating point register? 
There is no such distinction. It's true that the x86_64 psABI does talk about 
the general purpose registers as integer registers, and both clang and gcc will 
use one of these registers for `__bfloat16` values, but that's an 
implementation detail (and a dubious one, considering that nearly anything 
useful that you can do with a `__bfloat16` will require moving it into an SSE 
register).

Also, you say we can't assume what registers will be used (in the eventual 
ABI?) but we are assuming exactly that. If the ABI is ever defined differently 
than what clang and gcc are currently doing, they will both be wrong.

But all of this only applies to the backend code generation. It has very little 
to do with the intrinsic definition in the header file or the IR generated by 
the front end. If we continue to define `__bfloat16` as an `unsigned short` in 
the header file, the front end will treat it as an `unsigned short` and it will 
use its rules for `unsigned short` to generate IR. If the ABI is ever defined 
to treat BF16 differently than `unsigned short`, the front end won't be able to 
do anything about that because we've told the front end that the value is an 
unsigned short.

On the other hand, if we define the `__bfloat16` type as the built-in `__bf16` 
type, then the front end can apply whatever rules it has for that type, 
including adding whatever ABI handling is needed for BF16 values. If that ends 
up being the same as the rules for `unsigned short`, that's no problem. The 
front end can implement it that way. If it ends up being something different, 
the front end can apply rules for whatever the alternative is. The point is, by 
telling the front end that this is a BF16 value, we allow the front end to 
control the semantics for it. This would then result in the front end 
generating IR using `bfloat` as the type for BF16 values. Again, this is a 
correct and accurate description of the value. It allows the optimizer to 
reason about it correctly in any way it needs to.

I don't see why we would treat BF16 values as `unsigned short` and `i16` 
throughout the compiler just to make the backend implementation easier when we 
already have types available for BF16.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to