andrew.w.kaylor added a comment.

In D120395#3358533 <https://reviews.llvm.org/D120395#3358533>, @craig.topper 
wrote:

> __m256bh should not have been a new type. It should have been an alias of 
> __m256i. We don't have load/store intrinsics for __m256bh so if you can even 
> get the __m256bh type in and out of memory using load/store intrinsics, it is 
> only because we allow lax vector conversion by default. 
> -fno-lax-vector-conversions will probably break any code trying to load/store 
> it using a load/store intrinsic. If __m256bh was made a struct as at one 
> point proposed, this would have been broken.
>
> If we want __m256bh to be a unique type using __bf16, we must define load, 
> store, and cast intrinsics for it. We would probably want insert/extract 
> element intrinsics as well.

From this, it sounds like our intrinsics support is incomplete for this type. 
Even if it is defined as an alias of some existing type (such as __m256i), I 
would have to do the mental gymnastics of mixing and matching intrinsics to get 
the behavior I want. I think this gets back to the question @scanon asked about 
the semantics of this type. What should I be able to do with it? Can I load a 
vector of these values from memory? Can I store them to memory? Can I assign 
one vector to another? Can I pass it as an argument? Can I use it as a return 
type? It looks like the only things we have intrinsics for (for the vector 
types) are converting to and from single-precision vectors and performing the 
dot product accumulate operation.

I was wondering about similar issues when I was looking at what we had for the 
__bfloat16 type. We have intrinsics to convert this type to and from single 
precision floating point values, but I can't do anything else with it. Nothing 
else at all, including inserting it into a vector of bf16 values.

So @pengfei is trying to press ahead with the backend implementation, but our 
front end support is incomplete. That might explain why Phoebe and I haven't 
been able to agree on what should be done here.

This patch is strictly a front end patch, but it's trying to just wedge 
definitions into header files to get the desired outcome in the code 
generation. From the user's perspective, it feels totally broken.

Consider this function.

  __m128 f(__bfloat16 *p1, __bfloat16 *p2) {
    // Load the vectors using the integer load intrinsics??
    __m128i temp1 = _mm_loadu_epi32(p1);
    __m128i temp2 = _mm_loadu_epi32(p2);
  
    // Zero-initialize the base value vector
    __m128 base = _mm_set_ps1(0.0f);
  
    // Perform the dot product
    return _mm_dpbf16_ps(base, temp1, temp2);
  }

Is this what you'd expect with the current definitions? It looks like it 
produces the instructions I expected, but with -fno-lax-vector-conversions I 
get an error unless I explicitly bitcast the arguments from `__m128i` to 
`__m128bh`.

I think that just brings me up to speed with what Craig was saying, right?

So at this point we have these options:

1. Make the `__m[128|256|512]bh` types aliases of `__m[128|256|512]i`
2. Deprecate the `__m[128|256|512]bh` types and replace them with 
`__m[128|256|512]i`
3. Add load/store/insert/extract intrinsics for the `__bfloat16` type

Of these, I'd prefer the third option because both of the first two require an 
overloaded use of the integer vector type. I already don't like that we use 
the same type for any size integer vector. Using it for BF16 vectors just seems 
wrong.

For the example above, I'd like to write code like this:

  __m128 f(__bfloat16 *p1, __bfloat16 *p2) {
    // Load the BF16 vectors
    __m128bh v1 = _mm_load_pbh(p1);
    __m128bh v2 = _mm_load_pbh(p2);
  
    // Zero-initialize the base value vector
    __m128 base = _mm_set_ps1(0.0f);
  
    // Perform the dot product
    return _mm_dpbf16_ps(base, v1, v2);
  }

That's more work, but it has the merit of allowing me to use types that match 
what the program is doing.

The fact that you can't pass a `__m128i` value to a function that is expecting 
`__m128bh` is a good thing. We shouldn't be making changes that prevent this 
diagnostic.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395
