fhahn added a comment.


> The front end emits an alloca of <256 x i32> for the local variable tile. 
> When the return value of __builtin_ia32_tileloadd64_internal is assigned to 
> tile, the front end bitcasts x86_amx to <256 x i32>. x86_amx is the type 
> returned by __builtin_ia32_tileloadd64_internal.

Can you share a more interesting example, where the result of the load is 
actually used by a different AMX builtin? For the store example, it seems like 
a conversion intrinsic plus a regular IR store should work.

>> With respect to the `load` issue, it is not clear to me at the moment under 
>> which circumstances regular `load` instructions are generated & interact 
>> with AMX. If `load` is used to load `x` consecutive elements, then that's 
>> fine. But if the actual intended operation is a strided load, then `load` 
>> should not be used (this has also been discussed on llvm-dev).
>
> The `load` instructions are generated because the tile is a vector type in C. 
> See https://gcc.godbolt.org/z/qv5jnjK48. With -O0, a load instruction is 
> generated. With -O2, the load instruction is eliminated. The -O2 version is 
> what we want. There is no <256 x i32> in the generated code.

I can't see any `load <256 x i32>` in the linked example, just a store. Could 
you check the example?
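
For reference, the distinction between consecutive and strided loads from the 
quoted discussion can be sketched in plain C. This is only an illustration of 
the two access patterns, not AMX code: the helper names `load_consecutive` and 
`load_strided` are hypothetical, and the byte stride parameter is an assumption 
modelled loosely on how a tile load is described in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies `count` consecutive elements. This is the
 * access pattern that a regular vector `load` models. */
static void load_consecutive(int32_t *dst, const int32_t *src, size_t count) {
    memcpy(dst, src, count * sizeof *dst);
}

/* Hypothetical helper: copies a rows x cols block whose rows start
 * `stride_bytes` apart in memory. When stride_bytes is larger than
 * cols * sizeof(int32_t), the elements read are not consecutive, so a
 * single regular vector load would read the wrong data. */
static void load_strided(int32_t *dst, const void *base, size_t rows,
                         size_t cols, size_t stride_bytes) {
    const unsigned char *p = (const unsigned char *)base;
    for (size_t r = 0; r < rows; ++r)
        memcpy(dst + r * cols, p + r * stride_bytes, cols * sizeof *dst);
}
```

The two helpers read the same elements only when `stride_bytes` equals 
`cols * sizeof(int32_t)`; for any larger stride the strided version skips the 
gap between rows.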


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99152/new/

https://reviews.llvm.org/D99152
