fhahn added a comment.
> The front end allocas <256 x i32> for the local variable tile. When the
> return value of __builtin_ia32_tileloadd64_internal is assigned to tile,
> the front end bitcasts x86_amx to <256 x i32>. x86_amx is the type
> returned from __builtin_ia32_tileloadd64_internal.

Can you share a more interesting example, where the result of the load is actually used by a different AMX builtin? For the store example, it seems like a conversion intrinsic plus a regular IR store should work.

>> With respect to the `load` issue, it is not clear to me at the moment under
>> which circumstances regular `load` instructions are generated & interact
>> with AMX. If `load` is used to load `x` consecutive elements, then that's
>> fine. But if the actual intended operation is a strided load, then `load`
>> should not be used (this has also been discussed on llvm-dev).
>
> The `load` instructions are generated because it is a vector in the C
> language. See https://gcc.godbolt.org/z/qv5jnjK48. If we use -O0, a load
> instruction is generated. If we use -O2, the load instruction is
> eliminated. The -O2 version is what we want. There is no <256 x i32> in
> the generated code.

I can't see any `load <256 x i32>` in the linked example, just a store. Could you check the example?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99152/new/

https://reviews.llvm.org/D99152
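
To make the consecutive-versus-strided distinction from the comment above concrete, here is a minimal sketch in plain C. It does not use the AMX builtins or any LLVM API; the 16-row by 64-byte tile shape, the buffer size, and all function names are assumptions chosen for illustration only. It models a regular `load <256 x i32>` as one read of 1024 consecutive bytes, and a tile load as 16 row reads separated by a caller-supplied byte stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed tile shape for illustration: 16 rows x 64 bytes = 256 x i32. */
enum { ROWS = 16, ROW_BYTES = 64, TILE_BYTES = ROWS * ROW_BYTES };
/* Largest stride the backing buffer below can serve (hypothetical limit). */
enum { MAX_STRIDE = 256, MEM_BYTES = ROWS * MAX_STRIDE };

/* Models a regular vector load: TILE_BYTES consecutive bytes from base. */
void load_contiguous(uint8_t *dst, const uint8_t *base) {
    memcpy(dst, base, TILE_BYTES);
}

/* Models a strided tile load: row r starts at base + r * stride_bytes. */
void load_strided(uint8_t *dst, const uint8_t *base, size_t stride_bytes) {
    for (size_t r = 0; r < ROWS; ++r)
        memcpy(dst + r * ROW_BYTES, base + r * stride_bytes, ROW_BYTES);
}

/* Returns 1 if both models read identical bytes for this stride, else 0. */
int loads_agree(size_t stride_bytes) {
    static uint8_t mem[MEM_BYTES];
    uint8_t a[TILE_BYTES], b[TILE_BYTES];

    /* Precondition: every row read must stay inside mem. */
    assert(stride_bytes <= MAX_STRIDE);

    /* Fill memory with a position-dependent pattern. */
    for (size_t i = 0; i < MEM_BYTES; ++i)
        mem[i] = (uint8_t)(i % 251);

    load_contiguous(a, mem);
    load_strided(b, mem, stride_bytes);
    return memcmp(a, b, TILE_BYTES) == 0;
}
```

In this model the two loads coincide only when the stride equals the row width: `loads_agree(64)` returns 1 and `loads_agree(128)` returns 0. That is the case where a regular `load` would silently read the wrong bytes if the intended operation was strided.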