lebedev.ri added a comment. In D99152#2655357 <https://reviews.llvm.org/D99152#2655357>, @fhahn wrote:
> In D99152#2655304 <https://reviews.llvm.org/D99152#2655304>, @lebedev.ri > wrote: > >> In D99152#2655274 <https://reviews.llvm.org/D99152#2655274>, @fhahn wrote: >> >>> In D99152#2649520 <https://reviews.llvm.org/D99152#2649520>, @LuoYuanke >>> wrote: >>> >>>> In D99152#2647681 <https://reviews.llvm.org/D99152#2647681>, @fhahn wrote: >>>> >>>>> I can't see any `load <256 x i32>` in the linked example, just a store. >>>>> Could you check the example? >>>> >>>> I create another example at https://gcc.godbolt.org/z/v6od5ceEz. In bar() >>>> function, you can see the `load <256 x i32>*` in the IR. The bar() >>>> function is actually buggy code, because tilec is not initialized by amx >>>> intrinsics. We want user call amx intrinsic to load/store tile explicitly. >>>> Ideally front-end can detect the issue and report error. >>> >>> Thanks AFAIK in those cases the conversion intrinsic makes sense to use, >>> because you effectively need to convert between 2 types in a non-trivial >>> way. @lebedev.ri WDYT? >> >> I'm not sure. I think first and foremost the `load`/`store` miscompile >> should be addressed. >> I think the rest is confusing because it seems to me that the only reason >> why that `bitcast` >> is needed is not correctness reason, but as an opaque optimization barrier. > > I'm not sure if the loads and store are actually incorrect. `_tile1024i ` is > defined as `typedef int _tile1024i __attribute__((__vector_size__(1024), > __aligned__(64)));` and the loads/stores are for assignments/reads from > variables that have that type, which is `<256 x i32>` in IR. So it is not > obvious to me why the loads/stores would be wrong, as long as `_tile1024i` is > defined as it is (if it would be a different type, that all changes). Didn't @LuoYuanke state: In D99152#2643821 <https://reviews.llvm.org/D99152#2643821>, @LuoYuanke wrote: > @lebedev.ri, this patch is mainly for discussing the approach that Florian > proposed, so I didn't polish my code. Nevertheless your comments for > amx_cast.c is right. For __tile_loadd() is to load a 2d tile from memory. > There is an extra parameter stride. As I explain in llvm-dev, it load each > row from memory to tile register and then base += stride. So the data is not > contiguous in memory. Note the `So the data is not contiguous in memory.`. I.e. we won't actually store to `i8* [ptr + 0, ptr + 4096)` byte area. Is plain `load`/`store` not defined to perform operations on contigious chunks of memory? Am i completely missing the point? > As a consequence, `__builtin_ia32_tilezero_internal` & the other builtins > need to be defined as returning `_tile1024i` / `<256 x i32>`. I don't think > there's any other way to specify this, unless you have a dedicated AMX type > in the frontend. IIUC the current lowering is to call an intrinsic that > returns `x86_amx` and then a `bitcast` is used for the conversion to the > result type `<256 x i32>`, with the (incorrect) assumption that the `bitcast` > does complex conversion between types. Another consequence of the builtins > returning `_tile1024i` / `<256 x i32>` is that the conversion from the > intrinsic result to `<256 x i32>` should happen at the place where Clang > emits the call to the intrinsic directly, not patched up later as it is done > now. /`store` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D99152/new/ https://reviews.llvm.org/D99152 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits