lebedev.ri added a comment.

In D99152#2655357 <https://reviews.llvm.org/D99152#2655357>, @fhahn wrote:

> In D99152#2655304 <https://reviews.llvm.org/D99152#2655304>, @lebedev.ri 
> wrote:
>
>> In D99152#2655274 <https://reviews.llvm.org/D99152#2655274>, @fhahn wrote:
>>
>>> In D99152#2649520 <https://reviews.llvm.org/D99152#2649520>, @LuoYuanke 
>>> wrote:
>>>
>>>> In D99152#2647681 <https://reviews.llvm.org/D99152#2647681>, @fhahn wrote:
>>>>
>>>>> I can't see any `load <256 x i32>` in the linked example, just a store. 
>>>>> Could you check the example?
>>>>
>>>> I create another example at https://gcc.godbolt.org/z/v6od5ceEz. In bar() 
>>>> function, you can see the `load <256 x i32>*` in the IR. The bar() 
>>>> function is actually buggy code, because tilec is not initialized by amx 
>>>> intrinsics. We want user call amx intrinsic to load/store tile explicitly. 
>>>> Ideally front-end can detect the issue and report error.
>>>
>>> Thanks AFAIK in those cases the conversion intrinsic makes sense to use, 
>>> because you effectively need to convert between 2 types in a non-trivial 
>>> way. @lebedev.ri WDYT?
>>
>> I'm not sure. I think first and foremost the `load`/`store` miscompile 
>> should be addressed.
>> I think the rest is confusing because it seems to me that the only reason 
>> why that `bitcast`
>> is needed is not correctness reason, but as an opaque optimization barrier.
>
> I'm not sure if the loads and store are actually incorrect. `_tile1024i ` is 
> defined as `typedef int _tile1024i __attribute__((__vector_size__(1024), 
> __aligned__(64)));` and the loads/stores are for assignments/reads from 
> variables that have that type, which is `<256 x i32>` in IR. So it is not 
> obvious to me why the loads/stores would be wrong, as long as `_tile1024i` is 
> defined as it is (if it would be a different type, that all changes).

Didn't @LuoYuanke state:

In D99152#2643821 <https://reviews.llvm.org/D99152#2643821>, @LuoYuanke wrote:

> @lebedev.ri, this patch is mainly for discussing the approach that Florian 
> proposed, so I didn't polish my code. Nevertheless your comments for 
> amx_cast.c is right. For __tile_loadd() is to load a 2d tile from memory. 
> There is an extra parameter stride. As I explain in llvm-dev, it load each 
> row from memory to tile register and then base += stride. So the data is not 
> contiguous in memory.

Note the `So the data is not contiguous in memory.`.
I.e. we won't actually store to `i8* [ptr + 0, ptr + 4096)` byte area.
Is plain `load`/`store` not defined to perform operations on contigious chunks 
of memory?
Am i completely missing the point?

> As a consequence,  `__builtin_ia32_tilezero_internal` & the other builtins 
> need to be defined as returning  `_tile1024i` / `<256 x i32>`. I don't think 
> there's any other way to specify this, unless you have a dedicated AMX type 
> in the frontend. IIUC the current lowering is to call an intrinsic that 
> returns `x86_amx` and then a `bitcast` is used for the conversion to the 
> result type `<256 x i32>`, with the (incorrect) assumption that the `bitcast` 
> does complex conversion between types. Another consequence of the builtins 
> returning `_tile1024i` / `<256 x i32>` is that the conversion from the 
> intrinsic result to `<256 x i32>` should happen at the place where Clang 
> emits the call to the intrinsic directly, not patched up later as it is done 
> now.

/`store`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99152/new/

https://reviews.llvm.org/D99152

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to