================
@@ -962,6 +962,109 @@ The ``griddepcontrol`` intrinsics allows the dependent 
grids and prerequisite gr
 For more information, refer 
 `PTX ISA 
<https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-griddepcontrol>`__.
 
+TCGEN05 family of Intrinsics
+----------------------------
+
+The llvm.nvvm.tcgen05.* intrinsics model the TCGEN05 family of instructions
+exposed by PTX. These intrinsics use 'Tensor Memory' (henceforth ``tmem``).
+NVPTX represents this memory using ``addrspace(6)`` and is always 32-bits.
+
+For more information, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory>`_.
+
+The tensor-memory pointers may only be used with the tcgen05 intrinsics.
+There are specialized load/store instructions provided (tcgen05.ld/st) to
+work with tensor-memory.
+
+For more information on tensor-memory load/store instructions, refer
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory-and-register-load-store-instructions>`_.
+
+All tcgen05 intrinsics use a ``null`` pointer in tmem address
+space as their last operand. This helps to preserve ordering among the tcgen05
+operations especially when the intrinsic lacks any tmem operands. This
+last operand is dropped during Codegen.
----------------
durga4github wrote:

> After reading PTX docs here's my understanding of the situation.
> 
> * there's a new kind of memory, so creating a separate AS for tmem is 
> reasonable.
> * tcgen05.alloc returns allocation result indirectly, by storing it in a 
> shared memory. So LLVM has no direct indication that the intrinsic operates 
> on tmem and affects both shared memory and tmem
> * it's not clear from PTX docs what's the input for tcgen05.dealloc. It just 
> says "The operand taddr must point to a previous [Tensor 
> Memory](https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory)
>  allocation" but I can't tell if that means the previous location in the 
> shared memory where it stored a tmem pointer, or the tmem pointer itself. 
> Judging by the proposed intrinsic signature, it's the latter. In this case 
> LLVM knows that we're touching tmem.
> * relinquish_alloc_permit blocks subsequent allocations, so it must not be 
> reordered vs allocs.
> 
> So, the only odd thing is the allocation returning the result indirectly.
> 
> Proposed design adds artificial tmem pointer to let LLVM know that all 
> tcgen05 intrinsics operate on tmem and we can give LLVM sufficient hints on 
> how they should be ordered. However, the dummy argument is a crutch.
> 
> The gist of the problem here is that LLVM's existing intrinsic annotation is 
> not flexible enough to describe what we have here, exactly. I.e. there's no 
> way to tell LLVM that alloc and relinquish_alloc_permit operate on tmem. Our 
> current 

Yes, this is precisely what we meant to deal with.

options are to either make all intrinsics conservatively with `HasSideEffects` 
or, with a more relaxed "IntrInaccessibleMemOnly". I think the latter would be 
a reasonable trade-off for the time being.
> 

Sure. I have updated the alloc/relinq intrinsic to use InaccessibleMem* 
properties, in the latest revision.

> A longer-term approach would be to add a new intrinsic property allowing to 
> specify specific AS accessed by the intrinsic. E.g. we may extend existing 
> `IntrWriteMem` and `IntrWriteMem` to allow narrowing the scope to particular 
> AS, and allow specifying more than one. E.g. alloc would indicate that it 
> writes both shared and tmem. I think that would be a useful addition to a 
> handful of other intrinsics we already have, not just in NVPTX, but in the 
> other back-ends that need to deal with multiple AS.

Yes, cannot agree more on this..



https://github.com/llvm/llvm-project/pull/124961
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to