================
@@ -962,6 +962,109 @@ The ``griddepcontrol`` intrinsics allows the dependent 
grids and prerequisite gr
 For more information, refer 
 `PTX ISA 
<https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-griddepcontrol>`__.
 
+TCGEN05 family of Intrinsics
+----------------------------
+
+The llvm.nvvm.tcgen05.* intrinsics model the TCGEN05 family of instructions
+exposed by PTX. These intrinsics use 'Tensor Memory' (henceforth ``tmem``).
+NVPTX represents this memory using ``addrspace(6)`` and is always 32-bits.
+
+For more information, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory>`_.
+
+The tensor-memory pointers may only be used with the tcgen05 intrinsics.
+There are specialized load/store instructions provided (tcgen05.ld/st) to
+work with tensor-memory.
+
+For more information on tensor-memory load/store instructions, refer
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory-and-register-load-store-instructions>`_.
+
+All tcgen05 intrinsics use a ``null`` pointer in tmem address
+space as their last operand. This helps to preserve ordering among the tcgen05
+operations especially when the intrinsic lacks any tmem operands. This
+last operand is dropped during Codegen.
----------------
Artem-B wrote:

Wouldn't it be sufficient to just use `IntrInaccessibleMemOnly` and/or 
`IntrInaccessibleMemOrArgMemOnly` ? They seem to match your use case fairly 
well. Relative ordering of `tcgen05` intrinsics should be preserved, but they 
will not interfere with loads/stores for other instructions.

https://github.com/llvm/llvm-project/pull/124961
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to