================ @@ -962,6 +962,109 @@ The ``griddepcontrol`` intrinsics allows the dependent grids and prerequisite gr For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-griddepcontrol>`__. +TCGEN05 family of Intrinsics +---------------------------- + +The llvm.nvvm.tcgen05.* intrinsics model the TCGEN05 family of instructions +exposed by PTX. These intrinsics use 'Tensor Memory' (henceforth ``tmem``). +NVPTX represents this memory using ``addrspace(6)`` and is always 32-bits. + +For more information, refer PTX ISA +`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory>`_. + +The tensor-memory pointers may only be used with the tcgen05 intrinsics. +There are specialized load/store instructions provided (tcgen05.ld/st) to +work with tensor-memory. + +For more information on tensor-memory load/store instructions, refer +`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory-and-register-load-store-instructions>`_. + +All tcgen05 intrinsics use a ``null`` pointer in tmem address +space as their last operand. This helps to preserve ordering among the tcgen05 +operations especially when the intrinsic lacks any tmem operands. This +last operand is dropped during Codegen. ---------------- Artem-B wrote:
Wouldn't it be sufficient to just use `IntrInaccessibleMemOnly` and/or `IntrInaccessibleMemOrArgMemOnly` ? They seem to match your use case fairly well. Relative ordering of `tcgen05` intrinsics should be preserved, but they will not interfere with loads/stores for other instructions. https://github.com/llvm/llvm-project/pull/124961 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits