Issue 163904
Summary MLIR NVGPU / NVVM TMA operations (cp.async.bulk.tensor.*) don't support signed integer coordinates
Labels mlir
Assignees
Reporter jwake
I'm working on some MLIR codegen that occasionally needs to generate negative coordinates for a TMA multidimensional global -> shared memory copy, and I've noticed that the MLIR NVGPU dialect, the NVVM dialect, and the underlying LLVM NVPTX intrinsics for the TMA bulk copy instructions all require unsigned integers for the coordinates. This contradicts the PTX documentation, which states that the coordinates are all 32-bit signed ints (with the shared -> global direction additionally requiring that no component is negative).

It looks like `nvcc` just treats the values as unsigned ints in the PTX, eg.:
```cuda
#include <cuda/barrier>
namespace cde = cuda::device::experimental;  // provides cp_async_bulk_tensor_*

__global__ void kernel(const __grid_constant__ CUtensorMap tensor_map, int x) {
    int y = -123;
    ...
    cde::cp_async_bulk_tensor_2d_global_to_shared(&smem_buffer, &tensor_map, x, y, bar);
```
becomes
```ptx
.visible .entry kernel(CUtensorMap_st, int)(
	.param .align 64 .b8 kernel(CUtensorMap_st, int)_param_0[128],
	.param .u32 kernel(CUtensorMap_st, int)_param_1
)
...
	ld.param.u32 	%r4, [kernel(CUtensorMap_st, int)_param_1];
...
	mov.u32 	%r11, -123;
...
	cp.async.bulk.tensor.2d.shared::cluster.global.tile.mbarrier::complete_tx::bytes [%r10], [%rd8, {%r4, %r11}], [%r13];
```
so for now I'll just generate a similar bitcast in my MLIR lowering (sketched below).
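For illustration, here's a minimal sketch of that workaround at the NVVM level, assuming the op syntax used in recent upstream MLIR tests; the function and SSA value names are placeholders:

```mlir
// Sketch (assumed op syntax): builtin i32 is signless, so a negative
// coordinate's two's-complement bit pattern flows through to the PTX
// operand unchanged, mirroring the nvcc output above.
llvm.func @tma_load_2d(%desc: !llvm.ptr, %dst: !llvm.ptr<3>,
                       %bar: !llvm.ptr<3>, %x: i32) {
  %y = llvm.mlir.constant(-123 : i32) : i32
  nvvm.cp.async.bulk.tensor.shared.cluster.global %dst, %desc, %bar, box[%x, %y] : !llvm.ptr<3>, !llvm.ptr
  llvm.return
}
```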

It seems like, at the MLIR level at least, `nvgpu.tma.async.load`/`nvvm.cp.async.bulk.tensor.shared.cluster.global` should accept `si32`, `i32`, and `index` coordinates, while the corresponding store (shared -> global) ops stay unsigned, unless I'm missing something?
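To make the ask concrete, a hypothetical sketch of what that would permit at the NVGPU level (types elided, names made up, not verified against any particular MLIR revision):

```mlir
// Hypothetical: a negative coordinate on the global -> shared load; per the
// report, today's stack treats the value as unsigned. Descriptor and
// mbarrier types are elided ("...") for brevity, so this won't parse as-is.
%c0 = arith.constant 0 : index
%x = arith.constant 64 : index
%y = arith.constant -123 : index
nvgpu.tma.async.load %desc[%x, %y], %mbar[%c0] to %smem
    : !nvgpu.tensormap.descriptor<...>, !nvgpu.mbarrier.group<...> -> memref<128x64xf16, 3>
```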
