[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

Steffen Larsen via Phabricator via cfe-commits Fri, 09 Apr 2021 06:33:36 -0700

steffenlarsen added a comment.

@tra  Thank you for the feedback! I think I see what you're getting at, but I 
am not quite understanding how it would work for these builtins and intrinsics. 
I have added some comments to the corresponding


The comment about `IntrInaccessibleMemOnly` I agree with completely, so I will 
replace `IntrNoMem` with it in the updated version I'm getting ready. Good 
call. :)



================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:460-468
+TARGET_BUILTIN(__nvvm_redux_sync_add_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_add_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_and_b32, "iii", "", SM_80)
----------------
tra wrote:
> Instead of creating one builtin per integer variant, can we use a more 
> generic builtin `__nvvm_redux_sync_add_i`, similar to how we handle 
> `__nvvm_atom_add_gen_i` ?
> 
What gives me pause is that a for atomic minimum there are both 
`__nvvm_atom_min_gen_i` and `__nvvm_atom_min_gen_ui` to distinguish between 
signed and unsigned. What makes the difference?

That noted, I'll happily rename the builtins to be more in line with the other 
builtins. `__nvvm_redux_sync_*_i` and `__nvvm_redux_sync_*_ui` maybe?


================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:4103
+// redux.sync.add.u32 dst, src, membermask;
+def int_nvvm_redux_sync_add_u32 : GCCBuiltin<"__nvvm_redux_sync_add_u32">,
+  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
----------------
tra wrote:
> This could also be consolidated into an overloaded intrinsic operating on 
> `llvm_anyint_ty`
I am having a look at other uses of this, but I'm having difficulty wrapping my 
head around how to map these overloads to the PTX instructions in 
llvm/lib/Target/NVPTX/NVPTXIntrinsics.td. Though it would be nice, it just 
seems overly complicated to get a signed and an unsigned 32-bit integer version 
of each of these intrinsics.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100124/new/

https://reviews.llvm.org/D100124

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

Reply via email to