[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

Steffen Larsen via Phabricator via cfe-commits Thu, 22 Apr 2021 02:07:03 -0700

steffenlarsen added a comment.

> Do you know if any existing code already uses the __nvvm_* builtins for 
> cp.async? In other words, does nvcc provide them already or is it something 
> we're free to name as we wish? I do not see any relevant intrinsics mentioned 
> in NVVM IR spec: https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html and I 
> don't think NVCC's builtins are publicly documented anywhere.


I don't know of any yet. We will be using these in the relatively near future, 
but we can still change them no problem. However, the intrinsic and builtin 
naming for NVVM and NVPTX seems a bit inconsistent so it may be a long 
discussion (or maybe not.)



================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:460-468
+TARGET_BUILTIN(__nvvm_redux_sync_add_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_add_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_and_b32, "iii", "", SM_80)
----------------
tra wrote:
> steffenlarsen wrote:
> > tra wrote:
> > > steffenlarsen wrote:
> > > > tra wrote:
> > > > > steffenlarsen wrote:
> > > > > > tra wrote:
> > > > > > > Instead of creating one builtin per integer variant, can we use a 
> > > > > > > more generic builtin `__nvvm_redux_sync_add_i`, similar to how we 
> > > > > > > handle `__nvvm_atom_add_gen_i` ?
> > > > > > > 
> > > > > > What gives me pause is that a for atomic minimum there are both 
> > > > > > `__nvvm_atom_min_gen_i` and `__nvvm_atom_min_gen_ui` to distinguish 
> > > > > > between signed and unsigned. What makes the difference?
> > > > > > 
> > > > > > That noted, I'll happily rename the builtins to be more in line 
> > > > > > with the other builtins. `__nvvm_redux_sync_*_i` and 
> > > > > > `__nvvm_redux_sync_*_ui` maybe?
> > > > > > What gives me pause is that a for atomic minimum there are both 
> > > > > > __nvvm_atom_min_gen_i and __nvvm_atom_min_gen_ui to distinguish 
> > > > > > between signed and unsigned. What makes the difference?
> > > > > 
> > > > > Good point. We do not need unsigned variant for `add`.  We do need 
> > > > > explicit signed and unsigned variants ad LLVM IR integer types do not 
> > > > > take signedness into account, and the underlying min/max instructions 
> > > > > do.  Maybe, rename min_i/min_ui -> min/umin as LLVM does with 
> > > > > atomics? 
> > > > > 
> > > > > We may skip the `_i` suffix on logical ops as they only apply to 
> > > > > integers anyways.
> > > > > 
> > > > Sorry, I completely missed your responses.
> > > > 
> > > > > Maybe, rename min_i/min_ui -> min/umin as LLVM does with atomics?
> > > > 
> > > > Sounds good to me. Would there also be umax and uadd?
> > > > 
> > > > > We may skip the _i suffix on logical ops as they only apply to 
> > > > > integers anyways.
> > > > 
> > > > Absolutely. I'll make that happen! 
> > > > Would there also be umax and uadd?
> > > 
> > > You will need `umax`, but there's no need for `uadd` as 2-complement 
> > > addition is the same for signed/unsigned.
> > > 
> > > E.g `umax(0xffffffff, 1) -> 0xffffffff`, `max(-1,1) -> 1`, give different 
> > > answers, but `uadd(0xffffffff, 1) -> 0` and `add(-1,1) -> 0`.
> > Ah, of course. Though I do wonder as to the motivation of having signed and 
> > unsigned add variants in PTX. I'll drop the unsigned variant.
> It's for uniformity sake, I guess. All arithmetic ops in PTX operate on 
> sXX/uXX arguments, though not all of them have to.
> 
I bet you're right. Thanks for the help. 😄 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100124/new/

https://reviews.llvm.org/D100124

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

Reply via email to