Re: [PATCH] D19990: [CUDA] Implement __ldg using intrinsics.

Justin Lebar via cfe-commits Thu, 05 May 2016 13:44:05 -0700

jlebar added inline comments.

================
Comment at: include/clang/Basic/BuiltinsNVPTX.def:569-603
@@ -568,1 +568,37 @@
 
+// __ldg.  This is not implemented as a builtin by nvcc.
+BUILTIN(__nvvm_ldg_c, "ccC*", "")
+BUILTIN(__nvvm_ldg_s, "ssC*", "")
+BUILTIN(__nvvm_ldg_i, "iiC*", "")
+BUILTIN(__nvvm_ldg_l, "LiLiC*", "")
+BUILTIN(__nvvm_ldg_ll, "LLiLLiC*", "")
+
+BUILTIN(__nvvm_ldg_uc, "UcUcC*", "")
+BUILTIN(__nvvm_ldg_us, "UsUsC*", "")
+BUILTIN(__nvvm_ldg_ui, "UiUiC*", "")
+BUILTIN(__nvvm_ldg_ul, "ULiULiC*", "")
+BUILTIN(__nvvm_ldg_ull, "ULLiULLiC*", "")
+
+BUILTIN(__nvvm_ldg_f, "ffC*", "")
+BUILTIN(__nvvm_ldg_d, "ddC*", "")
+
+BUILTIN(__nvvm_ldg_c2, "E2cE2cC*", "")
+BUILTIN(__nvvm_ldg_c4, "E4cE4cC*", "")
+BUILTIN(__nvvm_ldg_s2, "E2sE2sC*", "")
+BUILTIN(__nvvm_ldg_s4, "E4sE4sC*", "")
+BUILTIN(__nvvm_ldg_i2, "E2iE2iC*", "")
+BUILTIN(__nvvm_ldg_i4, "E4iE4iC*", "")
+BUILTIN(__nvvm_ldg_ll2, "E2LLiE2LLiC*", "")
+
+BUILTIN(__nvvm_ldg_uc2, "E2UcE2UcC*", "")
+BUILTIN(__nvvm_ldg_uc4, "E4UcE4UcC*", "")
+BUILTIN(__nvvm_ldg_us2, "E2UsE2UsC*", "")
+BUILTIN(__nvvm_ldg_us4, "E4UsE4UsC*", "")
+BUILTIN(__nvvm_ldg_ui2, "E2UiE2UiC*", "")
+BUILTIN(__nvvm_ldg_ui4, "E4UiE4UiC*", "")
+BUILTIN(__nvvm_ldg_ull2, "E2ULLiE2ULLiC*", "")
+
+BUILTIN(__nvvm_ldg_f2, "E2fE2fC*", "")
+BUILTIN(__nvvm_ldg_f4, "E4fE4fC*", "")
+BUILTIN(__nvvm_ldg_d2, "E2dE2dC*", "")
+
----------------
jlebar wrote:
> majnemer wrote:
> > Would it be crazy to instead provide a generic builtin? Would cut down on 
> > the number of variants...
> > 
> > `__builtin_add_overflow` is an example of such a builtin.
> Art is going to send you flowers.  :)  He and I just had an argument about 
> this.
> 
> I think this isn't an unreasonable thing to want, but I think it's beneficial 
> to be consistent with our existing API.  So if we offer a generic thing for 
> ldg, it would be nice to have one for atomics above, which are basically the 
> same.
> 
> So I told Art I'd prefer to add it to our list.
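
For reference, here is a minimal host-side sketch of what "generic" means for
`__builtin_add_overflow`: one builtin name accepts any integer operand types,
and the result type comes from the pointer argument.  The helper name
`add_would_overflow` is hypothetical, introduced only for illustration, and
the sketch assumes GCC or Clang.

```cpp
#include <climits>

// Hypothetical helper (not part of the patch): wraps the type-generic
// builtin so the same call site works for any integer type T.
template <typename T>
bool add_would_overflow(T a, T b, T *out) {
  // Returns true if the exact sum does not fit in *out; the wrapped
  // result is stored in *out either way.
  return __builtin_add_overflow(a, b, out);
}
```

The same template body serves int, long long and unsigned char without a
per-type builtin name, which is the property being asked for here.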
Oh, another thing: you really see the benefit of a generic builtin when you 
start hitting the combinatorial explosion of all the different kinds of loads.  
As-is it's not so bad, but if you want to support all forms of ld.global.nc, 
there are four different caching behaviors.  Supporting all forms of ld is way 
worse.

Which is to say, if we're going to do the generic thing, it seems like we 
benefit the most by making it generic on more than just the types.  But we're 
not ready to do that; I don't think most of these loads even exist in LLVM at 
the moment.
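
To make "generic on more than the types" concrete, here is a rough host-only
sketch, not a proposal for the actual builtin.  The names `CacheHint`,
`generic_ldg` and `named_variants` are hypothetical, the enumerators are
placeholders rather than real PTX cache operators, and on the host the hint is
ignored and the load is an ordinary dereference.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the caching behaviors; placeholder names,
// not real PTX cache operators.
enum class CacheHint { Default, A, B, C };

// Hypothetical generic load.  A real implementation would pick an
// instruction from T and Hint; this host sketch ignores Hint and
// performs a plain load through the pointer.
template <CacheHint Hint = CacheHint::Default, typename T>
T generic_ldg(const T *ptr) {
  return *ptr;
}

// Number of separately named builtins a per-type, per-hint scheme needs.
constexpr std::size_t named_variants(std::size_t num_types,
                                     std::size_t num_hints) {
  return num_types * num_hints;
}
```

The hunk above declares 29 `__nvvm_ldg_*` builtins, so covering four caching
behaviors with named variants would take `named_variants(29, 4)`, i.e. 116
entries, where a form generic on both axes needs a single entry point.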

http://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld


http://reviews.llvm.org/D19990



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
