================
@@ -145,17 +145,21 @@ _DEFAULT_FN_ATTRS static __inline__ void 
__gpu_sync_lane(uint64_t __lane_mask) {
 
 // Shuffles the the lanes inside the wavefront according to the given index.
 _DEFAULT_FN_ATTRS static __inline__ uint32_t
-__gpu_shuffle_idx_u32(uint64_t __lane_mask, uint32_t __idx, uint32_t __x) {
-  return __builtin_amdgcn_ds_bpermute(__idx << 2, __x);
+__gpu_shuffle_idx_u32(uint64_t __lane_mask, uint32_t __idx, uint32_t __x,
+                      uint32_t __width) {
+  uint32_t __lane = __idx + (__gpu_lane_id() & ~(__width - 1));
----------------
jhuber6 wrote:

I was thinking about how to document this, but I couldn't find anywhere in
`clang` that provides usage documentation for builtins like these. At a
minimum I could add usage headers to everything, but for now I figured it'd
be best to stay consistent with `__shfl` and leave it at that.

https://github.com/llvm/llvm-project/pull/125896
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
