yaxunl added inline comments.

================
Comment at: lib/Basic/Targets.cpp:161
+    case CudaArch::GFX902:
+      return "320";
+    case CudaArch::UNKNOWN:
----------------
tra wrote:
> yaxunl wrote:
> > tra wrote:
> > > Unless you're planning to guarantee 1:1 match to functionality provided 
> > > by nvidia's sm_32, it would be prudent to use some other value for the 
> > > macro so the source code has a way to tell these GPUs apart.
> > > 
> > > Another issue with this approach is that typical use pattern for 
> > > __CUDA_ARCH__ is 
> > > `#if __CUDA_ARCH__ >= XXX`. I don't expect that we'll always be able to 
> > > maintain order across GPU architectures among NVIDIA and AMD GPUs. 
> > > Perhaps for HIP compilation it would make more sense to define 
> > > __CUDA_ARCH__ as 1 (this should serve as a legacy indication of 
> > > device-side compilation) and define __HIP_ARCH__ to indicate which AMD 
> > > GPU we're compiling for without accidentally enabling something that was 
> > > intended for NVIDIA's GPUs only.
> > I think let `__CUDA_ARCH__`==1 for amdgcn is reasonable and I can make that 
> > change.
> > 
> > On the other hand, I think it may be difficult to define `__HIP_ARCH__` 
> > which can sort mixed nvptx/amdgcn GPU's by capability. I do think a well 
> > defined `__HIP_ARCH__` would be useful for users. Just need some further 
> > discussion how to define it.
> > 
> > For now, if there are specific codes for nvptx, it can continue use 
> > `__CUDA_ARCH__`. If there are specific codes for amdgcn, it can check 
> > predefined amdgpu canonical names, e.g. `__gfx803__`, etc. 
> OK.
> 
I asked Ben Sander about whether we can define __HIP_ARCH__, here is his answer:

HIP targets a broader set of hardware than just a single vendor so additional 
flexibility in defining feature capability is required.  The HIP_ARCH_ macro 
provide per-feature-granularity mechanism to query features.  Also the code 
tends to be more clear as opposed to an "if __CUDA_ARCH>3 ..assume some 
feature".

For example


```
// 32-bit Atomics:
#define __HIP_ARCH_HAS_GLOBAL_INT32_ATOMICS__ (1)
#define __HIP_ARCH_HAS_GLOBAL_FLOAT_ATOMIC_EXCH__ (1)
#define __HIP_ARCH_HAS_SHARED_INT32_ATOMICS__ (1)
#define __HIP_ARCH_HAS_SHARED_FLOAT_ATOMIC_EXCH__ (1)
#define __HIP_ARCH_HAS_FLOAT_ATOMIC_ADD__ (1)

// 64-bit Atomics:
#define __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__ (1)
#define __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__ (0)

// Doubles
#define __HIP_ARCH_HAS_DOUBLES__ (1)

// warp cross-lane operations:
#define __HIP_ARCH_HAS_WARP_VOTE__ (1)
#define __HIP_ARCH_HAS_WARP_BALLOT__ (1)
#define __HIP_ARCH_HAS_WARP_SHUFFLE__ (1)
#define __HIP_ARCH_HAS_WARP_FUNNEL_SHIFT__ (0)

// sync
#define __HIP_ARCH_HAS_THREAD_FENCE_SYSTEM__ (1)
#define __HIP_ARCH_HAS_SYNC_THREAD_EXT__ (0)

// misc
#define __HIP_ARCH_HAS_SURFACE_FUNCS__ (0)
#define __HIP_ARCH_HAS_3DGRID__ (1)
#define __HIP_ARCH_HAS_DYNAMIC_PARALLEL__ (0)
```


Repository:
  rC Clang

https://reviews.llvm.org/D45277



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to