tra added inline comments.

================
Comment at: lib/Basic/Targets.cpp:161
+    case CudaArch::GFX902:
+      return "320";
+    case CudaArch::UNKNOWN:
----------------
yaxunl wrote:
> tra wrote:
> > yaxunl wrote:
> > > tra wrote:
> > > > yaxunl wrote:
> > > > > tra wrote:
> > > > > > Unless you're planning to guarantee a 1:1 match to the functionality 
> > > > > > provided by nvidia's sm_32, it would be prudent to use some other 
> > > > > > value for the macro so that the source code has a way to tell these 
> > > > > > GPUs apart.
> > > > > > 
> > > > > > Another issue with this approach is that the typical use pattern for 
> > > > > > __CUDA_ARCH__ is `#if __CUDA_ARCH__ >= XXX`. I don't expect that we'll 
> > > > > > always be able to maintain an ordering across GPU architectures among 
> > > > > > NVIDIA and AMD GPUs. Perhaps for HIP compilation it would make more 
> > > > > > sense to define __CUDA_ARCH__ as 1 (this should serve as a legacy 
> > > > > > indication of device-side compilation) and define __HIP_ARCH__ to 
> > > > > > indicate which AMD GPU we're compiling for, without accidentally 
> > > > > > enabling something that was intended for NVIDIA's GPUs only.
> > > > > I think letting `__CUDA_ARCH__` == 1 for amdgcn is reasonable, and I 
> > > > > can make that change.
> > > > > 
> > > > > On the other hand, I think it may be difficult to define a 
> > > > > `__HIP_ARCH__` that can order mixed nvptx/amdgcn GPUs by capability. 
> > > > > I do think a well-defined `__HIP_ARCH__` would be useful for users; 
> > > > > it just needs some further discussion on how to define it.
> > > > > 
> > > > > For now, code specific to nvptx can continue to use `__CUDA_ARCH__`, 
> > > > > and code specific to amdgcn can check the predefined amdgpu 
> > > > > canonical-name macros, e.g. `__gfx803__`.
> > > > OK.
> > > > 
> > > I asked Ben Sander whether we can define __HIP_ARCH__; here is his 
> > > answer:
> > > 
> > > HIP targets a broader set of hardware than just a single vendor, so 
> > > additional flexibility in defining feature capability is required. The 
> > > HIP_ARCH_ macros provide a per-feature-granularity mechanism to query 
> > > features. The code also tends to be clearer than an "if 
> > > __CUDA_ARCH__ > 3, assume some feature" pattern.
> > > 
> > > For example
> > > 
> > > 
> > > ```
> > > // 32-bit Atomics:
> > > #define __HIP_ARCH_HAS_GLOBAL_INT32_ATOMICS__ (1)
> > > #define __HIP_ARCH_HAS_GLOBAL_FLOAT_ATOMIC_EXCH__ (1)
> > > #define __HIP_ARCH_HAS_SHARED_INT32_ATOMICS__ (1)
> > > #define __HIP_ARCH_HAS_SHARED_FLOAT_ATOMIC_EXCH__ (1)
> > > #define __HIP_ARCH_HAS_FLOAT_ATOMIC_ADD__ (1)
> > > 
> > > // 64-bit Atomics:
> > > #define __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__ (1)
> > > #define __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__ (0)
> > > 
> > > // Doubles
> > > #define __HIP_ARCH_HAS_DOUBLES__ (1)
> > > 
> > > // warp cross-lane operations:
> > > #define __HIP_ARCH_HAS_WARP_VOTE__ (1)
> > > #define __HIP_ARCH_HAS_WARP_BALLOT__ (1)
> > > #define __HIP_ARCH_HAS_WARP_SHUFFLE__ (1)
> > > #define __HIP_ARCH_HAS_WARP_FUNNEL_SHIFT__ (0)
> > > 
> > > // sync
> > > #define __HIP_ARCH_HAS_THREAD_FENCE_SYSTEM__ (1)
> > > #define __HIP_ARCH_HAS_SYNC_THREAD_EXT__ (0)
> > > 
> > > // misc
> > > #define __HIP_ARCH_HAS_SURFACE_FUNCS__ (0)
> > > #define __HIP_ARCH_HAS_3DGRID__ (1)
> > > #define __HIP_ARCH_HAS_DYNAMIC_PARALLEL__ (0)
> > > ```
> > I assume that will be handled somewhere else -- different patch, different 
> > place.
> > Looks like setting `__CUDA_ARCH__` to 1 is all that should be done here.
> > 
> > While we're looking at this, is CUDA compatibility one of your goals? I.e., 
> > do you expect existing CUDA code to be compilable and functional on AMD 
> > hardware? If not, then perhaps you don't need the `__CUDA_*__` defines at all.
> CUDA code needs to be translated to HIP code since the host API is different. 
> In most cases the translation can be done automatically by a script. 
> `__CUDA_ARCH__` cannot be automatically translated because it is not portable 
> to non-nvptx devices; however, it is often used simply to indicate device 
> compilation. Therefore we still need to define it in HIP to indicate device 
> compilation. In this way, CUDA programs that use `__CUDA_ARCH__` only to check 
> for device compilation can be translated automatically. If users want to use 
> features associated with a specific `__CUDA_ARCH__` value, they can manually 
> modify the translated code to use the `__HIP_ARCH_HAS_*` macros.
It sounds like this translation is a one-time offline process and the HIP-mode 
compiler is not going to see any non-HIP code. If that's the case, I'm not 
quite sure I see the need for defining `__CUDA_ARCH__` in HIP mode -- the 
translation process should have converted the CUDA-specific macro in the 
original code to its HIP equivalent or gotten the user to port it to something 
HIP can deal with.

The HIP porting guide also says that `__CUDA_ARCH__` is [[ 
https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_porting_guide.md#compiler-defines-summary
 | undefined ]] by hcc. 
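
For reference, here is a minimal sketch (illustration only, not part of the 
review) of the three guard styles discussed in this thread. 
`__HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__` is taken from the quoted HIP snippet 
above, `__gfx803__` from the earlier comment about canonical-name macros, and 
the function body is just a placeholder:

```
// Sketch: three ways to guard device code, as discussed above.
__device__ void bump(unsigned long long *counter) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 320
  // Ordering-based guard: only meaningful for NVIDIA targets, since the
  // numeric values may not stay ordered once AMD GPUs are mixed in.
  atomicAdd(counter, 1ULL);
#elif __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__
  // Per-feature guard: asks for the capability directly; vendor-neutral.
  atomicAdd(counter, 1ULL);
#elif defined(__gfx803__)
  // Per-GPU guard via the predefined amdgcn canonical-name macro.
  atomicAdd(counter, 1ULL);
#else
  // Fallback for devices without 64-bit global atomics (placeholder,
  // not atomic).
  *counter += 1ULL;
#endif
}
```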


Repository:
  rC Clang

https://reviews.llvm.org/D45277


