yaxunl added inline comments.
================ Comment at: lib/Basic/Targets.cpp:161 + case CudaArch::GFX902: + return "320"; + case CudaArch::UNKNOWN: ---------------- tra wrote: > yaxunl wrote: > > tra wrote: > > > yaxunl wrote: > > > > tra wrote: > > > > > yaxunl wrote: > > > > > > tra wrote: > > > > > > > Unless you're planning to guarantee 1:1 match to functionality > > > > > > > provided by nvidia's sm_32, it would be prudent to use some other > > > > > > > value for the macro so the source code has a way to tell these > > > > > > > GPUs apart. > > > > > > > > > > > > > > Another issue with this approach is that typical use pattern for > > > > > > > __CUDA_ARCH__ is > > > > > > > `#if __CUDA_ARCH__ >= XXX`. I don't expect that we'll always be > > > > > > > able to maintain order across GPU architectures among NVIDIA and > > > > > > > AMD GPUs. Perhaps for HIP compilation it would make more sense to > > > > > > > define __CUDA_ARCH__ as 1 (this should serve as a legacy > > > > > > > indication of device-side compilation) and define __HIP_ARCH__ to > > > > > > > indicate which AMD GPU we're compiling for without accidentally > > > > > > > enabling something that was intended for NVIDIA's GPUs only. > > > > > > I think let `__CUDA_ARCH__`==1 for amdgcn is reasonable and I can > > > > > > make that change. > > > > > > > > > > > > On the other hand, I think it may be difficult to define > > > > > > `__HIP_ARCH__` which can sort mixed nvptx/amdgcn GPU's by > > > > > > capability. I do think a well defined `__HIP_ARCH__` would be > > > > > > useful for users. Just need some further discussion how to define > > > > > > it. > > > > > > > > > > > > For now, if there are specific codes for nvptx, it can continue use > > > > > > `__CUDA_ARCH__`. If there are specific codes for amdgcn, it can > > > > > > check predefined amdgpu canonical names, e.g. `__gfx803__`, etc. > > > > > OK. > > > > > > > > > I asked Ben Sander about whether we can define __HIP_ARCH__, here is > > > > his answer: > > > > > > > > HIP targets a broader set of hardware than just a single vendor so > > > > additional flexibility in defining feature capability is required. The > > > > HIP_ARCH_ macro provide per-feature-granularity mechanism to query > > > > features. Also the code tends to be more clear as opposed to an "if > > > > __CUDA_ARCH>3 ..assume some feature". > > > > > > > > For example > > > > > > > > > > > > ``` > > > > // 32-bit Atomics: > > > > #define __HIP_ARCH_HAS_GLOBAL_INT32_ATOMICS__ (1) > > > > #define __HIP_ARCH_HAS_GLOBAL_FLOAT_ATOMIC_EXCH__ (1) > > > > #define __HIP_ARCH_HAS_SHARED_INT32_ATOMICS__ (1) > > > > #define __HIP_ARCH_HAS_SHARED_FLOAT_ATOMIC_EXCH__ (1) > > > > #define __HIP_ARCH_HAS_FLOAT_ATOMIC_ADD__ (1) > > > > > > > > // 64-bit Atomics: > > > > #define __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__ (1) > > > > #define __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__ (0) > > > > > > > > // Doubles > > > > #define __HIP_ARCH_HAS_DOUBLES__ (1) > > > > > > > > // warp cross-lane operations: > > > > #define __HIP_ARCH_HAS_WARP_VOTE__ (1) > > > > #define __HIP_ARCH_HAS_WARP_BALLOT__ (1) > > > > #define __HIP_ARCH_HAS_WARP_SHUFFLE__ (1) > > > > #define __HIP_ARCH_HAS_WARP_FUNNEL_SHIFT__ (0) > > > > > > > > // sync > > > > #define __HIP_ARCH_HAS_THREAD_FENCE_SYSTEM__ (1) > > > > #define __HIP_ARCH_HAS_SYNC_THREAD_EXT__ (0) > > > > > > > > // misc > > > > #define __HIP_ARCH_HAS_SURFACE_FUNCS__ (0) > > > > #define __HIP_ARCH_HAS_3DGRID__ (1) > > > > #define __HIP_ARCH_HAS_DYNAMIC_PARALLEL__ (0) > > > > ``` > > > I assume that will be handled somewhere else -- different patch, > > > different place. > > > Looks like setting `__CUDA_ARCH__` to 1 is all that should be done here. > > > > > > While we're looking a this, is CUDA compatibility one of your goals? I.e. > > > do you expect existing CUDA code to be compilable and functional on AMD > > > hardware? If not, then, perhaps you don't need `__CUDA_*__` defines at > > > all. > > CUDA code needs to be translated to HIP code since the host API is > > different. In most cases the translation can be done by script > > automatically. `__CUDA_ARCH__` cannot be automatically translated because > > it is not portable to non-nvptx devices, however it is often used to > > indicate device compilation. Therefore we still need to define it in HIP to > > indicate device compilation. In this way, CUDA programs using > > `__CUDA_ARCH__` just for checking device compilation can be automatically > > translated. If users want to use features associated with specific > > `__CUDA_ARCH__` they can manually modify the translated code to use > > `__HIP_ARCH_HAS_*` macros. > It sounds like this translation is a one-time offline process and HIP-mode > compiler is not going to see any non-HIP code. If that's the case, I'm not > quite sure I see the need for defining `__CUDA_ARCH__` in HIP mode -- > translation process should've converted the CUDA-specific macro in the > original code to it's HIP equivalent or get user to part it to something HIP > can deal with. > > HIP programming guide also says that `__CUDA_ARCH__` is [[ > https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_porting_guide.md#compiler-defines-summary > | undefined ]] by hcc. Sorry I missed that. I will revert the change about macro `__CUDA_ARCH__` and define `__HIP_DEVICE_COMPILE__` instead. Thanks. Repository: rC Clang https://reviews.llvm.org/D45277 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits