tra added inline comments.
================ Comment at: lib/Basic/Targets.cpp:161 + case CudaArch::GFX902: + return "320"; + case CudaArch::UNKNOWN: ---------------- yaxunl wrote: > tra wrote: > > yaxunl wrote: > > > tra wrote: > > > > Unless you're planning to guarantee 1:1 match to functionality provided > > > > by nvidia's sm_32, it would be prudent to use some other value for the > > > > macro so the source code has a way to tell these GPUs apart. > > > > > > > > Another issue with this approach is that typical use pattern for > > > > __CUDA_ARCH__ is > > > > `#if __CUDA_ARCH__ >= XXX`. I don't expect that we'll always be able to > > > > maintain order across GPU architectures among NVIDIA and AMD GPUs. > > > > Perhaps for HIP compilation it would make more sense to define > > > > __CUDA_ARCH__ as 1 (this should serve as a legacy indication of > > > > device-side compilation) and define __HIP_ARCH__ to indicate which AMD > > > > GPU we're compiling for without accidentally enabling something that > > > > was intended for NVIDIA's GPUs only. > > > I think let `__CUDA_ARCH__`==1 for amdgcn is reasonable and I can make > > > that change. > > > > > > On the other hand, I think it may be difficult to define `__HIP_ARCH__` > > > which can sort mixed nvptx/amdgcn GPU's by capability. I do think a well > > > defined `__HIP_ARCH__` would be useful for users. Just need some further > > > discussion how to define it. > > > > > > For now, if there are specific codes for nvptx, it can continue use > > > `__CUDA_ARCH__`. If there are specific codes for amdgcn, it can check > > > predefined amdgpu canonical names, e.g. `__gfx803__`, etc. > > OK. > > > I asked Ben Sander about whether we can define __HIP_ARCH__, here is his > answer: > > HIP targets a broader set of hardware than just a single vendor so additional > flexibility in defining feature capability is required. The HIP_ARCH_ macro > provide per-feature-granularity mechanism to query features. Also the code > tends to be more clear as opposed to an "if __CUDA_ARCH>3 ..assume some > feature". > > For example > > > ``` > // 32-bit Atomics: > #define __HIP_ARCH_HAS_GLOBAL_INT32_ATOMICS__ (1) > #define __HIP_ARCH_HAS_GLOBAL_FLOAT_ATOMIC_EXCH__ (1) > #define __HIP_ARCH_HAS_SHARED_INT32_ATOMICS__ (1) > #define __HIP_ARCH_HAS_SHARED_FLOAT_ATOMIC_EXCH__ (1) > #define __HIP_ARCH_HAS_FLOAT_ATOMIC_ADD__ (1) > > // 64-bit Atomics: > #define __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__ (1) > #define __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__ (0) > > // Doubles > #define __HIP_ARCH_HAS_DOUBLES__ (1) > > // warp cross-lane operations: > #define __HIP_ARCH_HAS_WARP_VOTE__ (1) > #define __HIP_ARCH_HAS_WARP_BALLOT__ (1) > #define __HIP_ARCH_HAS_WARP_SHUFFLE__ (1) > #define __HIP_ARCH_HAS_WARP_FUNNEL_SHIFT__ (0) > > // sync > #define __HIP_ARCH_HAS_THREAD_FENCE_SYSTEM__ (1) > #define __HIP_ARCH_HAS_SYNC_THREAD_EXT__ (0) > > // misc > #define __HIP_ARCH_HAS_SURFACE_FUNCS__ (0) > #define __HIP_ARCH_HAS_3DGRID__ (1) > #define __HIP_ARCH_HAS_DYNAMIC_PARALLEL__ (0) > ``` I assume that will be handled somewhere else -- different patch, different place. Looks like setting `__CUDA_ARCH__` to 1 is all that should be done here. While we're looking a this, is CUDA compatibility one of your goals? I.e. do you expect existing CUDA code to be compilable and functional on AMD hardware? If not, then, perhaps you don't need `__CUDA_*__` defines at all. Repository: rC Clang https://reviews.llvm.org/D45277 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits