Issue 172937
Summary [clang][CUDA] Add support for `__CUDA_ARCH_LIST__` macro
Labels clang
Assignees
Reporter bernhardmgruber
Hi! I am a maintainer of the CUDA Core Compute Libraries ([CCCL](https://github.com/NVIDIA/cccl)). We provide high-performance algorithms as part of the Thrust, CUB and libcu++ (our fork of libc++) libraries, targeting NVIDIA GPUs. While we primarily support nvcc and nvc++ as compilers, we also test clang in CUDA mode in our CI and are aware of some user groups consuming our libraries with clang.

During recent [refactorings](https://github.com/NVIDIA/cccl/pull/6914), we noticed that with clang we cannot perform some compile-time optimizations that we can do for nvcc. These boil down to limiting the amount of code to compile (templates to instantiate) based on the SM architectures we are compiling for. nvcc offers this information via the [`__CUDA_ARCH_LIST__` macro](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#virtual-architecture-macros), which is a comma-separated list like `800,860,900`, with one entry for each value that `__CUDA_ARCH__` will have during the device compilation passes. nvc++ offers a similar macro, `NV_TARGET_SM_INTEGER_LIST`.
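To illustrate the kind of use we have in mind, here is a minimal sketch of how such a list can be consumed at compile time. It is not CCCL's actual implementation. The names `EXAMPLE_ARCH_LIST`, `min_arch`, `contains_arch`, `lowest_arch` and `has_sm90` are hypothetical, and the fallback values are just the example list from above, so that the snippet also compiles with a compiler that does not define the macro.

```cpp
#include <initializer_list>

// Use the compiler-provided list when available. The fallback values are an
// assumption for illustration only (the example list from the text above).
#if defined(__CUDA_ARCH_LIST__)
#  define EXAMPLE_ARCH_LIST __CUDA_ARCH_LIST__
#else
#  define EXAMPLE_ARCH_LIST 800, 860, 900
#endif

// Smallest architecture in the list; returns 0 for an empty list.
constexpr int min_arch(std::initializer_list<int> archs) {
  int result = 0;
  bool first = true;
  for (int a : archs) {
    if (first || a < result) {
      result = a;
      first = false;
    }
  }
  return result;
}

// True if `wanted` is one of the architectures being compiled for.
constexpr bool contains_arch(std::initializer_list<int> archs, int wanted) {
  for (int a : archs) {
    if (a == wanted) {
      return true;
    }
  }
  return false;
}

// Both values are known in every compilation pass, including the host pass,
// so they can gate template instantiations or select tuning policies.
constexpr int lowest_arch = min_arch({EXAMPLE_ARCH_LIST});
constexpr bool has_sm90 = contains_arch({EXAMPLE_ARCH_LIST}, 900);
```

With such constants, code paths for architectures that are not in the list can be excluded via `if constexpr` or template specialization instead of being instantiated for every possible SM version.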

If clang in CUDA mode could provide this macro, we would be able to perform the same compile-time optimizations as we do for nvcc, improving compile times for clang in CUDA mode. Please consider adding such a macro. 