[llvm-bugs] [Bug 120836] [Clang] [NVPTX] [Windows] Compilation errors with CUDA NVPTX backend and MSVC headers

LLVM Bugs via llvm-bugs Sat, 21 Dec 2024 05:50:58 -0800

Issue	120836
Summary	[Clang] [NVPTX] [Windows] Compilation errors with CUDA NVPTX backend and MSVC headers
Labels	clang
Assignees
Reporter	blinkfrog

**Describe the bug**

When compiling CUDA code with Clang's NVPTX backend on Windows against the MSVC headers, compilation fails with multiple errors in the UCRT headers: the `va_list` argument passed to `__crt_va_start` and `__crt_va_end` (reported as `char *`) cannot bind to a reference to `__builtin_va_list`. This appears to be a mismatch between Clang's built-in `va_list` type and MSVC's `va_list` definition when compiling CUDA device code.
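
To narrow down where the types diverge, the following is a minimal sketch (not part of the failing project) that probes how a given compilation target defines its variadic-argument list types. The helper names `va_list_matches_builtin`, `builtin_va_list_is_char_ptr`, and `sum_ints` are hypothetical and introduced only for illustration. It assumes a GCC- or Clang-compatible compiler, since `__builtin_va_list` is a compiler builtin type.

```cpp
// Minimal sketch: probe the variadic-argument list types of the current
// compilation target. All helper names here are hypothetical.
#include <cassert>
#include <cstdarg>
#include <type_traits>

// True when the library's va_list is the compiler's builtin type, which is
// what a macro expanding to __builtin_va_start(ap, param) needs `ap` to be.
constexpr bool va_list_matches_builtin() {
    return std::is_same<std::va_list, __builtin_va_list>::value;
}

// True when the builtin type is a plain `char *`, the type the diagnostics
// in this report print for MSVC's va_list.
constexpr bool builtin_va_list_is_char_ptr() {
    return std::is_same<__builtin_va_list, char *>::value;
}

// Small variadic function that exercises va_start / va_arg / va_end.
inline int sum_ints(int count, ...) {
    std::va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

In an ordinary host compilation I would expect `va_list_matches_builtin()` to be true. The errors below suggest the equivalent check fails in the device-side pass, though I have not confirmed which type `__builtin_va_list` resolves to there.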


**To Reproduce**

Steps to reproduce the behavior:

1. Use the following simple CUDA program (`test.cu`):

```
#include <iostream>
#include <cuda_runtime.h>

__global__ void addKernel(int *c, const int *a, const int *b) {
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

int main() {
    const int arraySize = 5;
    int a[arraySize] = {1, 2, 3, 4, 5};
    int b[arraySize] = {10, 20, 30, 40, 50};
    int c[arraySize] = {0};

    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;

    cudaMalloc((void**)&dev_a, arraySize * sizeof(int));
    cudaMalloc((void**)&dev_b, arraySize * sizeof(int));
    cudaMalloc((void**)&dev_c, arraySize * sizeof(int));

    cudaMemcpy(dev_a, a, arraySize * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, arraySize * sizeof(int), cudaMemcpyHostToDevice);

    addKernel<<<1, arraySize>>>(dev_c, dev_a, dev_b);

    cudaMemcpy(c, dev_c, arraySize * sizeof(int), cudaMemcpyDeviceToHost);

    std::cout << "Results: ";
    for (int i = 0; i < arraySize; ++i) {
        std::cout << c[i] << " ";
    }
    std::cout << std::endl;

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);

    return 0;
}
```

2. Compile the code using Clang with the following command (adjust paths according to your environment):

```
clang++ -std=c++14 --cuda-gpu-arch=sm_75 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\lib\x64" -lcudart_static -ldl -lrt -pthread test.cu -o test.exe
```

3. Observe the following errors (truncated):

```
In file included from <built-in>:1:
In file included from C:\llvm\lib\clang\19\include\__clang_cuda_runtime_wrapper.h:472:
In file included from C:\llvm\lib\clang\19\include\__clang_cuda_cmath.h:16:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\limits:12:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\cwchar:11:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\cstdio:11:
In file included from C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt\stdio.h:13:
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt\corecrt_wstdio.h:486:24: error: non-const lvalue
      reference to type '__builtin_va_list' cannot bind to a value of unrelated type 'va_list' (aka 'char *')
  486 |         __crt_va_start(_ArgList, _Locale);
      |                        ^~~~~~~~
C:\llvm\lib\clang\19\include\vadefs.h:39:54: note: expanded from macro '__crt_va_start'
   39 | #define __crt_va_start(ap, param) __builtin_va_start(ap, param)
      |                                                      ^~
In file included from <built-in>:1:
In file included from C:\llvm\lib\clang\19\include\__clang_cuda_runtime_wrapper.h:472:
In file included from C:\llvm\lib\clang\19\include\__clang_cuda_cmath.h:16:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\limits:12:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\cwchar:11:
In file included from C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\include\cstdio:11:
In file included from C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt\stdio.h:13:
C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt\corecrt_wstdio.h:488:22: error: non-const lvalue
      reference to type '__builtin_va_list' cannot bind to a value of unrelated type 'va_list' (aka 'char *')
  488 |         __crt_va_end(_ArgList);
      |                      ^~~~~~~~
(truncated)
```

**Expected behavior**

The program should compile without errors related to standard library headers.

**Observed behavior**

Compilation fails with the `va_list` errors shown above, indicating a mismatch between Clang’s built-in types and MSVC’s definitions during CUDA device compilation.

**Environment:**

- LLVM Clang version: 19.1.6
- CUDA toolkit version: 12.4
- Visual Studio version: 2022 Community Edition 17.12.3
- Windows SDK version: 10.0.22621.0
- Command used for compilation:
```
clang++ -std=c++14 --cuda-gpu-arch=sm_75 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\lib\x64" -lcudart_static -ldl -lrt -pthread test.cu -o test.exe
```

**Additional context**

These errors suggest a compatibility issue between Clang’s NVPTX backend and the MSVC headers when used in device compilation. This blocks our development with Clang + CUDA on Windows. Any guidance or fixes would be greatly appreciated.

Thank you for looking into this issue.
