tra added inline comments.

================
Comment at: clang/tools/nvptx-arch/CMakeLists.txt:19
+if (NOT CUDA_FOUND OR NOT cuda-library)
+  message(STATUS "Not building nvptx-arch: cuda runtime not found")
+  return()
----------------
Nit: libcuda.so is part of the NVIDIA driver and provides the NVIDIA driver
API; it has nothing to do with the CUDA runtime. Here it's actually not even
libcuda.so itself that's not found, but its stub. I think a sensible message
here would be "Failed to find stubs/libcuda.so in CUDA_LIBDIR", along the
lines of the sketch below.
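
Something like this, as a sketch against the hunk above (keeping the patch's
`CUDA_FOUND`/`cuda-library` condition as-is):

```
if (NOT CUDA_FOUND OR NOT cuda-library)
  message(STATUS
    "Not building nvptx-arch: failed to find stubs/libcuda.so in CUDA_LIBDIR")
  return()
endif()
```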


================
Comment at: clang/tools/nvptx-arch/CMakeLists.txt:25
+
+set_target_properties(nvptx-arch PROPERTIES INSTALL_RPATH_USE_LINK_PATH ON)
+target_include_directories(nvptx-arch PRIVATE ${CUDA_INCLUDE_DIRS})
----------------
Does this mean the executable will have an RPATH pointing to
CUDA_LIBDIR/stubs?

This should not be necessary. The stub shipped with CUDA comes as
"libcuda.so" only. Its SONAME is libcuda.so.1, but there is no symlink with
that name in stubs, so an RPATH pointing there will do nothing. At runtime,
the dynamic linker will attempt to open libcuda.so.1, which will only be
found among the actual libraries installed by the NVIDIA drivers.




================
Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:26
+#if !CUDA_HEADER_FOUND
+int main() { return 1; }
+#else
----------------
How do we distinguish "we didn't have CUDA at build time" reported here
from "some driver API call failed with CUDA_ERROR_INVALID_VALUE=1"?



================
Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:34
+  const char *ErrStr = nullptr;
+  CUresult Result = cuGetErrorString(Err, &ErrStr);
+  if (Result != CUDA_SUCCESS)
----------------
One problem with this approach is that `nvptx-arch` will fail to run on a
machine without the NVIDIA drivers installed, because the dynamic linker
will not find `libcuda.so.1`.

Ideally we want it to run on any machine and fail the way we want.

A typical way to achieve that is to dlopen("libcuda.so.1") and obtain
pointers to the functions we need via `dlsym()`, as in the sketch below.
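
A minimal sketch of that approach (only `cuInit` is shown; the other driver
entry points would be resolved the same way):

```
#include <cstdio>
#include <dlfcn.h>

// Stand-in typedef so the sketch builds without cuda.h; CUresult is an enum
// whose CUDA_SUCCESS value is 0.
typedef int CUresult;

int main() {
  void *Handle = dlopen("libcuda.so.1", RTLD_NOW);
  if (!Handle) {
    // Fails gracefully on machines without the NVIDIA driver installed.
    fprintf(stderr, "Failed to load libcuda.so.1: %s\n", dlerror());
    return 1;
  }
  auto *CuInit =
      reinterpret_cast<CUresult (*)(unsigned)>(dlsym(Handle, "cuInit"));
  if (!CuInit) {
    fprintf(stderr, "Failed to find cuInit: %s\n", dlerror());
    return 1;
  }
  // cuDeviceGetCount, cuDeviceGet, cuDeviceGetAttribute, etc. would be
  // looked up via dlsym() the same way.
  return CuInit(0) == 0 ? 0 : 1;
}
```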




================
Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:63
+
+    printf("sm_%d%d\n", Major, Minor);
+  }
----------------
jhuber6 wrote:
> tianshilei1992 wrote:
> > Do we want to include device number here?
> For `amdgpu-arch` and here we just have it implicitly in the order, so the 
> n-th line is the n-th device, i.e.
> ```
> sm_70 // device 0
> sm_80 // device 1
> sm_70 // device 2
> ```
NVIDIA GPU enumeration order is more or less arbitrary. By default it's
arranged "sort of fastest GPU first", but it can be rearranged in order of
PCI(e) bus IDs, or into an arbitrary user-specified order via
`CUDA_VISIBLE_DEVICES`. Printing the compute capability in enumeration order
is pretty much all the user needs. If we want to print something that
uniquely identifies a device, we would need to print the device UUID,
similarly to what `nvidia-smi -L` does, or the PCIe bus IDs. In other words,
we can uniquely identify devices, but there is no inherent canonical order
among them.
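
For illustration, a sketch of printing a UUID next to the compute capability
(assumes cuda.h and direct linking for brevity, and uses a simplified hex
format rather than nvidia-smi's exact one):

```
#include <cstdio>
#include <cuda.h>

int main() {
  if (cuInit(0) != CUDA_SUCCESS)
    return 1;
  int Count = 0;
  if (cuDeviceGetCount(&Count) != CUDA_SUCCESS)
    return 1;
  for (int I = 0; I < Count; ++I) {
    CUdevice Dev;
    if (cuDeviceGet(&Dev, I) != CUDA_SUCCESS)
      continue;
    int Major = 0, Minor = 0;
    cuDeviceGetAttribute(&Major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR,
                         Dev);
    cuDeviceGetAttribute(&Minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR,
                         Dev);
    CUuuid Uuid;
    cuDeviceGetUuid(&Uuid, Dev);
    printf("sm_%d%d GPU-", Major, Minor);
    for (int B = 0; B < 16; ++B)
      printf("%02x", (unsigned char)Uuid.bytes[B]);
    printf("\n");
  }
  return 0;
}
```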


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140433/new/

https://reviews.llvm.org/D140433
