carlosgalvezp accepted this revision.
carlosgalvezp added a comment.

LGTM but I think @tra should have the final word.
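[Editor's note] The inline discussion below debates replacing the hard-coded CUDA version list with a directory listing that populates `Candidates` newest-version-first. As a rough sketch of that idea only: this is a hypothetical illustration using the standard library (the real clang driver uses LLVM's own filesystem APIs), and `parseCudaDirVersion` / `sortNewestFirst` are made-up helper names, not code from the patch.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Parse a directory name like "cuda-11.4" into {11, 4}.
// Returns {-1, -1} when the name is not a versioned CUDA directory.
static std::pair<int, int> parseCudaDirVersion(const std::string &Name) {
  const std::string Prefix = "cuda-";
  if (Name.rfind(Prefix, 0) != 0)
    return {-1, -1};
  const size_t Dot = Name.find('.', Prefix.size());
  if (Dot == std::string::npos)
    return {-1, -1};
  try {
    const int Major =
        std::stoi(Name.substr(Prefix.size(), Dot - Prefix.size()));
    const int Minor = std::stoi(Name.substr(Dot + 1));
    return {Major, Minor};
  } catch (...) {
    return {-1, -1};
  }
}

// Given the entries of a base directory (e.g. a listing of /usr/local),
// keep only versioned CUDA directories and order them newest-first.
// Note that a plain lexicographic sort would be wrong here:
// "cuda-9.0" compares greater than "cuda-11.4" as a string.
std::vector<std::string> sortNewestFirst(std::vector<std::string> Names) {
  Names.erase(std::remove_if(Names.begin(), Names.end(),
                             [](const std::string &N) {
                               return parseCudaDirVersion(N).first < 0;
                             }),
              Names.end());
  std::sort(Names.begin(), Names.end(),
            [](const std::string &A, const std::string &B) {
              return parseCudaDirVersion(A) > parseCudaDirVersion(B);
            });
  return Names;
}
```

Sorting on the parsed `{major, minor}` pair rather than on the raw string is the one subtle point: it keeps 11.x ahead of 9.x, which a string comparison would get backwards.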
================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:131
+  std::initializer_list<const char *> Versions = {
+      "11.5", "11.4", "11.3", "11.2", "11.1", "11.0", "10.2", "10.1",
+      "10.0", "9.2", "9.1", "9.0", "8.0", "7.5", "7.0"};
----------------
tra wrote:
> carlosgalvezp wrote:
> > mojca wrote:
> > > tra wrote:
> > > > tra wrote:
> > > > > mojca wrote:
> > > > > > carlosgalvezp wrote:
> > > > > > > mojca wrote:
> > > > > > > > tra wrote:
> > > > > > > > > tra wrote:
> > > > > > > > > > mojca wrote:
> > > > > > > > > > > kadircet wrote:
> > > > > > > > > > > > Looks like the list is getting big and hard to maintain. Considering that this is done only once per compiler invocation (and we check for the existence of directories down in the loop anyway), what about throwing in an extra directory listing of the base-directories mentioned down below and populating `Candidates` while preserving the newest-version-first order?
> > > > > > > > > > > I totally agree with the sentiment, and that was my initial thought as well, but with zero experience I was too scared to make any more significant changes.
> > > > > > > > > > >
> > > > > > > > > > > I can try to come up with a new patch (that doesn't need further maintenance whenever a new CUDA version gets released) if that's what you are suggesting. I would nevertheless merge this one, and prepare a new, more advanced patch separately, but that's ultimately your call.
> > > > > > > > > > >
> > > > > > > > > > > What's your suggestion about D.SysRoot + "Program Files/..."?
> > > > > > > > > > > At the time when this function gets called it looks like D.SysRoot is empty (or at least my debugger says so) and in my case it resolves to D: while the CUDA support is installed under C:.
> > > > > > > > > > >
> > > > > > > > > > > Is there any special LLVM-specific/preferable way to iterate through directories?
> > > > > > > > > > >
> > > > > > > > > > > (What I also miss a bit in the whole process is an option to simply say "I want CUDA 11.1" without the need to explicitly spell out the full path.)
> > > > > > > > > > >
> > > > > > > > > > > If you give me some general guidelines, I'll prepare another, hopefully more future-proof patch.
> > > > > > > > > > >
> > > > > > > > > > > (Side note: I'm not sure if I'm calling clang-format correctly, but if I call it, it keeps reformatting the rest of this file.)
> > > > > > > > > > This whole list may no longer be particularly useful. The most common use case on Linux, AFAICT, is to install only one CUDA version using the system-provided package manager. E.g. https://packages.ubuntu.com/focal/amd64/nvidia-cuda-toolkit/filelist
> > > > > > > > > >
> > > > > > > > > > TBH, I'm tempted to limit autodetection to only that one system-default version and require the user to use --cuda-path if they need something else.
> > > > > > > > > I think on Windows (I mean the Windows environment itself, not WSL), the CUDA installer sets an environment variable which could be used to detect the default CUDA version, so it may warrant a Windows-specific way to find it.
> > > > > > > > On Windows this is certainly not the case.
> > > > > > > > Unless the installation is changed manually, one always gets the new version installed into a new directory.
> > > > > > > >
> > > > > > > > I really do need multiple versions on Windows (and the ability to pick an older one) if I want to compile a binary that works on someone else's computer (if I compile against the latest CUDA, users need "the latest" drivers that may sometimes not even be available for their machine).
> > > > > > > >
> > > > > > > > (That said, at least when using CMake, the selection needs to be done by CMake anyway, and maybe CMake could be forced to specify the correct flag automatically.)
> > > > > > > >
> > > > > > > > So even if the functionality gets removed from non-Windows platforms, it would be really nice to keep it for Windows.
> > > > > > > >
> > > > > > > > Now, there are three "conflicting" feedbacks/suggestions above. I can try to improve support, but it would be really helpful to reach a consensus on what needs to be done before coding it.
> > > > > > > > one always gets the new version installed into a new directory.
> > > > > > > A similar thing happens on Linux.
> > > > > > >
> > > > > > > > users need "the latest" drivers
> > > > > > > Since CUDA 10.2, there's some "[[ https://docs.nvidia.com/deploy/cuda-compatibility/ | compatibility mode ]]" that allows running newer CUDA on older drivers. As long as you are not using the latest features, of course.
> > > > > > >
> > > > > > > I'm personally all up for removing redundancy and duplication.
> > > > > > I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html right now and NVIDIA's "official packages" for Ubuntu get installed under `/usr/local/cuda-11.x`.
> > > > > >
> > > > > > That sounds significant enough to me to argue against the removal of versioned folders from search.
> > > > > > I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html right now and NVIDIA's "official packages" for Ubuntu get installed under /usr/local/cuda-11.x.
> > > > > OK, that's something that will be used often enough, so we still need to deal with versioned directories. :-(
> > > > >
> > > > > > That sounds significant enough to me to argue against the removal of versioned folders from search.
> > > > > I'm not against probing for multiple versions in principle.
> > > > >
> > > > > What I'm saying is that:
> > > > > - blindly increasing the number of probed directories comes with a price and should be done cautiously.
> > > > > - it may be a good time to revisit how we detect CUDA installations and make sure it makes sense now.
> > > > >
> > > > > There are two considerations.
> > > > >
> > > > > First is the overhead it adds to the compiler driver. While for most users running locally it's negligible, it would be noticed by users who may have /usr/local mounted over NFS, which is not that unusual in institutional environments. Another thing to consider is that it will be noticed by *all* such users, even if they don't use CUDA. E.g. all C++ compilations will be probing for those directories.
> > > > >
> > > > > The other issue is that we should differentiate between finding the canonical location of the CUDA SDK, vs picking one out of many CUDA SDK versions that may be installed simultaneously. I'd argue that in the case of multiple installed versions the compiler has no business picking one version over another (though we could make an attempt to use the most recent one as we do now).
> > > > > It's up to the user to explicitly specify the one they want. When we allow picking one version out of many, it makes it way too easy for a user to end up with a mix of the default version and the version they want; all they need to do is forget `--cuda-path` somewhere in their build.
> > > > >
> > > > > For this patch, I propose to drop the 9.x and 8.x versions so we keep the number of probed paths under control.
> > > > > > Since CUDA 10.2, there's some "compatibility mode" that allows running newer CUDA on older drivers.
> > > > This only works with very specific versions of the drivers and those are not very common on end-user machines. It's mostly for datacenter use.
> > > > I think on Windows (I mean the Windows environment itself, not WSL), the CUDA installer sets an environment variable which could be used to detect the default CUDA version, so it may warrant a Windows-specific way to find it.
> > > I see that I have
> > > ```
> > > CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4
> > > CUDA_PATH_V11_4 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4
> > > CUDA_PATH_V11_3 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3
> > > ```
> > > etc. Using those directly might in fact be a good idea.
> > >
> > > > The other issue is that we should differentiate between finding the canonical location of the CUDA SDK, vs picking one out of many CUDA SDK versions that may be installed simultaneously.
> > >
> > > Agreed. (I also wish clang would accept something like `--cuda-version=11.4` that would automatically work on Windows and Linux whenever CUDA resides under `/usr/local/cuda-x.y`.)
> > >
> > > I wanted to add another strange observation.
> > > Today I installed the latest CUDA (11.5) and clang ("14") under Ubuntu 20.04 inside WSL 2 on Windows 11 (equivalent to a regular Ubuntu, I would say). I had to explicitly provide the `-L` flag despite using `--cuda-path` (else the linker fails), which sounds relatively strange to me:
> > > ```
> > > clang++ --cuda-path=/usr/local/cuda-11.5 -l cudart -L /usr/local/cuda-11.5/lib64 hello.c -o hello
> > > ```
> > >
> > > > First is the overhead it adds to the compiler driver. While for most users running locally it's negligible, it would be noticed by users who may have /usr/local mounted over NFS, which is not that unusual in institutional environments. Another thing to consider is that it will be noticed by *all* such users, even if they don't use CUDA. E.g. all C++ compilations will be probing for those directories.
> > >
> > > Are you saying that clang executes this portion of the code even when it knows that it's not compiling *.cu files? Why does it need to know anything at all about directories related to CUDA? (I'm curious, but there's no real need to answer; it's outside the scope of this ticket ;)
> > >
> > > > For this patch, I propose to drop the 9.x and 8.x so we keep the number of probed paths under control.
> > >
> > > Do you want me to upload another patch that only keeps versions 10.0 up to 11.5 on the list, and then open a new ticket in the tracker describing what the rework should be, and potentially start working towards that goal as a separate patch?
> > > It's mostly for the datacenter use.
> > I'm not sure about that.
> > We have recently switched to CUDA 11.4 with old 460 drivers on regular desktop machines (CUDA 11.4 comes with 470 drivers) and everything works just fine (not using CUDA 11.4 features, naturally).
> >
> > The docs specify the minimum version required:
> > CUDA 11.x >= 450.80.02*
> >
> > I think the "Enterprise" thing you are referring to only applies to the "3. Forward compatibility" section, not the "2. Minor version compatibility" section.
> >
> > > I had to explicitly provide the -L flag
> > Yes, this is [[ https://llvm.org/docs/CompileCudaWithLLVM.html | documented ]]
> I'm not saying that cuda_compat is exclusively for datacenter use, just that the datacenter is the primary audience.
>
> Regular users are not as constrained in updating the drivers to a more recent version and are much more likely to use drivers that are *not* supported by cuda_compat. The end result is that in practice "upgrade your drivers" is much more likely to work for most end users than "use cuda_compat" -- fewer moving parts this way and the result is more predictable.
>
> > The docs specify the minimum version required: CUDA 11.x >= 450.80.02*
>
> It's complicated. See the compatibility between CUDA versions and various driver versions: https://docs.nvidia.com/deploy/cuda-compatibility/#use-the-right-compat-package
>
> It also has the downside of lack of interoperability with OpenGL, which may be needed by some users.
>
> Anyways, we've got way off topic here.

> we've got way off topic here.

Agreed, apologies for that and thanks for the discussion!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114326/new/

https://reviews.llvm.org/D114326

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
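[Editor's note] The thread above mentions that on Windows the NVIDIA installer sets `CUDA_PATH` and versioned variables like `CUDA_PATH_V11_4` (as reported by the reviewer), and floats the idea of consulting them directly. A minimal sketch of that idea, under the assumption that those variable names are present; `getCudaPathFromEnv` is a made-up helper, not code from the patch or from clang:

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Look up the CUDA root advertised by the NVIDIA installer. With no
// version request, fall back to the default CUDA_PATH variable;
// otherwise probe the versioned form, e.g. "11.4" -> CUDA_PATH_V11_4.
std::optional<std::string> getCudaPathFromEnv(const std::string &Version = "") {
  std::string Var = "CUDA_PATH";
  if (!Version.empty()) {
    // Turn "11.4" into the "V11_4" suffix used by the installer.
    std::string Suffix = Version;
    for (char &C : Suffix)
      if (C == '.')
        C = '_';
    Var += "_V" + Suffix;
  }
  if (const char *Value = std::getenv(Var.c_str()))
    return std::string(Value);
  return std::nullopt; // Not installed, or not on Windows.
}
```

A caller could try the user-requested version first and fall back to `CUDA_PATH`, which would sidestep the hard-coded version list entirely on Windows.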