[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

Joseph Huber via Phabricator via cfe-commits Tue, 26 Apr 2022 12:09:23 -0700

jhuber6 added inline comments.


================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6223-6224
   if (IsCuda || IsHIP) {
-    if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false))
+    if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) ||
+        Args.hasArg(options::OPT_foffload_new_driver))
       CmdArgs.push_back("-fgpu-rdc");
----------------
rnk wrote:
> tra wrote:
> > jhuber6 wrote:
> > > tra wrote:
> > > > tra wrote:
> > > > > jhuber6 wrote:
> > > > > > tra wrote:
> > > > > > > If user specifies both `-fno-gpu-rdc` and `-foffload-new-driver` 
> > > > > > > we would still enable RDC compilation.
> > > > > > > We may want to at least issue a warning. 
> > > > > > > 
> > > > > > > Considering that  we have multiple places where we may check for 
> > > > > > > `-f[no]gpu-rdc` we should make sure we don't get different ideas 
> > > > > > > whether RDC has been enabled.
> > > > > > > I think it may make sense to provide a common way to figure it 
> > > > > > > out. Either via a helper function that would process CLI 
> > > > > > > arguments or calculate it once and save it somewhere.
> > > > > > I haven't quite finalized how to handle this. The new driver should 
> > > > > > be compatible with a non-RDC build since we simply wouldn't embed 
> > > > > > the device image or create offloading entries. It's a little bit 
> > > > > > more difficult here since the new method is opt-in so it requires a 
> > > > > > flag. We should definitely emit a warning if both are enabled (I'm 
> > > > > > assuming there's one for passing both `fgpu-rdc` and 
> > > > > > `fno-gpu-rdc`). I'll add one in.
> > > > > > 
> > > > > > Also we could consider the new driver *the* RDC in the future which 
> > > > > > would be the easiest. The problem is if we want to support CUDA's 
> > > > > > method of RDC considering how other build systems seem to expect 
> > > > > > it. I could see us embedding the fatbinary in the object file, even 
> > > > > > if unused, just so that cuobjdump works. However we couldn't 
> > > > > > support the generation of `__cudaRegisterFatBinary_nv....` 
> > > > > > functions because then those would cause linker errors. WDYT?
> > > > > > I'm assuming there's one for passing both fgpu-rdc and fno-gpu-rdc
> > > > > 
> > > > > This is not a valid assumption. The whole idea behind 
> > > > > `-fno-something` is that the options can be overridden. E.g. if the 
> > > > > build specifies a standard set of compiler options, but we need to 
> > > > > override some of them when building a particular file. We can only do 
> > > > > so by appending to the standard options. Potentially we may end up 
> > > > > having those options overridden again. While it's not very common, 
> > > > > it's certainly possible. It's also possible to start with 
> > > > > '-fno-gpu-rdc' and then override it with `-fgpu-rdc`.
> > > > > 
> > > > > In this case, we care about the final state of RDC specified by 
> > > > > -f*gpu-rdc options, not by the fact that `-fno-gpu-rdc` is present. 
> > > > > `Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, 
> > > > > false)` does exactly that -- gives you the final state. If it returns 
> > > > > false, but we have `-foffload-new-driver`, then we need a warning as 
> > > > > these options are contradictory.
> > > > > 
> > > > > The new driver should be compatible with a non-RDC build 
> > > > 
> > > > In that case, we don't need a warning, but we do need a test verifying 
> > > > this behavior.
> > > > 
> > > > > Also we could consider the new driver *the* RDC in the future which 
> > > > > would be the easiest. 
> > > > 
> > > > SGTM. I do not know how it all will work out in the end. Your proposed 
> > > > model makes a lot of sense, and I'm guardedly optimistic about it.
> > > > Eventually we would deprecate RDC options, but we still need to work 
> > > > sensibly when user specifies a mix of these options. 
> > > > 
> > > > 
> > > > In that case, we don't need a warning, but we do need a test verifying 
> > > > this behavior.
> > > > 
> > > It's possible but I don't have the logic here to do it, figured we can 
> > > cross that bridge later.
> > > 
> > > > SGTM. I do not know how it all will work out in the end. Your proposed 
> > > > model makes a lot of sense, and I'm guardedly optimistic about it.
> > > >
> > > So the only downsides I know of, is that we don't currently replicate 
> > > CUDA's magic to JIT RDC code (We can do this with LTO anyway), and that 
> > > registering offload entries relies on the linker defining `__start / 
> > > __stop` variables, which I don't know if linkers on Windows / MacOS 
> > > provide. I'd be really interested if someone on the LLD team knew the 
> > > answer to that.
> > > relies on the linker defining __start / __stop variables, which I don't 
> > > know if linkers on Windows / MacOS provide. I'd be really interested if 
> > > someone on the LLD team knew the answer to that.
> > 
> > @MaskRay, @rnk - would you happen to know the answer?
> I believe MachO has an equivalent mechanism, but I'm not familiar with it. 
> For PE/COFF, you can search the ASan code to see how the __start / __stop 
> symbols are defined on Windows using various pragmas and 
> `__declspec(allocate)` to set up sections and sort them accordingly.
> 
> I would love to have a doc that writes up how to implement this array 
> registration mechanism portably for all major platforms, given that we 
> believe it is possible everywhere.
> I believe MachO has an equivalent mechanism, but I'm not familiar with it. 
> For PE/COFF, you can search the ASan code to see how the start / stop symbols 
> are defined on Windows using various pragmas and __declspec(allocate) to set 
> up sections and sort them accordingly.
> I would love to have a doc that writes up how to implement this array 
> registration mechanism portably for all major platforms, given that we 
> believe it is possible everywhere.
Thanks for the information, I was having a hard to figuring out if it was 
possible to implement this on other platforms. Some documentation for handling 
this on each platform would definitely be useful as I am hoping this can become 
the standard way to compile / register offloading languages in LLVM. Let me 
know if I can do anything to help on that front.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120272/new/

https://reviews.llvm.org/D120272

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

Reply via email to