tra added a comment.

In https://reviews.llvm.org/D47394#1118223, @gtbercea wrote:

> I tried this example 
> (https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/). 
> It worked with NVCC but not with clang++. I can produce the main.o particle.o 
> and v.o objects as relocatable (-fcuda-rdc) but the final step fails with a 
> missing reference error.


It's not clear what exactly you mean by the "final step" or what exactly the 
error was. Could you give me more details?

> This leads me to believe that embedding the CUDA fatbin code in the host 
> object comes with limitations. If I were to change the OpenMP NVPTX toolchain 
> to do the same then I would run into similar problems.

It's a two-part problem.

The first part is that, in the end, we need to place the GPU-side binary 
(whether it's an object or an executable) in a way that CUDA tools can 
recognize. You should end up with pretty much the same set of bits. If clang 
currently does not do that well enough, we should fix it.

The second part is what we do about GPU-side object files. NVCC has some 
under-the-hood magic that invokes nvlink. If we invoke clang for the final 
linking phase, it has no idea that some of the .o files may have GPU code in 
them that may need extra steps before we can pass everything to the linker to 
produce the host executable. IMO the linking of GPU-side objects should be done 
outside of clang. I.e. one could do it with an extra build rule which would 
invoke `nvcc --device-link ...` to link all GPU-side objects into a GPU 
executable, still wrapped in a host .o, which can then be linked into the host 
executable.
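
To make the shape of such a build rule concrete, here is a rough sketch that 
only prints the commands and runs nothing. The source names (`main.cu`, 
`particle.cu`, `v.cu`), the `sm_35` architecture, the `gpu_linked.o` output 
name and the CUDA library path are all assumptions for illustration. Whether 
`nvcc --device-link` accepts the objects clang produces with `-fcuda-rdc` is 
exactly the open question here, so treat this as the intended flow rather than 
a verified recipe.

```python
# Dry-run sketch of the proposed build rule: compile with clang, device-link
# with nvcc, then do the normal host link. Nothing is executed.
# ASSUMPTIONS: file names, GPU arch, output names and CUDA path are hypothetical.
from typing import List

GPU_ARCH = "sm_35"                      # assumed target architecture
CUDA_LIB_DIR = "/usr/local/cuda/lib64"  # assumed CUDA install location
DEVICE_OBJ = "gpu_linked.o"             # hypothetical name for the device-link output


def object_name(src: str) -> str:
    """Map 'particle.cu' to 'particle.o'."""
    return src.rsplit(".", 1)[0] + ".o"


def compile_cmd(src: str) -> List[str]:
    """Compile one source to a host object carrying relocatable GPU code."""
    return ["clang++", "-c", "-fcuda-rdc", f"--cuda-gpu-arch={GPU_ARCH}",
            src, "-o", object_name(src)]


def device_link_cmd(objs: List[str]) -> List[str]:
    """Extra build rule: link all GPU-side objects, wrapped in a host .o."""
    return ["nvcc", "--device-link", f"-arch={GPU_ARCH}", *objs, "-o", DEVICE_OBJ]


def host_link_cmd(objs: List[str], exe: str = "app") -> List[str]:
    """Ordinary host link of the original objects plus the device-link output."""
    return ["clang++", *objs, DEVICE_OBJ, f"-L{CUDA_LIB_DIR}", "-lcudart", "-o", exe]


def build_plan(sources: List[str]) -> List[List[str]]:
    """Return the full command sequence in the order it would be run."""
    objs = [object_name(s) for s in sources]
    plan = [compile_cmd(s) for s in sources]
    plan.append(device_link_cmd(objs))
    plan.append(host_link_cmd(objs))
    return plan


if __name__ == "__main__":
    for cmd in build_plan(["main.cu", "particle.cu", "v.cu"]):
        print(" ".join(cmd))
```

The point of the sketch is that the device-link step is a separate rule owned 
by the build system, so clang itself never needs to know which .o files carry 
GPU code.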

> On the other hand, when the example is ported to use OpenMP declare target 
> regions (instead of __device__), it all compiles, links and runs correctly.
> 
> In general, I feel that if we go the way you propose then the solution is 
> truly confined to NVPTX. If we instead implement a scheme like the one in 
> this patch then we give other toolchains a chance to perhaps fill the nvlink 
> "gap" and eventually be able to handle offloading in a similar manner and 
> support static linking.

I'm not sure how "fatbin + clang -fcuda-gpubinary" is any more confining to 
NVPTX than "fatbin + clang + ld -r" -- either way you rely on an 
NVIDIA-specific tool. If at some point you find it too confining, changing 
either of those will require pretty much the same amount of work.


Repository:
  rC Clang

https://reviews.llvm.org/D47394



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits