gtbercea added a comment.

> Assuming we do proceed with back-to-CUDA approach, one thing I'd consider 
> would be using clang's -fcuda-include-gpubinary option which CUDA uses to 
> include GPU code into the host object. You may be able to use it to avoid 
> compiling and partially linking .fatbin and host .o.

I tried this example 
(https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/). 
It worked with NVCC but not with clang++. I can produce the main.o, particle.o,
and v.o objects as relocatable (-fcuda-rdc), but the final link step fails with a
missing reference error.
This leads me to believe that embedding the CUDA fatbin code in the host object
comes with limitations. If I were to change the OpenMP NVPTX toolchain to do
the same, I would run into similar problems.
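For reference, this is roughly the shape of what I tried (a hedged
reconstruction; the file and symbol names are illustrative and the exact flags
I used may have differed slightly):

    // v.cu: a __device__ function defined in one translation unit.
    __device__ float scale(float x) { return 2.0f * x; }

    // particle.cu: device code in a second translation unit that calls scale(),
    // which is what requires relocatable device code.
    extern __device__ float scale(float x);
    __global__ void advance(float *p, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) p[i] = scale(p[i]);
    }

    // Build steps, roughly:
    //   nvcc    -arch=sm_35 -rdc=true -c main.cpp particle.cu v.cu
    //   nvcc    -arch=sm_35 main.o particle.o v.o -o app             # links and runs
    //   clang++ -c main.cpp
    //   clang++ --cuda-gpu-arch=sm_35 -fcuda-rdc -c particle.cu v.cu
    //   clang++ main.o particle.o v.o -o app -lcudart                # plus CUDA library paths;
    //                                                                # fails: missing reference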

On the other hand, when the example is ported to use OpenMP declare target
regions (instead of __device__), it all compiles, links, and runs correctly.
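For comparison, a minimal sketch of what that port looks like (again with
illustrative names, not the exact sources from the blog post); in the ported
example the device function lives in a separate translation unit and the OpenMP
NVPTX toolchain still compiles and links it:

    // Was a __device__ function; under OpenMP it only needs to be emitted for
    // the device via a declare target region.
    #pragma omp declare target
    float scale(float x) { return 2.0f * x; }
    #pragma omp end declare target

    // Offloaded loop calling the device function.
    void advance(float *p, int n) {
      #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
      for (int i = 0; i < n; ++i)
        p[i] = scale(p[i]);
    }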

In general, I feel that if we go the way you propose, the solution is truly
confined to NVPTX. If we instead implement a scheme like the one in this patch,
we give other toolchains a chance to fill the nvlink "gap" and eventually handle
offloading in a similar manner, including support for static linking.


Repository:
  rC Clang

https://reviews.llvm.org/D47394
