Hahnfeld added a comment.

In https://reviews.llvm.org/D47394#1123044, @tra wrote:

> While I'm not completely convinced that [fatbin]->.c->[clang]->.o (with 
> device code only)->[ld -r] -> host.o (host+device code) is ideal (things 
> could be done with smaller number of tool invocations), it should help to 
> deal with -rdc compilation until we get a chance to improve support for it in 
> Clang. We may revisit and change this portion of the pipeline when clang can 
> incorporate -rdc GPU binaries in a way compatible with CUDA tools.


I think this should work with current trunk: Clang puts the GPU binary into a 
section called `__nv_relfatbin` when `-fcuda-rdc` is also passed (see 
https://reviews.llvm.org/D42922).
What will probably cause problems are the registration functions, as shown 
above by @gtbercea (`undefined references`...). But since we don't need them for 
OpenMP (we have our own registration machinery), it might be worth implementing 
something like `-fno-cuda-registration`. Maybe then `clang -cc1 <host> 
-fcuda-include-gpubinary <device> -fcuda-rdc -fno-cuda-registration` could be 
used to embed the device object, replacing the dance ending in `ld -r`?
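
For readers unfamiliar with the `[fatbin]->.c` step in the quoted pipeline, here is a minimal sketch of what such a wrapper file could look like: a byte array placed into a named section via a GNU attribute. It is not the wrapper generated by this patch or by the CUDA tools. The payload bytes, the 8-byte alignment, and the `fake_device_image*` names are hypothetical, and the single-name section syntax assumes an ELF target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder payload: NOT a real fatbin, just four bytes.
 * The section name matches the one mentioned above; the alignment value
 * is an assumption made for this sketch. */
__attribute__((section("__nv_relfatbin"), used, aligned(8)))
static const unsigned char fake_device_image[] = {0xde, 0xad, 0xbe, 0xef};

/* Accessors so other translation units (or a test) can inspect the blob. */
const unsigned char *fake_device_image_data(void) {
    return fake_device_image;
}

size_t fake_device_image_size(void) {
    return sizeof fake_device_image;
}

/* Sanity check that the embedded bytes are readable from host code. */
void fake_device_image_check(void) {
    assert(fake_device_image_size() == 4);
    assert(fake_device_image_data()[0] == 0xde);
}
```

Compiling a file like this to an object and merging it with `ld -r` is the multi-step route the quoted comment describes. The `-fcuda-include-gpubinary` route would have Clang emit the section while compiling the host code instead.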


Repository:
  rC Clang

https://reviews.llvm.org/D47394



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
