tra added inline comments.

================
Comment at: clang/test/Driver/embed-bitcode-nvptx.cu:1
+// RUN: %clang -Xclang -triple -Xclang nvptx64 -S -Xclang -target-feature -Xclang +ptx70 -fembed-bitcode=all --cuda-device-only -nocudalib -nocudainc %s -o - | FileCheck %s
+// REQUIRES: nvptx-registered-target
----------------
jdoerfert wrote:
> tra wrote:
> > jdoerfert wrote:
> > > tra wrote:
> > > > This command line looks extremely odd to me.
> > > > If you are compiling with `--cuda-device-only`, then clang should've 
> > > > already set the right triple and the features.
> > > > 
> > > > Could you tell me more about what is the intent of the compilation and 
> > > > why you use this particular set of options?
> > > > I.e. why not just use `clang -x cuda --offload-arch=sm_70 
> > > > --cuda-device-only -nocudalib -nocudainc`?
> > > > 
> > > > Could you tell me more about what is the intent of the compilation and 
> > > > why you use this particular set of options?
> > > 
> > > because I never compiled cuda really ;)
> > > 
> > > I'll go with your options.
> > Something still does not add up. 
> > 
> > AFAICT, the real problem is not that we're not adding `-target-cpu`, but 
> > rather that `-fembed-bitcode=all` splits `-S` compilation into two phases 
> > -- source-to-bitcode (this part gets all the right command line options and 
> > compiles fine) and `IR -> PTX` compilation, which ends up with only a 
> > subset of the options and fails because the intrinsics are not enabled.
> > 
> > I think what we want to do in this case is to prevent splitting GPU-side 
> > compilation. Adding `-target-cpu` to the `IR->PTX` subcompilation may 
> > make things work in this case, but it does not really fix the root cause. 
> > E.g. we should also pass through the features set by the driver and, 
> > possibly, other options to keep both source->IR and IR->PTX compilations in 
> > sync.
> > 
> > I think what we want to do in this case is to prevent splitting GPU-side 
> > compilation.
> 
> I doubt that is as easy as it sounds. Where do we take the IR from then? (I 
> want the GPU IR embedded after all)
> 
> > E.g. we should also pass through the features set by the driver and ..
> 
> I agree, what if I move the embedding handling to the end, keep the 
> "blacklist" that removes arguments we don't want, and see where that leads us?
Ah, so you do grab the intermediate IR. I assume that the PTX does get used, 
too. 

Another way to deal with this may be to do two independent compilations -- 
source-to-IR and source-to-PTX. Each would use the standard compilation flags. 
The downside is that parsing and optimization time will double, so split 
compilation combined with filtering args is probably more practical.
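
The "split compilation combined with filtering args" option can be sketched roughly as below. This is a minimal standalone illustration, not clang driver code: `DeniedOption` and `filterArgsForBackendStep` are hypothetical names, and the deny-list entries in the usage note are only examples of options one might drop, not a statement of what the driver does or should remove. The point is that the IR->PTX step starts from the full source->IR argument list and only loses what is explicitly denied, so target options such as `-target-cpu` and `-target-feature` stay in sync by default.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of an option that must not reach the second
// (IR -> PTX) step. Not a clang data structure.
struct DeniedOption {
  std::string Name;      // exact spelling of the option, e.g. "-I"
  bool HasSeparateValue; // true if the following argument is its value
};

// Hypothetical helper: build the argument list for the IR -> PTX step by
// copying every source -> IR argument that is not on the deny list.
// Anything not explicitly denied (target CPU, target features, ...) is
// passed through unchanged.
std::vector<std::string>
filterArgsForBackendStep(const std::vector<std::string> &FrontendArgs,
                         const std::vector<DeniedOption> &DenyList) {
  std::vector<std::string> Result;
  for (std::size_t I = 0; I < FrontendArgs.size(); ++I) {
    const DeniedOption *Match = nullptr;
    for (const DeniedOption &D : DenyList) {
      if (FrontendArgs[I] == D.Name) {
        Match = &D;
        break;
      }
    }
    if (!Match) {
      Result.push_back(FrontendArgs[I]);
      continue;
    }
    // Drop the denied option and, if it takes one, its separate value.
    if (Match->HasSeparateValue)
      ++I;
  }
  return Result;
}
```

For example, with a deny list of `-I` (taking a value) and `-v`, the arguments `-triple nvptx64-nvidia-cuda -I include -target-cpu sm_70 -v -target-feature +ptx70` are reduced to `-triple nvptx64-nvidia-cuda -target-cpu sm_70 -target-feature +ptx70`.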

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100609/new/

https://reviews.llvm.org/D100609
