[PATCH] D100609: [Offload][OpenMP][CUDA] Allow fembed-bitcode for device offload

Johannes Doerfert via Phabricator via cfe-commits Fri, 16 Apr 2021 14:13:13 -0700

jdoerfert added inline comments.


================
Comment at: clang/test/Driver/embed-bitcode-nvptx.cu:1
+// RUN: %clang -Xclang -triple -Xclang nvptx64 -S -Xclang -target-feature -Xclang +ptx70 -fembed-bitcode=all --cuda-device-only -nocudalib -nocudainc %s -o - | FileCheck %s
+// REQUIRES: nvptx-registered-target
----------------
tra wrote:
> jdoerfert wrote:
> > tra wrote:
> > > This command line looks extremely odd to me.
> > > If you are compiling with `--cuda-device-only`, then clang should've 
> > > already set the right triple and the features.
> > > 
> > > Could you tell me more about what is the intent of the compilation and 
> > > why you use this particular set of options?
> > > I.e., why not just use `clang -x cuda --offload-arch=sm_70 
> > > --cuda-device-only -nocudalib -nocudainc`?
> > 
> > Because I've never really compiled CUDA ;)
> > 
> > I'll go with your options.
> Something still does not add up. 
> 
> AFAICT, the real problem is not that we're not adding `-target-cpu`, but 
> rather that `-fembed-bitcode=all` splits the `-S` compilation into two phases: 
> source-to-bitcode (which gets all the right command-line options and 
> compiles fine) and `IR -> PTX` compilation, which ends up with only a 
> subset of the options and fails because the intrinsics are not 
> enabled.
> 
> I think what we want to do in this case is to prevent splitting GPU-side 
> compilation. Adding `-target-cpu` to the `IR->PTX` subcompilation may make 
> things work in this case, but it does not really fix the root cause. E.g. we 
> should also pass through the features set by the driver and, possibly, other 
> options to keep both source->IR and IR->PTX compilations in sync.
> 
> I think what we want to do in this case is to prevent splitting GPU-side 
> compilation.

I doubt that is as easy as it sounds. Where would we take the IR from then? (I 
want the GPU IR embedded, after all.)

> E.g. we should also pass through the features set by the driver and ..

I agree. What if I move the embedding handling to the end, keep the "blacklist" 
that removes arguments we don't want, and see where that leads us?
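To make the difference between the two approaches concrete, here is a minimal Python sketch that models how the IR->PTX job's options could be derived from the source->IR job's options. This is a hypothetical model, not Clang driver code. It assumes, for illustration only, that the "subset of the options" is a fixed allowlist. The option lists and every function name below are invented. The model compares that allowlist with the proposed alternative of starting from the full option list and dropping only entries on an exclusion ("blacklist") list.

```python
"""Hypothetical model of the split -S compilation under -fembed-bitcode=all.

Nothing here is Clang driver code: the option lists and function names are
invented to illustrate why the IR->PTX phase must see the same target options.
"""

# Options the driver hands to the source->bitcode phase for an sm_70 device
# build (illustrative values). Each option is a (flag, value) pair; value may be None.
SOURCE_TO_IR_ARGS = [
    ("-triple", "nvptx64-nvidia-cuda"),
    ("-target-cpu", "sm_70"),
    ("-target-feature", "+ptx70"),
    ("-fembed-bitcode=all", None),
]

# Hypothetical exclusion list ("blacklist"): options unwanted in the IR->PTX phase.
EXCLUDED_FROM_IR_TO_PTX = {"-fembed-bitcode=all"}

# Hypothetical allowlist standing in for "only a subset of the options".
SUBSET_FOR_IR_TO_PTX = {"-triple"}

TARGET_FLAGS = ("-triple", "-target-cpu", "-target-feature")


def ir_to_ptx_args_subset(args):
    """Keep only a fixed subset: target CPU and features are silently lost."""
    return [(flag, value) for flag, value in args if flag in SUBSET_FOR_IR_TO_PTX]


def ir_to_ptx_args_filtered(args):
    """Start from the full first-phase options and drop only excluded ones."""
    return [(flag, value) for flag, value in args if flag not in EXCLUDED_FROM_IR_TO_PTX]


def target_options(args):
    """Target settings that must match between the two phases."""
    return [(flag, value) for flag, value in args if flag in TARGET_FLAGS]


if __name__ == "__main__":
    first = target_options(SOURCE_TO_IR_ARGS)
    subset_ok = target_options(ir_to_ptx_args_subset(SOURCE_TO_IR_ARGS)) == first
    filtered_ok = target_options(ir_to_ptx_args_filtered(SOURCE_TO_IR_ARGS)) == first
    print("subset keeps target options in sync:", subset_ok)
    print("filtered keeps target options in sync:", filtered_ok)
```

In this model, only the filtered variant passes `-target-cpu` and `-target-feature` through to the second phase. It also keeps any other option forward by default, which matches the concern about keeping both subcompilations in sync.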


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100609/new/

https://reviews.llvm.org/D100609

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
