jdoerfert added inline comments.
================
Comment at: clang/test/Driver/embed-bitcode-nvptx.cu:1
+// RUN: %clang -Xclang -triple -Xclang nvptx64 -S -Xclang -target-feature -Xclang +ptx70 -fembed-bitcode=all --cuda-device-only -nocudalib -nocudainc %s -o - | FileCheck %s
+// REQUIRES: nvptx-registered-target
----------------
tra wrote:
> jdoerfert wrote:
> > tra wrote:
> > > This command line looks extremely odd to me.
> > > If you are compiling with `--cuda-device-only`, then clang should've
> > > already set the right triple and the features.
> > >
> > > Could you tell me more about what is the intent of the compilation and
> > > why you use this particular set of options?
> > > I.e. why not just do `clang -x cuda --offload-arch=sm_70
> > > --cuda-device-only -nocudalib -nocudainc`.
> > >
> > > Could you tell me more about what is the intent of the compilation and
> > > why you use this particular set of options?
> >
> > because I never compiled cuda really ;)
> >
> > I'll go with your options.
> Something still does not add up.
>
> AFAICT, the real problem is not that we're not adding `-target-cpu`, but
> rather that `-fembed-bitcode=all` splits `-S` compilation into two phases --
> source-to-bitcode (this part gets all the right command line options and
> compiles fine) and `IR -> PTX` compilation which does end up only with the
> subset of the options and ends up failing because the intrinsics are not
> enabled.
>
> I think what we want to do in this case is to prevent splitting GPU-side
> compilation. Adding a '-target-gpu' to the `IR->PTX` subcompilation may make
> things work in this case, but it does not really fix the root cause. E.g. we
> should also pass through the features set by the driver and, possibly, other
> options to keep both source->IR and IR->PTX compilations in sync.

> I think what we want to do in this case is to prevent splitting GPU-side
> compilation.

I doubt that is as easy as it sounds. Where do we take the IR from then?
(I want the GPU IR embedded after all)

> E.g. we should also pass through the features set by the driver and ..

I agree, what if I move the embedding handling to the end, keep the "blacklist"
that removes arguments we don't want, and see where that leads us?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100609/new/

https://reviews.llvm.org/D100609

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits