tra added inline comments.

================
Comment at: clang/test/Driver/embed-bitcode-nvptx.cu:1
+// RUN: %clang -Xclang -triple -Xclang nvptx64 -S -Xclang -target-feature -Xclang +ptx70 -fembed-bitcode=all --cuda-device-only -nocudalib -nocudainc %s -o - | FileCheck %s
+// REQUIRES: nvptx-registered-target
----------------
jdoerfert wrote:
> tra wrote:
> > jdoerfert wrote:
> > > tra wrote:
> > > > This command line looks extremely odd to me.
> > > > If you are compiling with `--cuda-device-only`, then clang should've
> > > > already set the right triple and the features.
> > > >
> > > > Could you tell me more about what is the intent of the compilation and
> > > > why you use this particular set of options?
> > > > I.e. why not just do `clang -x cuda --offload-arch=sm_70
> > > > --cuda-device-only -nocudalib -nocudainc`.
> > > >
> > > > Could you tell me more about what is the intent of the compilation and
> > > > why you use this particular set of options?
> > >
> > > because I never compiled cuda really ;)
> > >
> > > I'll go with your options.
> > Something still does not add up.
> >
> > AFAICT, the real problem is not that we're not adding `-target-cpu`, but
> > rather that `-fembed-bitcode=all` splits `-S` compilation into two phases
> > -- source-to-bitcode (this part gets all the right command line options and
> > compiles fine) and `IR -> PTX` compilation which does end up only with the
> > subset of the options and ends up failing because the intrinsics are not
> > enabled.
> >
> > I think what we want to do in this case is to prevent splitting GPU-side
> > compilation. Adding a '-target-gpu' to the `IR->PTX` subcompilation may
> > make things work in this case, but it does not really fix the root cause.
> > E.g. we should also pass through the features set by the driver and,
> > possibly, other options to keep both source->IR and IR->PTX compilations in
> > sync.
> >
> > I think what we want to do in this case is to prevent splitting GPU-side
> > compilation.
>
> I doubt that is as easy as it sounds. Where do we take the IR from then? (I
> want the GPU IR embedded after all)
>
> > E.g. we should also pass through the features set by the driver and ..
>
> I agree, what if I move the embedding handling to the end, keep the
> "blacklist" that removes arguments we don't want, and see where that leads us?

Ah, so you do grab the intermediate IR. I assume that the PTX does get used,
too.

Another way to deal with this may be to do two independent compilations --
source-to-IR and source-to-PTX. Each would use the standard compilation flags.
The downside is that parsing and optimization time will double, so split
compilation combined with filtering args is probably more practical.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100609/new/

https://reviews.llvm.org/D100609

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits