tra added inline comments.
================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:188 + static inline OffloadKind getEmptyKey() { + return static_cast<OffloadKind>(0xFFFF); + } ---------------- Extend `enum OffloadKind` to include these special kinds. ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:201 static DeviceFile getEmptyKey() { - return {DenseMapInfo<StringRef>::getEmptyKey(), + return {static_cast<OffloadKind>(DenseMapInfo<uint16_t>::getEmptyKey()), DenseMapInfo<StringRef>::getEmptyKey(), ---------------- Why do we limit ourselves to uint16_t here? Can't we just use `OffloadKind` itself and get rid of these casts? ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:726 + + if (Error Err = executeCommands(*FatBinaryPath, CmdArgs)) + return std::move(Err); ---------------- We should have a way to pass extra options to fatbinary, too. E.g. we may want to use `--compress-all`. Also, we may need to pass through `-g` for debug builds. Oh. Debug builds. Makes me wonder if cuda-gdb will be able to find GPU binaries packaged by the new driver. If it does not, it will be a rather serious problem. It would likely affect various profiling tools the same way, too. Can you give it a try? ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1192 // Run LTO on any bitcode files and replace the input with the result. - if (Error Err = linkBitcodeFiles(LinkerInput.getSecond(), TheTriple, - File.Arch, WholeProgram)) + if (Error Err = linkBitcodeFiles(LinkerInputFiles, TheTriple, File.Arch, + WholeProgram)) ---------------- Nit. We should think of changing `linkBitcodeFiles` to return `llvm::ErrorOr<bool>` so we can return `WholeArchive` value, instead of modifying it as an argument. ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1200 + LinkedImages.emplace_back(OFK_OpenMP, TheTriple.getTriple(), File.Arch, + LinkerInputFiles.front()); continue; ---------------- What's expected to happen if we have more than one input? If only one is ever expected, I'd add an assert. ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1206 // CUDA in non-RDC mode. if (WholeProgram && TheTriple.isNVPTX()) { + assert(!LinkerInputFiles.empty() && "No non-RDC image to embed"); ---------------- Should `EmbedBitcode` be mutually exclusive vs `WholeArchive`? Can we ever end up with both unset? ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1210 + LinkedImages.emplace_back(Kind, TheTriple.getTriple(), File.Arch, + LinkerInputFiles.front()); continue; ---------------- ditto about the assert on the number of inputs. ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1347 + + auto FileOrErr = compileModule(M); + if (!FileOrErr) ---------------- The `M` contains generated registration glue at this point, right? It may be worth a comment to explain what is it that we're compiling here. ================ Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:271 +llvm::Error wrapCudaBinary(llvm::Module &M, llvm::ArrayRef<char> Images) { + return Error::success(); +} ---------------- A "not-implemented-yet/TODO" comment would be appropriate here. ================ Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.h:20 + +/// Wrap the input fatbinary image into the module \p M as global symbols and +/// registers the images with the CUDA runtime. ---------------- It should be either "Wraps/registers" or "Wrap/register". Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D123810/new/ https://reviews.llvm.org/D123810 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits