jansvoboda11 added inline comments.
================ Comment at: clang/lib/Frontend/CompilerInvocation.cpp:677 + SwapOpts(Res); + bool Success2 = Parse(Res, GeneratedArgs1, Diags); + ---------------- saudi wrote: > jansvoboda11 wrote: > > saudi wrote: > > > Hello, > > > > > > I encountered crashes on Windows targets, related to this line, when > > > rebasing https://reviews.llvm.org/D80833, where > > > `CodeGenOpts::CommandLineArgs` is read later in the compilation process. > > > > > > When the function exits, `GeneratedArgs1` is destroyed but > > > `Res.getCodeGenOpts().CommandLineArgs` still references its contents. The > > > code has changed since this patch was submitted, but the logic remains > > > the same in more recent code. > > > > > > Note that the bug doesn't happen when round-trip is skipped, as > > > `CommandLineArgs` then refers to `ArgV` which content is valid for the > > > entire compiler run. > > > > > > As a solution I considered allowing `CodeGenOptions` to optionally own > > > the memory by introducing `BumpPtrAllocator > > > CodeGenOptions::CommandLineArgsAlloc`. > > > However it has drawbacks: > > > - need to customize `CodeGenOptions` copy constructor/operator, > > > - lifetime for copies of `CodeGenOptions::CommandLineArgs` (e.g. to > > > other structures) would be bound to the lifetime of the original > > > `CodeGenOptions` object. > > > > > > Another solution (slower) would be to use 2 dummy `CompilerInvocation` to > > > perform the round-trip test, and use the regular arguments on the main > > > instance. It would change the behavior since the main instance currently > > > uses `GeneratedArgs1`. > > > > > > WDYT? > > > Hello, > > > > > > I encountered crashes on Windows targets, related to this line, when > > > rebasing https://reviews.llvm.org/D80833, where > > > `CodeGenOpts::CommandLineArgs` is read later in the compilation process. > > > > > > When the function exits, `GeneratedArgs1` is destroyed but > > > `Res.getCodeGenOpts().CommandLineArgs` still references its contents. The > > > code has changed since this patch was submitted, but the logic remains > > > the same in more recent code. > > > > > > Note that the bug doesn't happen when round-trip is skipped, as > > > `CommandLineArgs` then refers to `ArgV` which content is valid for the > > > entire compiler run. > > > > Hi, you're right, I can see the lifetime issues here. > > > > > As a solution I considered allowing `CodeGenOptions` to optionally own > > > the memory by introducing `BumpPtrAllocator > > > CodeGenOptions::CommandLineArgsAlloc`. > > > However it has drawbacks: > > > - need to customize `CodeGenOptions` copy constructor/operator, > > > - lifetime for copies of `CodeGenOptions::CommandLineArgs` (e.g. to > > > other structures) would be bound to the lifetime of the original > > > `CodeGenOptions` object. > > > > Instead of introducing `BumpPtrAllocator` in `CodeGenOptions` and dealing > > with custom copy constructor/assignment, I think it should be possible to > > use an owning type (e.g. `std::vector<std::string>>`), WDYT? That's the > > approach we generally take in `CompilerInvocation` members. > > > > > Another solution (slower) would be to use 2 dummy `CompilerInvocation` to > > > perform the round-trip test, and use the regular arguments on the main > > > instance. It would change the behavior since the main instance currently > > > uses `GeneratedArgs1`. > > > > Referring to `ArgV` instead of `GeneratedArgs1` should be fine too, since > > they are guaranteed to be semantically equivalent anyways. > > > > > WDYT? > > > > I think both approaches are valid, but I think the cleanest solution would > > be to drop `CodeGenOptions::CommandLineArgs` entirely. It's one of the few > > cases where `CompilerInvocation` points to memory owned by someone else. (I > > thought this only happens in `PreprocessorOptions::RemappedFileBuffers`, > > but I stand corrected.) Whenever clients need the original command-line, > > they can call `CompilerInvocation::generateCC1CommandLine` and get > > semantically equivalent list of cc1 arguments. Would that work for you > > use-case? > Hello, thank you for your fast feedback! > > > Instead of introducing BumpPtrAllocator in CodeGenOptions and dealing with > > custom copy constructor/assignment, I think it should be possible to use an > > owning type (e.g. std::vector<std::string>>), WDYT? That's the approach we > > generally take in CompilerInvocation members. > Actually, `CodeGenOptions::CommandLineArgs` is an `ArrayRef<const char *>`, > so we need to store an array (e.g. `std::vector<const char *>`) which entries > point inside the owning container. Maintaining this relationship will > probably require custom copy constructor/assigment work anyway. > > Another solution could be to store the generated command line arguments > (`BumpPtrAllocator` + `SmallVector<const char *>`) into the > `CompilerInvocation` object. Especially, `CompilerInvocationRefBase` already > has custom copy mechanism. > > Note on `BumpPtrAllocator` vs owner vector: containers like `std::string` > may be optimized when the content is small, embedding the value where the > allocated pointer would be. That could break the `string::c_str()` results > when `std::vector` grows. `SmallString`, `SmallVector` etc use the same > pattern. > > > Whenever clients need the original command-line, they can call > > `CompilerInvocation::generateCC1CommandLine` and get semantically > > equivalent list of cc1 arguments. Would that work for you use-case? > Unfortunately, the patch I mentionned accesses `CommandLineArguments` at > CodeGen time (in llvm code, not clang), so the `CompilerInvocation` is not > available. Currently, this is forwarded to CodeGen by storing a reference to > `CommandLineArguments` in `MCTargetOptions`. > > > Referring to `ArgV` instead of `GeneratedArgs1` should be fine too, since > > they are guaranteed to be semantically equivalent anyways. > > In the code, I see that the round-trip had two effects: > - Verify that the parse+generate is symetrical and complete (not losing > arguments on the way), emitting an error if not > - Use the arguments computed during the round-trip during the rest of > compilation > > Using `ArgV` instead of `GeneratedArgs1` for the main `CompilerInvocation` > would simplify the round-trip code a bit, however it would remove the latter > effect. Is it acceptable? > > > Instead of introducing BumpPtrAllocator in CodeGenOptions and dealing with > > custom copy constructor/assignment, I think it should be possible to use an > > owning type (e.g. std::vector<std::string>>), WDYT? That's the approach we > > generally take in CompilerInvocation members. > Actually, `CodeGenOptions::CommandLineArgs` is an `ArrayRef<const char *>`, > so we need to store an array (e.g. `std::vector<const char *>`) which entries > point inside the owning container. Maintaining this relationship will > probably require custom copy constructor/assigment work anyway. > > Another solution could be to store the generated command line arguments > (`BumpPtrAllocator` + `SmallVector<const char *>`) into the > `CompilerInvocation` object. Especially, `CompilerInvocationRefBase` already > has custom copy mechanism. > > Note on `BumpPtrAllocator` vs owner vector: containers like `std::string` > may be optimized when the content is small, embedding the value where the > allocated pointer would be. That could break the `string::c_str()` results > when `std::vector` grows. `SmallString`, `SmallVector` etc use the same > pattern. > > > Whenever clients need the original command-line, they can call > > `CompilerInvocation::generateCC1CommandLine` and get semantically > > equivalent list of cc1 arguments. Would that work for you use-case? > Unfortunately, the patch I mentionned accesses `CommandLineArguments` at > CodeGen time (in llvm code, not clang), so the `CompilerInvocation` is not > available. Currently, this is forwarded to CodeGen by storing a reference to > `CommandLineArguments` in `MCTargetOptions`. Let me clarify. AFAIK the only read from `clang::CodeGenOptions::CommandLineArgs` appears in `initTargetOptions`, which is **Clang** code that initializes `llvm::{MC,}TargetOptions`. My thinking is that this code could access `CompilerInvocation`, call its `generateCC1CommandLine` and put the generated arguments into an owning `std::vector<std::string>` inside `llvm::MCTargetOptions`. The `CompilerInvocation` data structure could then drop `CodeGenOptions::CommandLineArgs` since it's redundant anyways (it can be re-generated). This way, we can avoid complicating the lifetimes and clean up `CompilerInvocation`. If we really want, we can optimize the `std::string`s away by using `BumpPtrAllocator` - as you mentioned - inside `llvm::{MC,}TargetOptions`. Do you think that makes sense? > > Referring to `ArgV` instead of `GeneratedArgs1` should be fine too, since > > they are guaranteed to be semantically equivalent anyways. > > In the code, I see that the round-trip had two effects: > - Verify that the parse+generate is symetrical and complete (not losing > arguments on the way), emitting an error if not > - Use the arguments computed during the round-trip during the rest of > compilation > > Using `ArgV` instead of `GeneratedArgs1` for the main `CompilerInvocation` > would simplify the round-trip code a bit, however it would remove the latter > effect. Is it acceptable? From Clang's point of view, I don't think using `ArgV` instead of `GeneratedArgs1` would remove the effect of using the round-tripped arguments during the rest of the compilation. The semantics of the generated arguments are captured by the whole `CompilerInvocation` and `clang::CodeGenOptions::CommandLineArgs` don't get used anywhere else in Clang. The question is what LLVM does with these arguments. If it only shows them to the user to provide extra information, I think that's fine too. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D94472/new/ https://reviews.llvm.org/D94472 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits