saudi added inline comments.

================
Comment at: clang/lib/Frontend/CompilerInvocation.cpp:677
+  SwapOpts(Res);
+  bool Success2 = Parse(Res, GeneratedArgs1, Diags);
+
----------------
jansvoboda11 wrote:
> saudi wrote:
> > jansvoboda11 wrote:
> > > saudi wrote:
> > > > Hello,
> > > > 
> > > > I encountered crashes on Windows targets, related to this line, when 
> > > > rebasing https://reviews.llvm.org/D80833, where 
> > > > `CodeGenOpts::CommandLineArgs` is read later in the compilation process.
> > > > 
> > > > When the function exits, `GeneratedArgs1` is destroyed but 
> > > > `Res.getCodeGenOpts().CommandLineArgs` still references its contents. 
> > > > The code has changed since this patch was submitted, but the logic 
> > > > remains the same in more recent code.
> > > > 
> > > > Note that the bug doesn't happen when round-trip is skipped, as 
> > > > `CommandLineArgs` then refers to `ArgV` which content is valid for the 
> > > > entire compiler run.
> > > > 
> > > > As a solution I considered allowing `CodeGenOptions` to optionally own 
> > > > the memory by introducing `BumpPtrAllocator 
> > > > CodeGenOptions::CommandLineArgsAlloc`.
> > > >  However it has drawbacks:
> > > >  - need to customize `CodeGenOptions` copy constructor/operator,
> > > >  - lifetime for copies of `CodeGenOptions::CommandLineArgs` (e.g. to 
> > > > other structures) would be bound to the lifetime of the original 
> > > > `CodeGenOptions` object.
> > > >  
> > > > Another solution (slower) would be to use 2 dummy `CompilerInvocation` 
> > > > to perform the round-trip test, and use the regular arguments on the 
> > > > main instance. It would change the behavior since the main instance 
> > > > currently uses `GeneratedArgs1`.
> > > > 
> > > > WDYT?
> > > > Hello,
> > > > 
> > > > I encountered crashes on Windows targets, related to this line, when 
> > > > rebasing https://reviews.llvm.org/D80833, where 
> > > > `CodeGenOpts::CommandLineArgs` is read later in the compilation process.
> > > > 
> > > > When the function exits, `GeneratedArgs1` is destroyed but 
> > > > `Res.getCodeGenOpts().CommandLineArgs` still references its contents. 
> > > > The code has changed since this patch was submitted, but the logic 
> > > > remains the same in more recent code.
> > > > 
> > > > Note that the bug doesn't happen when round-trip is skipped, as 
> > > > `CommandLineArgs` then refers to `ArgV` which content is valid for the 
> > > > entire compiler run.
> > > 
> > > Hi, you're right, I can see the lifetime issues here.
> > > 
> > > > As a solution I considered allowing `CodeGenOptions` to optionally own 
> > > > the memory by introducing `BumpPtrAllocator 
> > > > CodeGenOptions::CommandLineArgsAlloc`.
> > > >  However it has drawbacks:
> > > >  - need to customize `CodeGenOptions` copy constructor/operator,
> > > >  - lifetime for copies of `CodeGenOptions::CommandLineArgs` (e.g. to 
> > > > other structures) would be bound to the lifetime of the original 
> > > > `CodeGenOptions` object.
> > > 
> > > Instead of introducing `BumpPtrAllocator` in `CodeGenOptions` and dealing 
> > > with custom copy constructor/assignment, I think it should be possible to 
> > > use an owning type (e.g. `std::vector<std::string>>`), WDYT? That's the 
> > > approach we generally take in `CompilerInvocation` members.
> > > 
> > > > Another solution (slower) would be to use 2 dummy `CompilerInvocation` 
> > > > to perform the round-trip test, and use the regular arguments on the 
> > > > main instance. It would change the behavior since the main instance 
> > > > currently uses `GeneratedArgs1`.
> > > 
> > > Referring to `ArgV` instead of `GeneratedArgs1` should be fine too, since 
> > > they are guaranteed to be semantically equivalent anyways.
> > > 
> > > > WDYT?
> > > 
> > > I think both approaches are valid, but I think the cleanest solution 
> > > would be to drop `CodeGenOptions::CommandLineArgs` entirely. It's one of 
> > > the few cases where `CompilerInvocation` points to memory owned by 
> > > someone else. (I thought this only happens in 
> > > `PreprocessorOptions::RemappedFileBuffers`, but I stand corrected.) 
> > > Whenever clients need the original command-line, they can call 
> > > `CompilerInvocation::generateCC1CommandLine` and get semantically 
> > > equivalent list of cc1 arguments. Would that work for you use-case?
> > Hello, thank you for your fast feedback!
> > 
> > > Instead of introducing BumpPtrAllocator in CodeGenOptions and dealing 
> > > with custom copy constructor/assignment, I think it should be possible to 
> > > use an owning type (e.g. std::vector<std::string>>), WDYT? That's the 
> > > approach we generally take in CompilerInvocation members.
> > Actually, `CodeGenOptions::CommandLineArgs` is an `ArrayRef<const char *>`, 
> > so we need to store an array (e.g. `std::vector<const char *>`) which 
> > entries point inside the owning container. Maintaining this relationship 
> > will probably require custom copy constructor/assigment work anyway.
> > 
> > Another solution could be to store the generated command line arguments 
> > (`BumpPtrAllocator` + `SmallVector<const char *>`) into the 
> > `CompilerInvocation` object. Especially, `CompilerInvocationRefBase` 
> > already has custom copy mechanism.
> > 
> > Note on `BumpPtrAllocator` vs owner vector:  containers like `std::string` 
> > may be optimized when the content is small, embedding the value where the 
> > allocated pointer would be. That could break the `string::c_str()` results 
> > when `std::vector` grows. `SmallString`, `SmallVector` etc use the same 
> > pattern.
> > 
> > > Whenever clients need the original command-line, they can call 
> > > `CompilerInvocation::generateCC1CommandLine` and get semantically 
> > > equivalent list of cc1 arguments. Would that work for you use-case?
> > Unfortunately, the patch I mentionned accesses `CommandLineArguments` at 
> > CodeGen time (in llvm code, not clang), so the `CompilerInvocation` is not 
> > available. Currently, this is forwarded to CodeGen by storing a reference 
> > to `CommandLineArguments` in `MCTargetOptions`.
> > 
> > > Referring to `ArgV` instead of `GeneratedArgs1` should be fine too, since 
> > > they are guaranteed to be semantically equivalent anyways.
> > 
> > In the code, I see that the round-trip had two effects:
> >  - Verify that the parse+generate is symetrical and complete (not losing 
> > arguments on the way), emitting an error if not
> >  - Use the arguments computed during the round-trip during the rest of 
> > compilation
> > 
> > Using `ArgV` instead of `GeneratedArgs1` for the main `CompilerInvocation` 
> > would simplify the round-trip code a bit, however it would remove the 
> > latter effect. Is it acceptable?
> > 
> > > Instead of introducing BumpPtrAllocator in CodeGenOptions and dealing 
> > > with custom copy constructor/assignment, I think it should be possible to 
> > > use an owning type (e.g. std::vector<std::string>>), WDYT? That's the 
> > > approach we generally take in CompilerInvocation members.
> > Actually, `CodeGenOptions::CommandLineArgs` is an `ArrayRef<const char *>`, 
> > so we need to store an array (e.g. `std::vector<const char *>`) which 
> > entries point inside the owning container. Maintaining this relationship 
> > will probably require custom copy constructor/assigment work anyway.
> > 
> > Another solution could be to store the generated command line arguments 
> > (`BumpPtrAllocator` + `SmallVector<const char *>`) into the 
> > `CompilerInvocation` object. Especially, `CompilerInvocationRefBase` 
> > already has custom copy mechanism.
> > 
> > Note on `BumpPtrAllocator` vs owner vector:  containers like `std::string` 
> > may be optimized when the content is small, embedding the value where the 
> > allocated pointer would be. That could break the `string::c_str()` results 
> > when `std::vector` grows. `SmallString`, `SmallVector` etc use the same 
> > pattern.
> > 
> > > Whenever clients need the original command-line, they can call 
> > > `CompilerInvocation::generateCC1CommandLine` and get semantically 
> > > equivalent list of cc1 arguments. Would that work for you use-case?
> > Unfortunately, the patch I mentionned accesses `CommandLineArguments` at 
> > CodeGen time (in llvm code, not clang), so the `CompilerInvocation` is not 
> > available. Currently, this is forwarded to CodeGen by storing a reference 
> > to `CommandLineArguments` in `MCTargetOptions`.
> 
> Let me clarify. AFAIK the only read from 
> `clang::CodeGenOptions::CommandLineArgs` appears in `initTargetOptions`, 
> which is **Clang** code that initializes `llvm::{MC,}TargetOptions`. My 
> thinking is that this code could access `CompilerInvocation`, call its 
> `generateCC1CommandLine` and put the generated arguments into an owning 
> `std::vector<std::string>` inside `llvm::MCTargetOptions`. The 
> `CompilerInvocation` data structure could then drop 
> `CodeGenOptions::CommandLineArgs` since it's redundant anyways (it can be 
> re-generated). This way, we can avoid complicating the lifetimes and clean up 
> `CompilerInvocation`. If we really want, we can optimize the `std::string`s 
> away by using `BumpPtrAllocator` - as you mentioned - inside 
> `llvm::{MC,}TargetOptions`. Do you think that makes sense?
> 
> > > Referring to `ArgV` instead of `GeneratedArgs1` should be fine too, since 
> > > they are guaranteed to be semantically equivalent anyways.
> > 
> > In the code, I see that the round-trip had two effects:
> >  - Verify that the parse+generate is symetrical and complete (not losing 
> > arguments on the way), emitting an error if not
> >  - Use the arguments computed during the round-trip during the rest of 
> > compilation
> > 
> > Using `ArgV` instead of `GeneratedArgs1` for the main `CompilerInvocation` 
> > would simplify the round-trip code a bit, however it would remove the 
> > latter effect. Is it acceptable?
> 
> From Clang's point of view, I don't think using `ArgV` instead of 
> `GeneratedArgs1` would remove the effect of using the round-tripped arguments 
> during the rest of the compilation. The semantics of the generated arguments 
> are captured by the whole `CompilerInvocation` and 
> `clang::CodeGenOptions::CommandLineArgs` don't get used anywhere else in 
> Clang. The question is what LLVM does with these arguments. If it only shows 
> them to the user to provide extra information, I think that's fine too.
> Let me clarify. AFAIK the only read from 
> `clang::CodeGenOptions::CommandLineArgs` appears in `initTargetOptions`, 
> which is **Clang** code that initializes `llvm::{MC,}TargetOptions`. My 
> thinking is that this code could access `CompilerInvocation`, call its 
> `generateCC1CommandLine` and put the generated arguments into an owning 
> `std::vector<std::string>` inside `llvm::MCTargetOptions`. 
After looking a bit, `initTargetOptions` doesn't seem to have access to the 
`CompilerInvocation`, it is triggered by the clang backend, where only the 
`{HeaderSearch|CodeGen|Target|Lang}Options` structures are passed; it would 
require passing the `CompilerInvocation` through several layers.

> From Clang's point of view, I don't think using `ArgV` instead of 
> `GeneratedArgs1` would remove the effect of using the round-tripped arguments 
> during the rest of the compilation. The semantics of the generated arguments 
> are captured by the whole `CompilerInvocation` and 
> `clang::CodeGenOptions::CommandLineArgs` don't get used anywhere else in 
> Clang. The question is what LLVM does with these arguments. If it only shows 
> them to the user to provide extra information, I think that's fine too.

I realized that using `ArgV` instead of `GeneratedArgs1` may miss detecting if 
`Generate` misses some arguments. In that case, no error would be reported by 
the round-trip test, as `GeneratedArgs1` and `GeneratedArgs2` would be 
identical. 

I'm currently trying to add a `BumpPtrAllocator` along with a `SmallVector< 
const char *>` in `CompilerInvocationRefBase`. I think it makes sense for 
CompilerInvocation to be the owner of the custom command line arguments 
generated during the round-trip.
Let me know if there would be a cleaner solution: maybe adding 
`CompilerInvocation` access in the backend would be simple enough?



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94472/new/

https://reviews.llvm.org/D94472

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to