jansvoboda11 added a comment.

Thanks for sharing your ideas! I've left my initial thoughts below, but I want 
to revisit this and think about it some more.

In D104465#2825343 <https://reviews.llvm.org/D104465#2825343>, @dexonsmith 
wrote:

> I'm not sure this is the right approach.
>
> - Don't PCHs sometimes validate based on hashes of inputs instead of size? At 
> least, that has been discussed, and this patch would block making the change.

The hash-based validation only kicks in when modification times don't match. 
The size of input files is always* validated 
<https://github.com/llvm/llvm-project/blob/cbfb124/clang/lib/Serialization/ASTReader.cpp#L2383-L2452>.

> - Can PCHs embed inputs, like modules can? How would that work (or not) 
> minimized sources generally?

I think PCHs can embed inputs the same way modules can. When reading an AST 
file, the embedded input files get registered in the current compilation along 
with their buffers via `SourceManager::createFileID`.
But it doesn't seem to be propagated to the `FileManager` level, which the 
input file size check uses. I might need to experiment with this a bit to fully 
understand how this works in practice.

> - What happens if the PCH has a delayed diagnostic, triggered when parsing 
> the next thing? Will the fix-it point in the wrong place? (Are there other 
> things that depend on reading the original sources when reading a PCH?)

Can you point me to the code that handles delayed diagnostics? I can't find 
anything related in `ASTWriter`.

> A few other ideas:
>
> 1. Disable PCH validation when minimizing. (Is that less robust than your 
> current workaround? If so, can you explain why?)

I think PCH validation (and detection of out-of-date input files) is still 
useful for the dependency scanner as it might help debugging build failures 
(e.g. if the input files changed between explicit PCH build and TU dependency 
scan).
Compared to disabling PCH validation, padding of minimized files is fairly 
local change that isn't susceptible to concurrency bugs and integrates with the 
rest of the implicit build infrastructure fairly well IMO.

> 2. Use the original PCH header in the scanning `-cc1`s (translate 
> `-include-pch` to `-include`) and switch back in the generated `-cc1`s (back 
> to `-include-pch`).

The two separate dependency scans (one for PCH, one for TU) could end up using 
different context hashes and we don't have an easy way to detect this.
If they both use a common module A, the explicitly-built PCH would contains 
version 1 of A, while the TU would attempt to build with version 2. This is 
currently not supported in explicit builds AFAIK.
The current implementation works around this by always choosing the module 
present in the explicitly-built module. How could we make this situation work 
here?

> 3. Embed instructions in the PCH for how to build it, and make a "minimized" 
> version of the PCH.

That could work. This makes me think if we could emit a minimized PCH during 
the PCH scan and just reuse it in the TU scan (similarly how we're currently 
reusing the explicitly-built PCH, but with the file size issues gone). Though 
passing state from one dependency scan to another through the filesystem might 
be unreliable.

In D104465#2825786 <https://reviews.llvm.org/D104465#2825786>, @dexonsmith 
wrote:

> Two more options to consider:
>
> 4. If a compilation uses a PCH, use the original files for any input file 
> transitively referenced by the PCH. Can be done by using a filesystem overlay 
> that skips over the minimization layer for those files, or could tell the 
> minimization layer not to minimize those files. You can figure out the files 
> by adding a free function that preloads the PCH and extracts out all the 
> input files.

I tend to prefer this option, since it integrates fairly well with the current 
way we build PCHs in XCBuild.
I have a WIP patch that implements this: D104536 
<https://reviews.llvm.org/D104536>. I plan to run some benchmarks to see the 
performance impact compared to minification + padding.

> 5. Add support for building/using a PCH, and only support PCHes that are 
> built this way. E.g., add a `-include-pch-auto` option or something. Scanner 
> would use `-include`, spit out a command for building a PCH using explicit 
> modules, and the generated `-cc1`s would use `-include-pch` to refer to it. 
> This way all the modules are generated "in house".

This would be the ideal solution IMO, but needs build system support.

----

BTW: I noticed the file size stored in AST also plays role when deciding 
whether to import or include header. `HeaderSearch::findModuleForHeader` calls 
`HeaderSearch::getExistingFileInfo` which pulls information (such as the file 
size) from `ExternalSource` (a.k.a. `ASTReader`).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104465/new/

https://reviews.llvm.org/D104465

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to