kyulee-com wrote:

> * Looking at the NFC, this seems like it has very similar issues to 
> Propeller, which wants to redo just the codegen with a new injected profile 
> and BB ordering. It would be good to see if we can converge to similar 
> approaches. I asked @rlavaee to take a look and he is reading through the 
> background on this work. @rlavaee do you think Propeller could use a similar 
> approach to this where it saves the pre-codegen bitcode and re-loads it 
> instead of redoing opt? This isn't necessarily an action item for this PR, 
> but I wanted Rahman to take a look since he is more familiar with codegen.

It's interesting to know that Propeller wants to redo the codegen. I'm happy to 
align with this work. We've already started discussing this and have shared 
some details from our side. Here's the link for more info: 
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753/11?u=kyulee-com.

> * I think this should be doable to implement with distributed ThinLTO if we 
> want in the future, from what I can tell. But we'll need a way to save the 
> bitcode before codegen from a non-in-process backend (e.g. the thinBackend 
> invoked from clang). Not asking you to do anything for this PR on that, but 
> just thinking through it. Seems doable...

I think technically it's doable, but initially, I believed we did not want to 
repeat the code generation with distributed ThinLTO unless we were willing to 
introduce additional synchronization and spawn distributed ThinLTO backends 
again with the merged codegen data. If we aim to leverage the saved optimized 
IR, I suppose we need to assign the same backend work to the same machine used 
in the first run.
As noted in the linked thread above, I wanted to confine the codegen repetition to a single machine (the in-process backend). I mainly wanted to keep the writer/reader builds separate for distributed ThinLTO to avoid this complication, while tolerating some stale codegen data. However, it's certainly worth experimenting with.

> * My biggest concern about this patch as written is whether it will break 
> down under LTO's caching mechanism - have you tested it with caching? It 
> seems like it might just skip the backend completely since you aren't adding 
> anything to the cache key.

That's a good catch! Assuming this is not a common scenario (as mentioned in the RFC linked above), my initial design intentionally disables LTO's caching for correctness and simplicity by passing an empty `FileCache()` in the constructor for both the first- and second-round backends. To enable proper caching, we need two additional caches/streams beyond the one for the final object output: one for the scratch object files and another for the optimized IR files. I've managed to implement this with some refactoring, the details of which I will follow up on.

https://github.com/llvm/llvm-project/pull/90933