chandlerc added a comment.

One high level point that is at least worth clarifying, and maybe others will 
want to suggest a different approach:

The overall approach here is to keep the difference between the O1 and O2 
pipelines as small as possible.

An alternative approach we could take would be to design a focused O1 pipeline 
without regard to how much it diverges from the O2 pipeline.

Which approach to use depends somewhat on the goals. I feel like the goal here 
is to get as close to the O2 level of optimization as possible without 
sacrificing compile time or coherent backtraces for test / assertion failures. 
For that goal, the approach taken makes sense. But it seems important to 
clarify that goal, as otherwise I think we'd want to go in very different 
directions.

In D65410#1613555 <https://reviews.llvm.org/D65410#1613555>, @hfinkel wrote:

> Thanks for starting on this. Can you go ahead and replace the sroa calls with 
> mem2reg calls for `O1` and then see what that does to the performance? That 
> strikes me as a major change, but certainly one that potentially makes sense, 
> so I'd rather we go ahead and test it now before we make decisions about 
> other adjustments.


I really think we need mem2reg at least at -O1... In fact, I really think we 
need SROA at O1. If it is actually a compile-time problem, I'd like to fix 
that in SROA. I don't really expect it to be, though.
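
To make that experiment concrete, here is a rough sketch of what the suggested 
swap might look like in the new-PM function pipeline. The helper name is made 
up, and exact class names shift between LLVM versions, so treat this as 
illustrative only, not as the patch itself:

  // Illustration only: fall back from SROA to plain mem2reg at O1.
  #include "llvm/Passes/PassBuilder.h"
  #include "llvm/Transforms/Scalar/SROA.h"
  #include "llvm/Transforms/Utils/Mem2Reg.h"
  using namespace llvm;

  static void addScalarPromotion(FunctionPassManager &FPM,
                                 PassBuilder::OptimizationLevel Level) {
    if (Level > PassBuilder::O1)
      FPM.addPass(SROA());        // full scalar replacement of aggregates
    else
      FPM.addPass(PromotePass()); // mem2reg: just promote allocas to SSA form
  }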

> FWIW, I thought that we might run InstCombine less often (or maybe replace it 
> with InstSimplify, in some places). Did you try that?

I think the biggest thing to do would be to avoid repeated runs of instcombine 
over the same code. I suspect we want at least one run after the inliner, 
inside the CGSCC walk, for canonicalization. But it'd be great to limit it to 
exactly one, or maybe one before and one after the loop pipeline.
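
If we do try the InstSimplify swap, a minimal sketch of the idea (again a 
hypothetical helper; this shows the swap, not where in the pipeline it would 
sit) could look like:

  // Illustration only: cheaper fold-only cleanup at O1, full combine above.
  #include "llvm/Passes/PassBuilder.h"
  #include "llvm/Transforms/InstCombine/InstCombine.h"
  #include "llvm/Transforms/Scalar/InstSimplifyPass.h"
  using namespace llvm;

  static void addInstCleanup(FunctionPassManager &FPM,
                             PassBuilder::OptimizationLevel Level) {
    if (Level > PassBuilder::O1)
      FPM.addPass(InstCombinePass());  // full canonicalizing combine
    else
      FPM.addPass(InstSimplifyPass()); // only folds, never creates instructions
  }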



================
Comment at: llvm/lib/Passes/PassBuilder.cpp:407-414
   }
 
   // Speculative execution if the target has divergent branches; otherwise nop.
-  FPM.addPass(SpeculativeExecutionPass());
+  if (Level > O1)
+    FPM.addPass(SpeculativeExecutionPass());
 
   // Optimize based on known information about branches, and cleanup afterward.
----------------
I think you can merge all of these?
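
Concretely, something along these lines rather than repeating the check before 
each pass (the extra passes shown are just the ones around this spot in the 
pipeline; include whichever ones the patch actually guards):

  if (Level > O1) {
    // Speculative execution if the target has divergent branches; otherwise nop.
    FPM.addPass(SpeculativeExecutionPass());
    // Optimize based on known information about branches, and cleanup afterward.
    FPM.addPass(JumpThreadingPass());
    FPM.addPass(CorrelatedValuePropagationPass());
  }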


================
Comment at: llvm/lib/Transforms/IPO/PassManagerBuilder.cpp:362
 
+  // TODO: Investigate the cost/benefit of tail call elimination on debugging.
   MPM.add(createTailCallEliminationPass()); // Eliminate tail calls
----------------
hfinkel wrote:
> By definition, this loses information from the call stack, no?
Yeah, I'd have really expected this to be skipped.
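If we do skip it, a rough sketch keyed off the existing OptLevel member of 
PassManagerBuilder (illustrative, not the final form):

  // Skip tail call elimination at -O1 so callers stay visible in backtraces.
  if (OptLevel > 1)
    MPM.add(createTailCallEliminationPass()); // Eliminate tail calls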


================
Comment at: llvm/lib/Transforms/IPO/PassManagerBuilder.cpp:432
 
+  // TODO: Investigate if this is too expensive at O1.
   MPM.add(createAggressiveDCEPass());         // Delete dead instructions
----------------
hfinkel wrote:
> Yes, I'd fall back to using regular DCE.
+1
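A sketch of that fallback, assuming we key it off OptLevel like the other 
legacy-PM gates:

  if (OptLevel > 1)
    MPM.add(createAggressiveDCEPass());        // Delete dead instructions
  else
    MPM.add(createDeadCodeEliminationPass());  // Cheaper, non-aggressive DCE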


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65410/new/

https://reviews.llvm.org/D65410


