fhahn wrote:

I'd also like to share a bit more info about the expected compile-time impact 
of the change:
* stage1-O3: +0.01%
* stage1-ReleaseThinLTO: +0.04%
* stage1-ReleaseLTO-g: +0.07%
* stage2-O3: -0.07%
* clang build time: +0.13%

Full data: 
https://llvm-compile-time-tracker.com/compare.php?from=a62e1c8eddcda420abec57976dc48f97669277dc&to=570c7e85ef7f3479969f9f970a0086ff9c04036f&stat=instructions

TLDR: While there are some increases in compile time, they are caused by a 
number of additional optimizations being performed. The increase in clang build 
time yields a slightly faster stage2 clang that performs more optimizations.

Let's take a closer look at some of the increases in compile time.

The statistics below were collected on ARM64 macOS rather than X86 Linux (which 
the compile-time tracker uses), but most of the impact comes from IR 
optimizations, so I don’t expect the data to be vastly different.

For stage1-O3, the biggest increase is Bullet at +0.39%. This boils down to 
being able to apply more transformations, most notably (and probably the main 
contributor) the roughly 70% more loops being vectorized, which itself adds 
compile time and also produces extra code to simplify.

Top Statistic Changes for Bullet
  loop-simplifycfg.NumLoopBlocksDeleted 6.0 -> 26.0 +333.33%
  aarch64-ccmp.NumImmRangeRejs 66.0 -> 131.0 +98.48%
  loop-vectorize.LoopsVectorized 94.0 -> 160.0 +70.21%
  dse.NumRedundantStores 25.0 -> 39.0 +56.00%
  aarch64-ldst-opt.NumPostFolded 228.0 -> 355.0 +55.70%
  codegenprepare.NumPHIsElim 17.0 -> 25.0 +47.06%
  loop-simplifycfg.NumTerminatorsFolded 9.0 -> 13.0 +44.44%
  correlated-value-propagation.NumPhis 306.0 -> 423.0 +38.24%
  dagcombine.PostIndexedNodes 343.0 -> 473.0 +37.90%
  loop-idiom.NumMemSet 51.0 -> 66.0 +29.41%


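(Side note: the per-pass numbers in these tables come from LLVM's statistics 
output. As a rough illustration of how two such dumps can be compared, here is 
a minimal Python sketch, not the script actually used for the numbers in this 
message. It assumes the statistics are available as flat JSON objects mapping 
"pass.statistic" names to values, e.g. the .stats files written by clang's 
-save-stats; the file names and helper are hypothetical.)

  #!/usr/bin/env python3
  # Hypothetical helper (not the one used above): diff two LLVM statistics
  # dumps and print the largest relative increases, mirroring the tables here.
  # Inputs are assumed to be flat JSON objects ("pass.statistic" -> number).
  import json
  import sys

  def load(path):
      with open(path) as f:
          return json.load(f)

  def main(before_path, after_path, top=15):
      before, after = load(before_path), load(after_path)
      changes = []
      for key in sorted(set(before) | set(after)):
          old, new = before.get(key, 0), after.get(key, 0)
          if old == 0 or old == new:
              continue  # skip brand-new or unchanged statistics
          pct = (new - old) / old * 100.0
          changes.append((pct, key, old, new))
      changes.sort(reverse=True)  # largest relative increase first
      for pct, key, old, new in changes[:top]:
          print(f"  {key} {old} -> {new} {pct:+.2f}%")

  if __name__ == "__main__":
      main(sys.argv[1], sys.argv[2])

Something like `python3 diff_stats.py before.stats after.stats` would then print 
a table in the same shape as the ones in this message.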

For stage1-ReleaseLTO-g, the biggest increase is kimwitu++ at +0.7%. The input 
bitcode size after merging all modules before LTO is unchanged. Looking at the 
impact on the transformations, we optimize quite a bit more around memory 
instructions, so the main contributors to the extra compile time are likely 
GVN, DSE, and LICM.

Top Statistic Changes for kimwitu++
  loop-idiom.NumMemSet 1.0 -> 3.0 +200.00%
  licm.NumLoadStorePromoted 27.0 -> 75.0 +177.78%
  dse.NumFastStores 129.0 -> 285.0 +120.93%
  dse.NumGetDomMemoryDefPassed 145.0 -> 301.0 +107.59%
  globalsmodref-aa.NumIndirectGlobalVars 1.0 -> 2.0 +100.00%
  licm.NumPromotionCandidates 121.0 -> 169.0 +39.67%
  capture-tracking.NumNotCapturedBefore 141.0 -> 189.0 +34.04%
  instsimplify.NumSimplified 494.0 -> 626.0 +26.72%
  gvn.NumGVNSimpl 701.0 -> 833.0 +18.83%
  gvn.NumPRELoadMoved2CEPred 13.0 -> 15.0 +15.38%
  lcssa.NumLCSSA 2820.0 -> 3252.0 +15.32%
  licm.NumMovedLoads 15.0 -> 17.0 +13.33%
  early-cse.NumCSE 1224.0 -> 1344.0 +9.80%
  gvn.IsValueFullyAvailableInBlockNumSpeculationsMax 114.0 -> 125.0 +9.65%
  early-cse.NumCSELoad 3055.0 -> 3291.0 +7.73%
  instcombine.NegatorMaxTotalValuesVisited 14.0 -> 15.0 +7.14%
  dse.NumDomMemDefChecks 6005.0 -> 6327.0 +5.36%


Another increase to highlight is the clang build time (+0.13%). I didn’t 
collect stats for this one, but it looks like the extra optimizations make the 
stage2 clang about 0.08% faster on CTMark (stage1-O3 +0.01% vs stage2-O3 
-0.07%), while performing a number of additional transformations.



https://github.com/llvm/llvm-project/pull/117244