fhahn wrote:

I'd also like to share a bit more info about the expected compile-time impact of the change:

* stage1-O3 +0.01%
* stage1-ReleaseThinLTO +0.04%
* stage1-ReleaseLTO-g +0.07%
* stage2-O3 -0.07%
* clang build time +0.13%
Full data: https://llvm-compile-time-tracker.com/compare.php?from=a62e1c8eddcda420abec57976dc48f97669277dc&to=570c7e85ef7f3479969f9f970a0086ff9c04036f&stat=instructions

TLDR: While there are some increases in compile time, they are caused by a number of additional optimizations being applied. The increase in clang build time yields a slightly faster stage2 clang while performing more optimizations.

Let's take a closer look at some of the compile-time increases. The statistics data below were collected on ARM64 macOS rather than the X86 Linux setup used by the compile-time tracker, but most of the impact comes from IR optimizations, so I don't expect the data to differ substantially.

For stage1-O3, the biggest increase is Bullet (+0.39%). This boils down to being able to apply more transformations, most notably roughly 70% more loops being vectorized, which itself adds compile time and also produces extra code to simplify.

Top Statistic Changes for Bullet
  loop-simplifycfg.NumLoopBlocksDeleted       6.0 -> 26.0    +333.33%
  aarch64-ccmp.NumImmRangeRejs               66.0 -> 131.0    +98.48%
  loop-vectorize.LoopsVectorized             94.0 -> 160.0    +70.21%
  dse.NumRedundantStores                     25.0 -> 39.0     +56.00%
  aarch64-ldst-opt.NumPostFolded            228.0 -> 355.0    +55.70%
  codegenprepare.NumPHIsElim                 17.0 -> 25.0     +47.06%
  loop-simplifycfg.NumTerminatorsFolded       9.0 -> 13.0     +44.44%
  correlated-value-propagation.NumPhis      306.0 -> 423.0    +38.24%
  dagcombine.PostIndexedNodes               343.0 -> 473.0    +37.90%
  loop-idiom.NumMemSet                       51.0 -> 66.0     +29.41%

For stage1-ReleaseLTO-g, the biggest increase is kimwitu++ (+0.7%). The input bitcode size after merging all modules before LTO is unchanged. Looking at the impact on the transformations, we optimize quite a bit more around memory instructions, so the main contributors to the compile-time increase are likely GVN, DSE, and LICM.

Top Statistic Changes for kimwitu++
  loop-idiom.NumMemSet                                   1.0 -> 3.0      +200.00%
  licm.NumLoadStorePromoted                             27.0 -> 75.0     +177.78%
  dse.NumFastStores                                    129.0 -> 285.0    +120.93%
  dse.NumGetDomMemoryDefPassed                         145.0 -> 301.0    +107.59%
  globalsmodref-aa.NumIndirectGlobalVars                 1.0 -> 2.0      +100.00%
  licm.NumPromotionCandidates                          121.0 -> 169.0     +39.67%
  capture-tracking.NumNotCapturedBefore                141.0 -> 189.0     +34.04%
  instsimplify.NumSimplified                           494.0 -> 626.0     +26.72%
  gvn.NumGVNSimpl                                      701.0 -> 833.0     +18.83%
  gvn.NumPRELoadMoved2CEPred                            13.0 -> 15.0      +15.38%
  lcssa.NumLCSSA                                      2820.0 -> 3252.0    +15.32%
  licm.NumMovedLoads                                    15.0 -> 17.0      +13.33%
  early-cse.NumCSE                                    1224.0 -> 1344.0     +9.80%
  gvn.IsValueFullyAvailableInBlockNumSpeculationsMax   114.0 -> 125.0      +9.65%
  early-cse.NumCSELoad                                3055.0 -> 3291.0     +7.73%
  instcombine.NegatorMaxTotalValuesVisited              14.0 -> 15.0       +7.14%
  dse.NumDomMemDefChecks                              6005.0 -> 6327.0     +5.36%

Another increase worth highlighting is the clang build time (+0.13%). I didn't collect stats for this one, but it looks like the extra optimizations make the stage2 Clang about 0.08% faster on CTMark (stage1-O3 +0.01% vs stage2-O3 -0.07%), while performing a number of additional transformations.

https://github.com/llvm/llvm-project/pull/117244
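As an aside, the "pass.Counter" entries in the tables above are LLVM's per-pass statistic counters, which are printed with -stats (or written out with -save-stats) in builds where statistics are enabled (assertions builds, or LLVM_FORCE_ENABLE_STATS). A minimal sketch of how such a counter is declared and bumped, assuming the standard llvm/ADT/Statistic.h mechanism; the helper function below is illustrative, not actual LoopVectorize code:

    #include "llvm/ADT/Statistic.h"

    // The DEBUG_TYPE string becomes the group name in the statistics output,
    // so this counter shows up as "loop-vectorize.LoopsVectorized".
    #define DEBUG_TYPE "loop-vectorize"

    STATISTIC(LoopsVectorized, "Number of loops vectorized");

    // Illustrative helper: a pass simply increments the counter whenever the
    // transformation fires; the totals are reported at the end of compilation.
    static void noteVectorizedLoop() { ++LoopsVectorized; }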