On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf <mar...@trippelsdorf.de> wrote:
> On 2016.09.22 at 15:42 +0200, Markus Trippelsdorf wrote:
>> On 2016.09.22 at 15:36 +0200, Richard Biener wrote:
>> > On Thu, Sep 22, 2016 at 3:13 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
>> > > Increase the lto-min-partition size to 50000 to reduce the number of
>> > > partitions. See e.g. https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00235.html
>> > > for a concise explanation of why 10000 is too small for modern CPU/memory
>> > > sizes. Additionally, larger values increase optimization opportunities and
>> > > reduce bad decisions in the layout of global variables across partitions
>> > > (anchors do not work well with LTO).
>> > > Looking at SPEC2000, 8 more benchmarks now use a single LTO partition, which
>> > > is optimal. Build time with LTO increases only slightly, e.g. SPEC2006
>> > > now takes 2% more time on an 8-core ARM server.
>> >
>> > Ok. Markus, how many partitions do we get with libreoffice/firefox currently
>> > (I suppose they all hit lto-max-partition now?)
>>
>> Yes. Even tramp3d currently gets 30 partitions. With this patch it gets
>> reduced to 20.
>> And I guess bigger projects like Firefox are unchanged at 32.
>
> Sorry, I've reported wrong numbers above.
>
> lto-min-partition was already increased from 1000 to 10000 on trunk by
> Prathamesh in April.
Ah, I forgot about this. 10000 is equal to large-unit-insns, btw, and about four times large-function-insns.

> And tramp3d only uses ten partitions (lto-min-partition=10000).
> With lto-min-partition=50000 (current patch) this decreases to only two
> partitions. As a result we lose the possible speedup on many-core
> machines (-flto=n).
>
> E.g. on my 4-core machine I get the following tramp3d compile times with
> -flto=4:
>
> lto-min-partition=50000: 20.146 total
> lto-min-partition=10000: 16.299 total
> lto-min-partition=1000 : 16.093 total
>
> So 50000 looks too big to me.

I think the issue is that the default number of partitions (32) is too high, which pessimizes 4-core machines if the units are too small. Maybe we can tune the triplet lto-partitions, lto-min-partition and lto-max-partition so that the number of partitions produced scales roughly with program size, rather than quickly rising to 32 and then hovering there until the first unit hits lto-max-partition?

> Also the "increased optimization opportunities" with fewer partitions
> were unmeasurable in the past. If I recall correctly Honza once said
> that there should be no difference between single vs. many partitions.

Well, it definitely makes a difference for late IPA passes (that's mainly IPA PTA).

Richard.

> --
> Markus
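To make the "rise quickly to 32, then hover" behaviour concrete, here is a rough Python model of how the three params interact. It is a sketch under stated assumptions, not GCC's actual partitioning code: it assumes the balanced partitioner aims for about total/lto-partitions insns per partition, never goes below lto-min-partition per partition, and splits further once a partition would exceed lto-max-partition. The function name estimated_partitions, the default of 1000000 for lto-max-partition, and the unit sizes used are all hypothetical.

```python
def estimated_partitions(total_insns, lto_partitions=32,
                         min_partition=10000, max_partition=1000000):
    """Approximate LTO partition count (simplified model, NOT GCC's code).

    Assumption: target partition size is total/lto_partitions, clamped
    below by min_partition and above by max_partition.
    """
    # Lower bound on partition size: small units get fewer partitions.
    n = min(lto_partitions, total_insns // min_partition)
    # Upper bound on partition size: huge units may need more than
    # lto_partitions partitions (ceiling division).
    n = max(n, -(-total_insns // max_partition))
    # Every unit yields at least one partition.
    return max(1, n)


if __name__ == "__main__":
    # Hypothetical unit sizes in estimated insns, compared across
    # the three lto-min-partition values discussed in the thread.
    for size in (5000, 100000, 320000, 20000000, 50000000):
        counts = [estimated_partitions(size, min_partition=m)
                  for m in (1000, 10000, 50000)]
        print(size, counts)
```

Under this model, with lto-min-partition=10000 the count grows linearly with unit size until it reaches 32 at 32 × lto-min-partition insns, then stays flat until the unit exceeds 32 × lto-max-partition. That is the plateau described above, and it is why raising lto-min-partition alone shrinks the partition count for mid-sized units such as a hypothetical 100,000-insn program (10 partitions at 10000, 2 at 50000).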