Re: GCC missing -flto optimizations? SPEC lbm benchmark

Richard Biener Fri, 15 Feb 2019 05:14:17 -0800

On February 15, 2019 1:45:10 PM GMT+01:00, Hi-Angel <hiangel...@gmail.com> 
wrote:
>I never could understand, why field reordering was removed from GCC?


The implementation simply was seriously broken, bitrotten and unmaintained. 

Richard 

 I
>mean, I know that it's prohibited in C and C++, but, sure, GCC can
>detect whether it possibly can influence application behavior, and if
>not, just do the reorder.
>
>The veto is important to C/C++ as programming languages, but not to
>machine code that is being generated from them. As long as app can't
>detect that its fields were reordered through means defined by C/C++,
>field reordering by compiler is fine, isn't it?
>
>On Fri, 15 Feb 2019 at 12:49, Jun Ma <majun4950...@gmail.com> wrote:
>>
>> Bin.Cheng <amker.ch...@gmail.com> 于2019年2月15日周五 下午5:12写道：
>>
>> > On Fri, Feb 15, 2019 at 3:30 AM Steve Ellcey <sell...@marvell.com>
>wrote:
>> > >
>> > > I have a question about SPEC CPU 2017 and what GCC can and cannot
>do
>> > > with -flto.  As part of some SPEC analysis I am doing I found
>that with
>> > > -Ofast, ICC and GCC were not that far apart (especially spec int
>rate,
>> > > spec fp rate was a slightly larger difference).
>> > >
>> > > But when I added -ipo to the ICC command and -flto to the GCC
>command,
>> > > the difference got larger.  In particular the 519.lbm_r was more
>than
>> > > twice as fast with ICC and -ipo, but -flto did not help GCC at
>all.
>> > >
>> > > There are other tests that also show this type of improvement
>with -ipo
>> > > like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and
>> > > 548.exchange2_r, but none are as dramatic as 519.lbm_r.  Anyone
>have
>> > > any idea on what ICC is doing that GCC is missing?  Is GCC just
>not
>> > > agressive enough with its inlining?
>> >
>> > IIRC Jun did some investigation before? CCing.
>> >
>> > Thanks,
>> > bin
>> > >
>> > > Steve Ellcey
>> > > sell...@marvell.com
>>
>> ICC is doing much more than GCC in ipo, especially memory layout
>> optimizations. See https://software.intel.com/en-us/node/522667.
>> ICC is more aggressive in array transposition/structure splitting
>> /field reordering. However, these optimizations have been removed
>> from GCC long time ago.
>> As for case lbm_r, IIRC a loop with memory access which stride is 20
>is
>> most time-consuming.  ICC will optimize the array(maybe structure?)
>> and vectorize the loop under ipo.
>>
>> Thanks
>> Jun

Re: GCC missing -flto optimizations? SPEC lbm benchmark

Reply via email to