On February 15, 2019 1:45:10 PM GMT+01:00, Hi-Angel <hiangel...@gmail.com> wrote:
>I never could understand, why field reordering was removed from GCC?
The implementation simply was seriously broken, bitrotten and unmaintained.

Richard

>I mean, I know that it's prohibited in C and C++, but, sure, GCC can
>detect whether it possibly can influence application behavior, and if
>not, just do the reorder.
>
>The veto is important to C/C++ as programming languages, but not to
>machine code that is being generated from them. As long as the app can't
>detect that its fields were reordered through means defined by C/C++,
>field reordering by the compiler is fine, isn't it?
>
>On Fri, 15 Feb 2019 at 12:49, Jun Ma <majun4950...@gmail.com> wrote:
>>
>> Bin.Cheng <amker.ch...@gmail.com> wrote on Fri, Feb 15, 2019 at 5:12 PM:
>>
>> > On Fri, Feb 15, 2019 at 3:30 AM Steve Ellcey <sell...@marvell.com> wrote:
>> > >
>> > > I have a question about SPEC CPU 2017 and what GCC can and cannot do
>> > > with -flto. As part of some SPEC analysis I am doing I found that with
>> > > -Ofast, ICC and GCC were not that far apart (especially spec int rate,
>> > > spec fp rate was a slightly larger difference).
>> > >
>> > > But when I added -ipo to the ICC command and -flto to the GCC command,
>> > > the difference got larger. In particular the 519.lbm_r was more than
>> > > twice as fast with ICC and -ipo, but -flto did not help GCC at all.
>> > >
>> > > There are other tests that also show this type of improvement with -ipo
>> > > like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and
>> > > 548.exchange2_r, but none are as dramatic as 519.lbm_r. Anyone have
>> > > any idea on what ICC is doing that GCC is missing? Is GCC just not
>> > > aggressive enough with its inlining?
>> >
>> > IIRC Jun did some investigation before? CCing.
>> >
>> > Thanks,
>> > bin
>> > >
>> > > Steve Ellcey
>> > > sell...@marvell.com
>>
>> ICC is doing much more than GCC in ipo, especially memory layout
>> optimizations. See https://software.intel.com/en-us/node/522667.
>> ICC is more aggressive in array transposition/structure splitting
>> /field reordering. However, these optimizations were removed
>> from GCC a long time ago.
>> As for case lbm_r, IIRC a loop with memory access whose stride is 20 is
>> most time-consuming. ICC will optimize the array (maybe structure?)
>> and vectorize the loop under ipo.
>>
>> Thanks
>> Jun
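
To make the "stride 20" and "array transposition" points in the quoted thread concrete, here is a minimal hand-written sketch in C. It is not what ICC or GCC actually does internally, and it is not code from lbm_r. The names (N_FIELDS, sum_field_strided, transpose_cells, sum_field_contiguous) are hypothetical, and the 20 fields per cell are only an assumption taken from the stride mentioned above. It shows the same reduction over a cell-major array (stride-20 loads) and over a transposed, field-major copy (unit-stride loads, which are much easier to vectorize).

```c
#include <stddef.h>

/* Hypothetical: 20 values per cell, matching the stride mentioned in the thread. */
enum { N_FIELDS = 20 };

/* Cell-major layout: field f of cell i is at grid[i * N_FIELDS + f].
 * Walking one field across all cells jumps N_FIELDS doubles per iteration. */
double sum_field_strided(const double *grid, size_t n_cells, size_t f)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_cells; i++)
        sum += grid[i * N_FIELDS + f];      /* stride-20 access */
    return sum;
}

/* Array transposition done by hand: copy into a field-major layout,
 * where field f of cell i is at grid_t[f * n_cells + i]. */
void transpose_cells(const double *grid, double *grid_t, size_t n_cells)
{
    for (size_t i = 0; i < n_cells; i++)
        for (size_t f = 0; f < N_FIELDS; f++)
            grid_t[f * n_cells + i] = grid[i * N_FIELDS + f];
}

/* Same reduction on the transposed layout: consecutive iterations touch
 * consecutive doubles, so the loop reads memory with unit stride. */
double sum_field_contiguous(const double *grid_t, size_t n_cells, size_t f)
{
    const double *field = grid_t + f * n_cells;
    double sum = 0.0;
    for (size_t i = 0; i < n_cells; i++)
        sum += field[i];                    /* unit-stride access */
    return sum;
}
```

A compiler can only apply such a layout change on its own if it can prove that no code observes the original layout, which is why it needs whole-program information (-ipo or -flto) in the first place.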