On Sat, Feb 16, 2019 at 1:53 AM Steve Ellcey <sell...@marvell.com> wrote:
> On Fri, 2019-02-15 at 17:48 +0800, Jun Ma wrote:
> >
> > ICC is doing much more than GCC in ipo, especially memory layout
> > optimizations. See https://software.intel.com/en-us/node/522667.
> > ICC is more aggressive in array transposition, structure splitting,
> > and field reordering. However, these optimizations were removed
> > from GCC a long time ago.
> > As for the lbm_r case, IIRC a loop with a memory access whose stride
> > is 20 is the most time-consuming. ICC will optimize the array (maybe
> > structure?) and vectorize the loop under ipo.
> >
> > Thanks
> > Jun
>
> Interesting. I tried using '-qno-opt-mem-layout-trans' on ICC
> along with '-Ofast -ipo' and that had no effect on the speed. I also
> tried '-no-vec' and that had no effect either. The only thing that
> slowed down ICC was '-ip-no-inlining' or '-fno-inline'. I see that
> '-Ofast -ipo' resulted in everything (except libc functions) getting
> inlined into the main program when using ICC. GCC did not do that, but
> if I forced it to by using the always_inline attribute, GCC could
> inline everything into main the way ICC does. But that did not speed
> up the GCC executable.
>
> Steve Ellcey
> sell...@marvell.com

You can use '-qopt-report' to see which optimizations have been applied by ICC.

Thanks
Jun
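
P.S. To make the "array transposition" point concrete, here is a minimal sketch in C of what such a layout change does to a stride-20 access. The 20-doubles-per-cell layout, the grid size, and all function names are assumptions for illustration only; they are not taken from the lbm_r source or from anything ICC reports.

```c
#include <assert.h>
#include <stddef.h>

#define N_FIELDS 20    /* assumed: 20 doubles per cell, matching the stride of 20 */
#define N_CELLS  1000  /* assumed: arbitrary small grid size */

/* Original layout: field f of cell i is at grid[i * N_FIELDS + f],
 * so walking one field across all cells uses a stride of 20 doubles. */
double sum_field_strided(const double *grid, int field)
{
    assert(field >= 0 && field < N_FIELDS);
    double sum = 0.0;
    for (int i = 0; i < N_CELLS; i++)
        sum += grid[(size_t)i * N_FIELDS + field];
    return sum;
}

/* Transposed layout: field f of cell i is at grid_t[f * N_CELLS + i],
 * so the same walk is unit-stride and easier to vectorize. */
double sum_field_contiguous(const double *grid_t, int field)
{
    assert(field >= 0 && field < N_FIELDS);
    const double *column = grid_t + (size_t)field * N_CELLS;
    double sum = 0.0;
    for (int i = 0; i < N_CELLS; i++)
        sum += column[i];
    return sum;
}

/* Copy the cell-major grid into the field-major (transposed) layout. */
void transpose_grid(const double *grid, double *grid_t)
{
    for (int i = 0; i < N_CELLS; i++)
        for (int f = 0; f < N_FIELDS; f++)
            grid_t[(size_t)f * N_CELLS + i] = grid[(size_t)i * N_FIELDS + f];
}
```

Both sum functions read the same values in the same order; only the distance between consecutive loads differs. Whether ICC actually applies a transformation like this to lbm_r is something the '-qopt-report' output should confirm or rule out.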