http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58529
Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target
            Summary|Loop 30% faster with Intel  |GCC -funroll-loops 150%
                   |than with GCC               |slower with -march=native
                   |                            |on x86-64

--- Comment #9 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to Tobias Burnus from comment #8)
> I have to re-check why unrolling made it slower on that Xeon E5-2630
> (comment 0) but faster on the i5.

Seems to be a tuning problem. All timings on the Xeon E5-2630, but using the
-march=native compile from the i5 vs. the -march=native compilation for the
Xeon E5:

real 1.530s  user 1.528s  sys 0.000s  i5, no unrolling
real 1.483s  user 1.481s  sys 0.000s  Xeon, no unrolling
real 0.937s  user 0.934s  sys 0.002s  i5, -funroll-loops
real 2.480s  user 2.478s  sys 0.000s  Xeon, -funroll-loops
real 0.935s  user 0.934s  sys 0.000s  Xeon, -funroll-loops max-unroll-times=7

The i5's -march=native expands into:
-march=core-avx-i -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a
-mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
-mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2
-msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed
-mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er
-mno-avx512cd -mno-avx512pf --param l1-cache-size=32
--param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=core-avx-i

The Xeon's -march=native:
-march=corei7-avx -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a
-mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
-mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2
-msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase
-mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f
-mno-avx512er -mno-avx512cd -mno-avx512pf --param l1-cache-size=32
--param l1-cache-line-size=64 --param l2-cache-size=15360 -mtune=corei7-avx

Namely:
i5:   -march=core-avx-i -mrdrnd -mf16c -mfsgsbase
      --param l2-cache-size=6144 -mtune=core-avx-i
Xeon: -march=corei7-avx -mno-rdrnd -mno-f16c -mno-fsgsbase
      --param l2-cache-size=15360 -mtune=corei7-avx
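
The "Namely" comparison of the two expansions can also be derived mechanically rather than by eye. What follows is only a minimal Python sketch, not part of GCC or of this bug report: the helper names split_flags and flag_diff are hypothetical, and the two flag strings are copied from the expansions quoted in this comment. It assumes that every "--param" is followed by exactly one "name=value" token.

```python
# Flag strings copied from the two -march=native expansions in this comment.
I5_NATIVE = """
-march=core-avx-i -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a
-mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
-mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2
-msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed
-mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er
-mno-avx512cd -mno-avx512pf --param l1-cache-size=32
--param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=core-avx-i
"""

XEON_NATIVE = """
-march=corei7-avx -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a
-mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
-mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2
-msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase
-mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f
-mno-avx512er -mno-avx512cd -mno-avx512pf --param l1-cache-size=32
--param l1-cache-line-size=64 --param l2-cache-size=15360 -mtune=corei7-avx
"""


def split_flags(expansion):
    """Split an option string into flags, keeping '--param name=value' together.

    Hypothetical helper; assumes each --param has exactly one argument.
    """
    tokens = expansion.split()
    flags = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "--param" and i + 1 < len(tokens):
            flags.append("--param " + tokens[i + 1])
            i += 2
        else:
            flags.append(tokens[i])
            i += 1
    return flags


def flag_diff(first, second):
    """Return (flags only in first, flags only in second), in original order."""
    first_flags = split_flags(first)
    second_flags = split_flags(second)
    first_set = set(first_flags)
    second_set = set(second_flags)
    only_first = [f for f in first_flags if f not in second_set]
    only_second = [f for f in second_flags if f not in first_set]
    return only_first, only_second


if __name__ == "__main__":
    only_i5, only_xeon = flag_diff(I5_NATIVE, XEON_NATIVE)
    print("i5:  ", " ".join(only_i5))
    print("Xeon:", " ".join(only_xeon))
```

Flags are compared as whole strings, so "-mrdrnd" and "-mno-rdrnd" show up as one entry on each side, matching the way the differences are listed in the comment.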