>I have no problems on -mtune=Bulldozer. But I object -mtune=generic
>change and did suggest a different approach for -mtune=generic.

Something must have been broken in the unaligned load splitting in generic mode.

While we lose 1.3% in geomean on CFP2006 by splitting unaligned loads for -mtune=bdver1, splitting unaligned loads in generic mode is KILLING us:

For 459.GemsFDTD (ref) on Bulldozer,
-Ofast -mavx -mno-avx256-split-unaligned-load: 480s
-Ofast -mavx                                 : 2527s

So, splitting unaligned loads makes the program run more than 5 times slower!

For the 434.zeusmp train run,
-Ofast -mavx -mno-avx256-split-unaligned-load: 32.5s
-Ofast -mavx                                 : 106s

Other tests are ongoing!

Changpeng.
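
For reference, the slowdown factors implied by the timings above can be recomputed with a small sketch like the one below. The dictionary layout and the `slowdown` helper are only illustrative names, not part of any benchmark harness; the numbers are the four timings reported in this message.

```python
# Timings in seconds, copied from the measurements reported in this message.
# "no_split" = -Ofast -mavx -mno-avx256-split-unaligned-load
# "split"    = -Ofast -mavx (unaligned 256-bit loads are split)
timings = {
    "459.GemsFDTD (ref)": {"no_split": 480.0, "split": 2527.0},
    "434.zeusmp (train)": {"no_split": 32.5, "split": 106.0},
}


def slowdown(no_split: float, split: float) -> float:
    """Return how many times longer the split-load build runs (hypothetical helper)."""
    return split / no_split


# Slowdown factor per benchmark.
factors = {name: slowdown(**t) for name, t in timings.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x slower with split unaligned loads")
```

This is plain arithmetic on the reported run times and says nothing about the cause of the regression.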