https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
--- Comment #6 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
> You may notice I was invoking the wrong executable in what I posted in
> comment #3. I did rerun the correct one several times and tried it with
> -mavx -mprefer-avx128. I get the same poor results regardless.

Several things could go wrong here...

If you run the benchmark under gdb, break, and then type "disassemble $pc,$pc+200", do you actually end up in the right part of the program (the one with AVX instructions)?

Or does your machine prefer AVX128? To find out, what are the timings for inline code using

-mavx -Ofast
-mavx -mprefer-avx128 -Ofast

?