https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #12 from Andrew Roberts <andrewm.roberts at sky dot com> ---
OK, I've tried again with this week's snapshot:

gcc version 8.0.0 20171126 (experimental) (GCC)

Taking a combination of -march and -mtune which works well on Ryzen:

/usr/local/gcc/bin/gcc -march=core-avx-i -mtune=nocona -O3 matrix.c -o matrix
./matrix
mult took 131153 clocks

Then switching to -mtune=znver1:

/usr/local/gcc/bin/gcc -march=core-avx-i -mtune=znver1 -O3 matrix.c -o matrix
./matrix
mult took 231309 clocks

Then, looking at the differences in the -Q --help=target output for these two and eliminating one difference at a time, I found that:

gcc -march=core-avx-i -mtune=znver1 -mprefer-vector-width=none -O3 matrix.c -o matrix
[aroberts@ryzen share]$ ./matrix
mult took 132295 clocks

The default for znver1 is:

-mprefer-vector-width=128

So is this option still helping with the latest microcode? Not in this case at least.

cat /proc/cpuinfo:

processor  : 0
vendor_id  : AuthenticAMD
cpu family : 23
model      : 1
model name : AMD Ryzen 7 1700 Eight-Core Processor
stepping   : 1
microcode  : 0x8001129

With -march=znver1 -mtune=znver1 and the default of -mprefer-vector-width=128:
mult took 386291 clocks

With -march=znver1 -mtune=znver1 -mprefer-vector-width=none:
mult took 201455 clocks
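
For anyone wanting to repeat the comparison of -Q --help=target output mentioned above, a minimal sketch follows. The helper name tune_diff and its argument order are made up for illustration and are not part of GCC; only the -march, -mtune, -Q and --help=target options come from the comment itself.

```shell
#!/bin/sh
# Hypothetical helper: show which target options differ between two
# -mtune settings for the same -march. Not part of GCC.
#
# usage: tune_diff <gcc-binary> <march> <tune-a> <tune-b>
tune_diff() {
    gcc_bin=$1
    march=$2
    tune_a=$3
    tune_b=$4

    out_a=$(mktemp)
    out_b=$(mktemp)

    # -Q --help=target prints the target options and their current state
    # for the given -march/-mtune combination.
    "$gcc_bin" -march="$march" -mtune="$tune_a" -Q --help=target > "$out_a"
    "$gcc_bin" -march="$march" -mtune="$tune_b" -Q --help=target > "$out_b"

    # diff exits non-zero when the files differ; that is the expected
    # case here, so its exit status is deliberately ignored.
    diff "$out_a" "$out_b" || true

    rm -f "$out_a" "$out_b"
}
```

With the compiler path from this report, the call would be something like: tune_diff /usr/local/gcc/bin/gcc core-avx-i nocona znver1. Each differing line is then a candidate to override on the command line, one at a time, while re-running the benchmark.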