https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #12 from Andrew Roberts <andrewm.roberts at sky dot com> ---
OK, I've tried again with this week's snapshot:

gcc version 8.0.0 20171126 (experimental) (GCC) 

Taking a combination of -march and -mtune that works well on Ryzen:

/usr/local/gcc/bin/gcc -march=core-avx-i -mtune=nocona -O3 matrix.c -o matrix
./matrix
mult took     131153 clocks

Then switching to -mtune=znver1:

/usr/local/gcc/bin/gcc -march=core-avx-i -mtune=znver1 -O3 matrix.c -o matrix
./matrix
 mult took     231309 clocks

I then compared the -Q --help=target output for these two builds and
eliminated the differences one at a time. I found that:

gcc -march=core-avx-i -mtune=znver1 -mprefer-vector-width=none -O3 matrix.c -o matrix
[aroberts@ryzen share]$ ./matrix
mult took     132295 clocks

The default for znver1 is: -mprefer-vector-width=128

So is this option still helping with the latest microcode? Not in this case at
least.
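
The -Q --help=target comparison described above could be scripted roughly as
follows. This is only a sketch: the function name diff_tune and the GCC
environment variable are made up for illustration, and it assumes a gcc that
accepts -Q --help=target together with the given -march/-mtune values.

```shell
# Hypothetical helper: print the target options that differ between two
# -mtune settings for the same -march. Set GCC to pick a specific compiler
# (for example /usr/local/gcc/bin/gcc); it defaults to "gcc" on PATH.
diff_tune() {
    gcc_bin=${GCC:-gcc}
    march=$1
    tune_a=$2
    tune_b=$3
    a=$(mktemp) && b=$(mktemp) || return 2
    # -Q --help=target lists target-specific options and their current values.
    "$gcc_bin" -march="$march" -mtune="$tune_a" -O3 -Q --help=target > "$a"
    "$gcc_bin" -march="$march" -mtune="$tune_b" -O3 -Q --help=target > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

For the case above this would be called as "diff_tune core-avx-i nocona znver1".
Each differing option can then be overridden on the command line, one at a
time, to see which one accounts for the change in timing.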

cat /proc/cpuinfo : 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 1
model name      : AMD Ryzen 7 1700 Eight-Core Processor
stepping        : 1
microcode       : 0x8001129

With -march=znver1 -mtune=znver1 and the default of
-mprefer-vector-width=128:
mult took     386291 clocks

With -march=znver1 -mtune=znver1 -mprefer-vector-width=none:
mult took     201455 clocks
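
To put the timings above side by side, the slowdown factors can be computed
from the reported clock counts. A small sketch using awk (the helper name
ratio is made up for illustration):

```shell
# Ratio of the slower timing to the faster one, to two decimal places.
ratio() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# -march=core-avx-i: -mtune=znver1 versus -mtune=nocona
ratio 231309 131153
# -march=znver1 -mtune=znver1: width=128 (default) versus width=none
ratio 386291 201455
```

This gives roughly 1.76 for the first pair and 1.92 for the second, so in both
configurations the 128-bit preference costs a little under a factor of two on
this test.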
