https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #32 from Andrew Roberts <andrewm.roberts at sky dot com> ---
For what it's worth, here's what the latest and greatest from the competition has to offer:

/usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -O3 matrix.c -o matrix
mult took 887141 clocks

/usr/local/llvm-5.0.1-rc2/biznver1 -O3 mt19937ar.c -o mt19937ar
mt19937ar took 402282 clocks

/usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -Ofast matrix.c -o matrix
mult took 760913 clocks

/usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -Ofast mt19937ar.c -o mt19937ar
mt19937ar took 392527 clocks

current gcc-8 snapshot:

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -Ofast matrix.c -o matrix
mult took 364775 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -Ofast -o mt19937ar mt19937ar.c
mt19937ar took 430804 clocks

current gcc-8 snapshot + extra opts to improve znver1 performance:

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -mno-fma -Ofast matrix.c -o matrix
mult took 130329 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mno-avx2 -Ofast -o mt19937ar mt19937ar.c
mt19937ar took 387728 clocks

So gcc loses on mt19937ar.c without -mno-avx2.
But gcc wins big on matrix.c, especially with -mprefer-vector-width=none -mno-fma.
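
To put the clock counts above in relative terms, here is a minimal sketch that turns them into speedup ratios against the clang -Ofast results. The clock counts are copied from this comment; the `clocks` table, `speedup` and `report` are names made up for illustration and are not part of the benchmark sources. Each figure appears to come from a single run, so small differences (such as the mt19937ar ones) should be read with some caution.

```python
# Clock counts copied from the comment above (lower is faster).
# The dictionary layout and labels are illustrative assumptions.
clocks = {
    "matrix": {
        "clang -Ofast": 760913,
        "gcc -Ofast": 364775,
        "gcc -Ofast -mprefer-vector-width=none -mno-fma": 130329,
    },
    "mt19937ar": {
        "clang -Ofast": 392527,
        "gcc -Ofast": 430804,
        "gcc -Ofast -mno-avx2": 387728,
    },
}


def speedup(baseline, candidate):
    """How many times faster candidate is than baseline (>1 means faster)."""
    return baseline / candidate


def report(table, baseline_key="clang -Ofast"):
    """Return (benchmark, config, speedup vs. baseline) for every non-baseline config."""
    rows = []
    for bench, results in table.items():
        base = results[baseline_key]
        for config, count in results.items():
            if config != baseline_key:
                rows.append((bench, config, speedup(base, count)))
    return rows


for bench, config, ratio in report(clocks):
    print(f"{bench:10s} {config:48s} {ratio:.2f}x vs clang -Ofast")
```

A ratio above 1 means the gcc build needed fewer clocks than clang -Ofast; a ratio below 1 means it needed more. On these numbers gcc is roughly twice as fast on matrix.c at plain -Ofast and falls slightly behind on mt19937ar.c until -mno-avx2 is added.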