https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #55 from CVS Commits ---
The master branch has been updated by Hongyu Wang :
https://gcc.gnu.org/g:3a1a141f79c83ad38f7db3a21d8a4dcfe625c176
commit r13-4534-g3a1a141f79c83ad38f7db3a21d8a4dcfe625c176
Author: Hongyu Wang
Date: Tue D
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
Jan Hubicka changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #53 from Martin Jambor ---
I'd vote for marking this fixed (and asking anyone with other ideas what could
be improved in generic tuning to open a new bug).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #52 from Richard Biener ---
Fixed? Or shall we take it as a recurring bug?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #51 from Martin Jambor ---
(In reply to Andrew Roberts from comment #50)
> with the matrix.c benchmark on Ryzen and looking at the other options when
> using -march=znver1 and -mtune=znver1
>
> mult took 225281 clocks -march=znver1 -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #50 from Andrew Roberts ---
with the matrix.c benchmark on Ryzen and looking at the other options when
using -march=znver1 and -mtune=znver1
mult took 225281 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=128
mult took 1859
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #49 from Jan Hubicka ---
> matrix.c still needs additional options to get the best out of the Ryzen
> processor, but it is better than before (223029 clocks vs 371978 originally),
> but 122677 is achievable with the right options.
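Expressed as speedups over the quoted clock counts (fewer clocks is better), the snapshot improves on the original build by roughly 1.67×, and the best option set would give a further ≈1.82×, about 3.03× over the original:

```latex
\frac{371978}{223029} \approx 1.67,\qquad
\frac{223029}{122677} \approx 1.82,\qquad
\frac{371978}{122677} \approx 3.03
```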
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #48 from Andrew Roberts ---
Correction, that should be 23 not 23000 for the haswell drop in
performance.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #47 from Andrew Roberts ---
Again with the latest snapshot:
gcc version 8.0.1 20180121
matrix.c still needs additional options to get the best out of the Ryzen
processor, but it is better than before (223029 clocks vs 371978 origin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #46 from Andrew Roberts ---
With the latest snapshot:
gcc version 8.0.1 20180121
For the mt19937ar things now look reasonable without any strange options on
Ryzen.
Top 5
mt19937ar took 226849 clocks -march=amdfam10 -mtune=btver2
mt1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #45 from Jan Hubicka ---
I believe all issues tracked here have been addressed. Andrew, do you still see
some anomalies?
Honza
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #44 from Martin Jambor ---
Author: jamborm
Date: Fri Jan 12 14:06:10 2018
New Revision: 256581
URL: https://gcc.gnu.org/viewcvs?rev=256581&root=gcc&view=rev
Log:
Deferring FMA transformations in tight loops
2018-01-12 Martin Jambor
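The commit title refers to loops where a fused multiply-add would sit on the loop-carried dependency. As a hypothetical illustration (not code from the patch or from matrix.c), a dot product written with a single running sum has this shape. With FMA enabled (as -march=znver1 does) and GCC's default -ffp-contract=fast for GNU C dialects, the multiply and add may be contracted into one FMA per iteration, so each iteration waits on the full FMA latency rather than only the add latency:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dot product with one running accumulator.
   `sum += a[i] * b[i]` is a candidate for contraction into an FMA.
   If fused, every iteration's FMA needs the previous `sum`, so the
   FMA latency lies on the loop-carried dependency chain. If the
   multiply stays separate, it does not depend on `sum` and can be
   computed ahead; only the add remains on the chain. */
double dot(const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Keeping the multiply separate in such tight loops is presumably the trade-off that deferring the FMA transformation aims for on Zen.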
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #43 from Jan Hubicka ---
Author: hubicka
Date: Wed Jan 10 11:02:55 2018
New Revision: 256424
URL: https://gcc.gnu.org/viewcvs?rev=256424&root=gcc&view=rev
Log:
PR target/81616
* i386.c (ix86_vectorize_builtin_gather):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #42 from Jan Hubicka ---
Author: hubicka
Date: Tue Jan 2 13:04:19 2018
New Revision: 256073
URL: https://gcc.gnu.org/viewcvs?rev=256073&root=gcc&view=rev
Log:
PR target/81616
* config/i386/x86-tune-costs.h: Increase
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #41 from Jan Hubicka ---
Author: hubicka
Date: Tue Jan 2 09:31:47 2018
New Revision: 256070
URL: https://gcc.gnu.org/viewcvs?rev=256070&root=gcc&view=rev
Log:
PR target/81616
* x86-tune-costs.h (generic_cost): Reduc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #40 from Martin Jambor ---
(In reply to Sebastian Peryt from comment #39)
> I have tested it on SKX with SPEC2006INT and SPEC2017INT and don't see any
> regressions.
I should have written that the patch only affects znver1 tuning by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #39 from Sebastian Peryt ---
I have tested it on SKX with SPEC2006INT and SPEC2017INT and don't see any
regressions.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
Martin Jambor changed:
What|Removed |Added
CC||jamborm at gcc dot gnu.org
--- Comment #
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #37 from Jan Hubicka ---
Author: hubicka
Date: Mon Dec 4 23:59:11 2017
New Revision: 255395
URL: https://gcc.gnu.org/viewcvs?rev=255395&root=gcc&view=rev
Log:
PR target/81616
* athlon.md: Disable for generic.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #36 from Jan Hubicka ---
Author: hubicka
Date: Sat Dec 2 09:22:41 2017
New Revision: 255357
URL: https://gcc.gnu.org/viewcvs?rev=255357&root=gcc&view=rev
Log:
PR target/81616
* x86-tune.def: Remove obsolete FIXMEs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #35 from Jan Hubicka ---
Author: hubicka
Date: Thu Nov 30 09:36:36 2017
New Revision: 255268
URL: https://gcc.gnu.org/viewcvs?rev=255268&root=gcc&view=rev
Log:
PR target/81616
* x86-tune-costs.h (generic_cost): Revise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #34 from Jan Hubicka ---
> So gcc loses on mt19937ar.c without -mno-avx2
> But gcc wins big on matrix.c, especially with -mprefer-vector-width=none
> -mno-fma
It is because llvm does not use vgather at all unless avx512 is present.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #33 from Andrew Roberts ---
That second llvm command line should read:
/usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -Ofast mt19937ar.c -o mt19937ar
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #32 from Andrew Roberts ---
For what it's worth, here's what the latest and greatest from the competition
has to offer:
/usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -O3 matrix.c -o matrix
mult took 88714
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #31 from Andrew Roberts ---
Or for mt19937ar with -mno-avx2:
/usr/local/gcc/bin/gcc -march=$amarch -mtune=$amtune -mno-avx2 -O3 -o mt19937ar mt19937ar.c
Top 2:
mt19937ar took 358493 clocks -march=silvermont -mtune=bdver1
mt19937ar t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #30 from Jan Hubicka ---
Sorry, with -mno-avx2 I was speaking of the other mt benchmark. There is no
need for gathers in matrix multiplication...
Honza
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #29 from Andrew Roberts ---
And rerunning all the tests for matrix.c on Ryzen using:
-march=$amarch -mtune=$amtune -mprefer-vector-width=none -mno-fma -O3
The winners were:
mult took 118145 clocks -march=broadwell -mtune=broadwell
mu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #28 from Andrew Roberts ---
Adding -mno-avx2 into the mix was a marginal win, but it only just stands out
from the noise:
/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -mno-fma -mno-avx2 -O3 matrix.c -o ma
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #27 from Jan Hubicka ---
Hi,
one of the problems here is the use of the vgather instruction. It is hardly a
win on the Zen architecture.
It is also on my TODO to adjust the code model to disable it for most loops. I
only want to benchmark if it is a
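The mt19937ar benchmark is where gathers come in. Assuming Andrew's modified program keeps the reference mt19937ar.c state update, each step XORs in `mag01[y & 0x1UL]`, a table lookup whose index depends on the data; an AVX2 vectorizer can implement such a lookup with vgather. The sketch below contrasts that lookup with a load-free mask form. The mask form is a common rewrite shown only for illustration, not something proposed in this thread, and the function names are hypothetical:

```c
#include <stdint.h>

#define MATRIX_A 0x9908b0dfu /* constant vector a, as in mt19937ar.c */

/* Table form, as in the reference genrand_int32(): the load index
   depends on the data (y & 1), so a vectorized loop needs a gather
   to fetch mag01[] for each lane. */
uint32_t twist_lookup(uint32_t y)
{
    static const uint32_t mag01[2] = { 0x0u, MATRIX_A };
    return (y >> 1) ^ mag01[y & 0x1u];
}

/* Mask form: 0u - (y & 1) is all ones when the low bit is set and
   zero otherwise, so ANDing it with MATRIX_A selects the same value
   without any memory access. */
uint32_t twist_mask(uint32_t y)
{
    return (y >> 1) ^ ((0u - (y & 0x1u)) & MATRIX_A);
}
```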
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #26 from Jan Hubicka ---
On your matrix benchmarks I get:
Vector inside of loop cost: 44
Vector prologue cost: 12
Vector epilogue cost: 0
Scalar iteration cost: 40
Scalar outside cost: 0
Vector outside cost: 12
prologue
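A rough reading of these numbers: one vector iteration (44) costs slightly more than one scalar iteration (40), so vectorizing pays off only because each vector iteration covers several scalar ones. The sketch below uses a deliberately simplified model, not GCC's actual vect_estimate_min_profitable_iters logic (which handles more cases); the vectorization factor `vf` does not appear in the snippet, so it is a parameter here, and the helper names are hypothetical:

```c
#include <assert.h>

/* Costs as reported in the comment, in the vectorizer's abstract units. */
enum {
    VEC_INSIDE_COST  = 44, /* one vector loop iteration        */
    VEC_OUTSIDE_COST = 12, /* one-off cost outside the loop    */
    SCALAR_ITER_COST = 40  /* one scalar loop iteration        */
};

/* Simplified model: the vector loop runs ceil(n / vf) iterations plus a
   one-off outside cost; the scalar loop runs n iterations. Returns
   nonzero if the vector version is estimated to be strictly cheaper. */
int vector_is_cheaper(long n, long vf)
{
    assert(n >= 0 && vf > 0);
    long vec_iters = (n + vf - 1) / vf;
    long vec_cost = VEC_OUTSIDE_COST + vec_iters * VEC_INSIDE_COST;
    long scalar_cost = n * SCALAR_ITER_COST;
    return vec_cost < scalar_cost;
}

/* Smallest trip count (searching 1..limit) at which the vector version
   wins under the model above, or -1 if it never does. */
long min_profitable_iters(long vf, long limit)
{
    for (long n = 1; n <= limit; n++)
        if (vector_is_cheaper(n, vf))
            return n;
    return -1;
}
```

With these costs, any `vf` of 2 or more breaks even at two iterations, while `vf == 1` never does, since 44 > 40.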
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #25 from Jan Hubicka ---
Hi,
I agree that the matrix multiplication fma issue is important and hopefully it
will be fixed for GCC 8. See
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00437.html
The irregularity of tune/arch is proba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #24 from Andrew Roberts ---
For the mt19937ar test:
/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -O3 mt19937ar.c -o mt19937ar
mt19937ar took 462062 clocks
/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-wi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #23 from Andrew Roberts ---
Thanks Honza,
getting closer, with original matrix.c on Ryzen:
/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -O3 matrix.c -o matrix
mult took 364850 clocks
/usr/local/gcc/bin/gcc -march=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #22 from Jan Hubicka ---
Hi,
this is the same base (so you can see there is some noise) compared to Haswell
tuning
164.gzip     1400  57.1  2452*   1400  58.7  2384*
175.vpr      1400  37.1  3776*
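Assuming the usual SPEC CPU2000 report layout (reference time, run time, ratio), each ratio is 100 × reference time / run time, e.g. 100 × 1400 / 57.1 ≈ 2452 for 164.gzip. A minimal sketch of that check (the helper name is hypothetical):

```c
#include <assert.h>

/* SPEC CPU2000 ratios are normalised so the reference machine scores 100:
   ratio = 100 * reference_time / run_time. */
double spec2000_ratio(double ref_seconds, double run_seconds)
{
    assert(ref_seconds > 0.0 && run_seconds > 0.0);
    return 100.0 * ref_seconds / run_seconds;
}
```

Small mismatches (e.g. 100 × 1400 / 37.1 ≈ 3774 against the printed 3776 for 175.vpr) are expected because the printed run times are rounded to one decimal.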
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #21 from Jan Hubicka ---
Hi,
this is comparing SPEC2000 -Ofast -march=native -mprefer-vector-width=128
to -Ofast -march=native -mprefer-vector-width=256 on Ryzen.
168.wupwise  1600  28.2  5669*   1600  30.8  518
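The two settings differ in how many elements one vector instruction processes. First-generation Zen executes 256-bit AVX operations as two 128-bit halves, which is why the preferred width is worth benchmarking there. A minimal sketch of the lane arithmetic (the helper name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements of size elem_bytes that fit in one vector of
   width_bits bits, e.g. for -mprefer-vector-width=128 or =256. */
size_t lanes(unsigned width_bits, size_t elem_bytes)
{
    assert(elem_bytes > 0 && width_bits % (8 * elem_bytes) == 0);
    return width_bits / (8 * elem_bytes);
}
```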
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #20 from Andrew Roberts ---
Again those latest mt19937ar results above were with the current snapshot:
/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #19 from Andrew Roberts ---
Created attachment 42735
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42735&action=edit
modified mt19937ar test program, test script and results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #18 from Andrew Roberts ---
Ok, trying an entirely different algorithm, same results:
Using Mersenne Twister algorithm from here:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html
alter main program to comment out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #17 from Andrew Roberts ---
The general consensus in userland is that the znver1 optimization is much worse
than 0.5%, or even 2% off. Most people are using -march=haswell if they care
about performance.
Just taking one part of one o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #16 from Richard Biener ---
(In reply to Jan Hubicka from comment #13)
> > So is this option still helping with the latest microcode? Not in this case
> > at least.
>
> It is on my TODO list to re-benchmark 256bit vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #15 from Jan Hubicka ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
>
> --- Comment #14 from Andrew Roberts ---
> It would be nice if znver1 for -march and -mtune could be improved before the
> gcc 8 release. At present -m
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #14 from Andrew Roberts ---
It would be nice if znver1 for -march and -mtune could be improved before the
gcc 8 release. At present -march=znver1 -mtune=znver1 looks to be about the
worst thing you could do, and not just on this vecto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #13 from Jan Hubicka ---
> So is this option still helping with the latest microcode? Not in this case at
> least.
It is on my TODO list to re-benchmark 256bit vectorization for Zen. I do not
think microcode is a big difference here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #12 from Andrew Roberts ---
Ok, I've tried again with this week's snapshot:
gcc version 8.0.0 20171126 (experimental) (GCC)
Taking combination of -march and -mtune which works well on Ryzen:
/usr/local/gcc/bin/gcc -march=core-avx-i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
Jakub Jelinek changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org
--- Comment #11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #10 from Andrew Roberts ---
Created attachment 42691
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42691&action=edit
Script for matrix.c test program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #9 from Andrew Roberts ---
Created attachment 42690
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42690&action=edit
Test results for Skylake system with matrix.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #8 from Andrew Roberts ---
Created attachment 42689
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42689&action=edit
Test results for Haswell system with matrix.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #7 from Andrew Roberts ---
Created attachment 42688
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42688&action=edit
Test results for Ryzen system with matrix.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #6 from Andrew Roberts ---
Created attachment 42687
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42687&action=edit
Test program used for the attached performance results (matrix.c)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #5 from Andrew Roberts ---
I've been testing on a Ryzen system and also comparing with Haswell and
Skylake. From my testing -mtune=znver1 does not perform well and never has,
including as of last snapshot:
gcc version 8.0.0 20171119 (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
Andrew Roberts changed:
What|Removed |Added
CC||andrewm.roberts at sky dot com
--- Comm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
Jan Hubicka changed:
What|Removed |Added
Status|NEW |ASSIGNED
--- Comment #3 from Jan Hubicka
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
Richard Biener changed:
What|Removed |Added
Target|x86 |x86_64-*-*, i?86-*-*
Status
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
H.J. Lu changed:
What|Removed |Added
CC||cody at codygray dot com
--- Comment #1 from H