[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2022-12-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #55 from CVS Commits --- The master branch has been updated by Hongyu Wang : https://gcc.gnu.org/g:3a1a141f79c83ad38f7db3a21d8a4dcfe625c176 commit r13-4534-g3a1a141f79c83ad38f7db3a21d8a4dcfe625c176 Author: Hongyu Wang Date: Tue D

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2019-04-18 Thread hubicka at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 Jan Hubicka changed: What | Removed | Added; Status | ASSIGNED | RESOLVED; Resolution | ---

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2019-04-17 Thread jamborm at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #53 from Martin Jambor --- I'd vote for marking this fixed (and asking anyone with other ideas what could be improved in generic tuning to open a new bug).

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2019-04-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #52 from Richard Biener --- Fixed? Or shall we take it as a recurring bug?

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-23 Thread jamborm at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #51 from Martin Jambor --- (In reply to Andrew Roberts from comment #50) > with the matrix.c benchmark on Ryzen and looking at the other options when > using -march=znver1 and -mtune=znver1 > > mult took 225281 clocks -march=znver1 -

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #50 from Andrew Roberts --- with the matrix.c benchmark on Ryzen and looking at the other options when using -march=znver1 and -mtune=znver1 mult took 225281 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=128 mult took 1859

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread hubicka at ucw dot cz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #49 from Jan Hubicka --- > matrix.c is still needing additional options to get the best out of the Ryzen > processor. But is better than before (223029 clocks vs 371978 originally), > but 122677 is achievable with the right options.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #48 from Andrew Roberts --- Correction, that should be 23 not 23000 for the haswell drop in performance.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #47 from Andrew Roberts --- Again with the latest snapshot: gcc version 8.0.1 20180121 matrix.c is still needing additional options to get the best out of the Ryzen processor. But is better than before (223029 clocks vs 371978 origin

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #46 from Andrew Roberts --- With the latest snapshot: gcc version 8.0.1 20180121 For the mt19937ar things now look reasonable without any strange options on Ryzen. Top 5 mt19937ar took 226849 clocks -march=amdfam10 -mtune=btver2 mt1

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread hubicka at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #45 from Jan Hubicka --- I believe all issues tracked here have been addressed. Andrew, do you still see some anomalies? Honza

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-12 Thread jamborm at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #44 from Martin Jambor --- Author: jamborm Date: Fri Jan 12 14:06:10 2018 New Revision: 256581 URL: https://gcc.gnu.org/viewcvs?rev=256581&root=gcc&view=rev Log: Deferring FMA transformations in tight loops 2018-01-12 Martin Jambor

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-10 Thread hubicka at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #43 from Jan Hubicka --- Author: hubicka Date: Wed Jan 10 11:02:55 2018 New Revision: 256424 URL: https://gcc.gnu.org/viewcvs?rev=256424&root=gcc&view=rev Log: PR target/81616 * i386.c (ix86_vectorize_builtin_gather):

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-02 Thread hubicka at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #42 from Jan Hubicka --- Author: hubicka Date: Tue Jan 2 13:04:19 2018 New Revision: 256073 URL: https://gcc.gnu.org/viewcvs?rev=256073&root=gcc&view=rev Log: PR target/81616 * config/i386/x86-tune-costs.h: Increase

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-02 Thread hubicka at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #41 from Jan Hubicka --- Author: hubicka Date: Tue Jan 2 09:31:47 2018 New Revision: 256070 URL: https://gcc.gnu.org/viewcvs?rev=256070&root=gcc&view=rev Log: PR target/81616 * x86-tune-costs.h (generic_cost): Reduc

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-12-15 Thread jamborm at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #40 from Martin Jambor --- (In reply to Sebastian Peryt from comment #39) > I have tested it on SKX with SPEC2006INT and SPEC2017INT and don't see any > regressions. I should have written that the patch only affects znver1 tuning by

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-12-14 Thread sebastian.peryt at intel dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #39 from Sebastian Peryt --- I have tested it on SKX with SPEC2006INT and SPEC2017INT and don't see any regressions.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-12-13 Thread jamborm at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 Martin Jambor changed: What | Removed | Added; CC | | jamborm at gcc dot gnu.org --- Comment #

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-12-04 Thread hubicka at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #37 from Jan Hubicka --- Author: hubicka Date: Mon Dec 4 23:59:11 2017 New Revision: 255395 URL: https://gcc.gnu.org/viewcvs?rev=255395&root=gcc&view=rev Log: PR target/81616 * athlon.md: Disable for generic.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-12-02 Thread hubicka at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #36 from Jan Hubicka --- Author: hubicka Date: Sat Dec 2 09:22:41 2017 New Revision: 255357 URL: https://gcc.gnu.org/viewcvs?rev=255357&root=gcc&view=rev Log: PR target/81616 * x86-tune.def: Remove obsolette FIXMEs.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-30 Thread hubicka at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #35 from Jan Hubicka --- Author: hubicka Date: Thu Nov 30 09:36:36 2017 New Revision: 255268 URL: https://gcc.gnu.org/viewcvs?rev=255268&root=gcc&view=rev Log: PR target/81616 * x86-tnue-costs.h (generic_cost): Revise

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-29 Thread hubicka at ucw dot cz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #34 from Jan Hubicka --- > So gcc loses on mt19937ar.c without -mno-avx2 > But gcc wins big on matrix.c, especially with -mprefer-vector-width=none > -mno-fma It is because llvm does not use vgather at all unless avx512 is present.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-29 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #33 from Andrew Roberts --- That second llvm command line should read: /usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -Ofast mt19937ar.c -o mt19937ar

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-29 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #32 from Andrew Roberts --- For what it's worth, here's what the latest and greatest from the competition has to offer: /usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -O3 matrix.c -o matrix mult took 88714

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-29 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #31 from Andrew Roberts --- Or for mt19937ar with -mno-avx2 /usr/local/gcc/bin/gcc -march=$amarch -mtune=$amtune -mno-avx2 -O3 -o mt19937ar mt19937ar.c Top 2: mt19937ar took 358493 clocks -march=silvermont -mtune=bdver1 mt19937ar t

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-29 Thread hubicka at ucw dot cz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #30 from Jan Hubicka --- Sorry, with -mno-avx2 I was speaking of the other mt benchmark. There is no need for gathers in matrix multiplication... Honza

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #29 from Andrew Roberts --- And rerunning all the tests for matrix.c on Ryzen using: -march=$amarch -mtune=$amtune -mprefer-vector-width=none -mno-fma -O3 The winners were: mult took 118145 clocks -march=broadwell -mtune=broadwell mu

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #28 from Andrew Roberts --- Adding -mno-avx2 into the mix was a marginal win, but only just showing out of the noise: /usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -mno-fma -mno-avx2 -O3 matrix.c -o ma

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread hubicka at ucw dot cz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #27 from Jan Hubicka --- Hi, one of the problems here is the use of the vgather instruction. It is hardly a win on Zen architecture. It is also on my TODO to adjust the code model to disable it for most loops. I only want to benchmark if it is a

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread hubicka at ucw dot cz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #26 from Jan Hubicka --- On your matrix benchmarks I get: Vector inside of loop cost: 44 Vector prologue cost: 12 Vector epilogue cost: 0 Scalar iteration cost: 40 Scalar outside cost: 0 Vector outside cost: 12 prologue

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread hubicka at ucw dot cz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #25 from Jan Hubicka --- Hi, I agree that the matrix multiplication fma issue is important and hopefully it will be fixed for GCC 8. See https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00437.html The irregularity of tune/arch is proba

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #24 from Andrew Roberts --- For the mt19937ar test: /usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -O3 mt19937ar.c -o mt19937ar mt19937ar took 462062 clocks /usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-wi

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #23 from Andrew Roberts --- Thanks Honza, getting closer, with original matrix.c on Ryzen: /usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -O3 matrix.c -o matrix mult took 364850 clocks /usr/local/gcc/bin/gcc -march=

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread hubicka at ucw dot cz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #22 from Jan Hubicka --- Hi, this is the same base (so you can see there is some noise) compared to haswell tuning 164.gzip 1400 57.1 2452* 1400 58.7 2384* 175.vpr 1400 37.1 3776*

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread hubicka at ucw dot cz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #21 from Jan Hubicka --- Hi, this is comparing SPEC2000 -Ofast -march=native -mprefer-vector-width=128 to -Ofast -march=native -mprefer-vector-width=256 on Ryzen. 168.wupwise 1600 28.2 5669* 1600 30.8 518

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #20 from Andrew Roberts --- Again those latest mt19937ar results above were with the current snapshot: /usr/local/gcc/bin/gcc -v Using built-in specs. COLLECT_GCC=/usr/local/gcc/bin/gcc COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexe

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #19 from Andrew Roberts --- Created attachment 42735 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42735&action=edit modified mt19937ar test program, test script and results modified mt19937ar test program, test script and res

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #18 from Andrew Roberts --- Ok, trying an entirely different algorithm, same results: Using Mersenne Twister algorithm from here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html alter main program to comment out

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #17 from Andrew Roberts --- The general consensus in userland is that the znver1 optimization is much worse than 0.5%, or even 2% off. Most people are using -march=haswell if they care about performance. Just taking one part of one o

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #16 from Richard Biener --- (In reply to Jan Hubicka from comment #13) > > So is this option still helping with the latest microcode? Not in this case at least. > It is on my TODO list to re-benchmark 256bit vectorization

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread hubicka at ucw dot cz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #15 from Jan Hubicka --- > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 > > --- Comment #14 from Andrew Roberts --- > It would be nice if znver1 for -march and -mtune could be improved before the > gcc 8 release. At present -m

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #14 from Andrew Roberts --- It would be nice if znver1 for -march and -mtune could be improved before the gcc 8 release. At present -march=znver1 -mtune=znver1 looks to be about the worst thing you could do, and not just on this vecto

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread hubicka at ucw dot cz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #13 from Jan Hubicka --- > So is this option still helping with the latest microcode? Not in this case at least. It is on my TODO list to re-benchmark 256bit vectorization for Zen. I do not think microcode is a big difference here

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-26 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #12 from Andrew Roberts --- Ok, I've tried again with this week's snapshot: gcc version 8.0.0 20171126 (experimental) (GCC) Taking a combination of -march and -mtune which works well on Ryzen: /usr/local/gcc/bin/gcc -march=core-avx-i

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 Jakub Jelinek changed: What | Removed | Added; CC | | jakub at gcc dot gnu.org --- Comment #11

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #10 from Andrew Roberts --- Created attachment 42691 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42691&action=edit Script for matrix.c test program Script for matrix.c test program

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #9 from Andrew Roberts --- Created attachment 42690 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42690&action=edit Test results for Skylake system with matrix.c Test results for Skylake system with matrix.c

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #8 from Andrew Roberts --- Created attachment 42689 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42689&action=edit Test results for Haswell system with matrix.c Test results for Haswell system with matrix.c

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #7 from Andrew Roberts --- Created attachment 42688 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42688&action=edit Test results for Ryzen system with matrix.c Test results for Ryzen system with matrix.c

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #6 from Andrew Roberts --- Created attachment 42687 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42687&action=edit Test program used for the attached performance results (matrix.c) Test program used for the attached performan

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #5 from Andrew Roberts --- I've been testing on a Ryzen system and also comparing with Haswell and Skylake. From my testing -mtune=znver1 does not perform well and never has, including as of last snapshot: gcc version 8.0.0 20171119 (

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 Andrew Roberts changed: What | Removed | Added; CC | | andrewm.roberts at sky dot com --- Comm

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-19 Thread hubicka at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 Jan Hubicka changed: What | Removed | Added; Status | NEW | ASSIGNED --- Comment #3 from Jan Hubicka

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-07-31 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 Richard Biener changed: What | Removed | Added; Target | x86 | x86_64-*-*, i?86-*-*; Status

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-07-30 Thread hjl.tools at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 H.J. Lu changed: What | Removed | Added; CC | | cody at codygray dot com --- Comment #1 from H
