http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #17 from bfriesen at simple dot dallas.tx.us 2012-07-21 01:04:55
UTC ---
I discovered that GCC's __attribute__((__optimize__())) and optimization
pragmas do not work for OpenMP code because OpenMP uses a different function
nam
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #16 from bfriesen at simple dot dallas.tx.us 2012-07-19 14:29:10
UTC ---
Is there a way that I can selectively apply the -frename-registers fix to
functions which benefit from it in order to work around the bug until the fix
is widely
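One way this kind of per-function selection is usually attempted in GCC is the optimize function attribute or the matching "#pragma GCC optimize" region. The sketch below is illustrative only: the kernel, the function names and the OPT_RENAME_REGISTERS macro are hypothetical and are not taken from GraphicsMagick or from the attachments on this bug. Note that comment #17 reports that the attribute and the pragmas do not work for OpenMP code, so this would only be a candidate for non-OpenMP functions.

```c
#include <stddef.h>

/* Hypothetical helper macro (not from the bug report): expands to GCC's
 * optimize attribute, which prefixes "-f" to option strings that do not
 * begin with "O", so this requests -frename-registers for one function. */
#if defined(__GNUC__) && !defined(__clang__)
#  define OPT_RENAME_REGISTERS __attribute__((__optimize__("rename-registers")))
#else
#  define OPT_RENAME_REGISTERS
#endif

/* Hypothetical 1-D convolution kernel standing in for a hot function:
 * dot product of `width` samples starting at `pos` with the kernel. */
OPT_RENAME_REGISTERS
double convolve_at(const double *src, const double *kernel,
                   size_t width, size_t pos)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < width; i++)
        sum += src[pos + i] * kernel[i];
    return sum;
}

/* Pragma form: the option applies to functions defined between
 * push_options and pop_options. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("rename-registers")
#endif

/* Writes n - width + 1 outputs to dst (nothing if width > n). */
void convolve_row(const double *src, double *dst, size_t n,
                  const double *kernel, size_t width)
{
    size_t pos;

    for (pos = 0; pos + width <= n; pos++)
        dst[pos] = convolve_at(src, kernel, width, pos);
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

Whether this actually restores the performance seen with the global -frename-registers flag would have to be measured; the sketch only shows how the option can be scoped to individual functions.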
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #15 from bfriesen at simple dot dallas.tx.us 2012-07-18 20:42:22
UTC ---
Testing shows that using
-m64 -march=native -O2 -mfpmath=sse -frename-registers
is sufficient to restore good performance.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #14 from bfriesen at simple dot dallas.tx.us 2012-07-18 14:28:04
UTC ---
With
-m64 -mtune=generic -march=x86-64 -mfpmath=sse -O2 -funroll-loops
-fschedule-insns
I see a whole-program performance jump from 0.047 iter/s to 0.156 iter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #11 from bfriesen at simple dot dallas.tx.us 2012-07-16 15:41:08
UTC ---
I just verified that -O3 produces similar timings to -O2 for both -mfpmath=387
and -mfpmath=sse
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #10 from bfriesen at simple dot dallas.tx.us 2012-07-16 15:35:03
UTC ---
This particular application test was done with these options (i.e. -O2):
-m64 -mtune=generic -march=x86-64 -mfpmath=387 -O2
I have also tried -O3, with no
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #8 from bfriesen at simple dot dallas.tx.us 2012-07-16 14:16:46 UTC
---
I used -march=native in this case. It is interesting that this enabled AVX
(this particular CPU does support it).
To be clear, the problem also occurs with
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #6 from bfriesen at simple dot dallas.tx.us 2012-07-14 21:42:38 UTC
---
Created attachment 27797
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27797
Pre-processed GraphicsMagick source (effect.c).
In case the small sam
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #5 from bfriesen at simple dot dallas.tx.us 2012-07-14 21:06:27 UTC
---
Please note that while I mentioned GCC 4.6.2, the same problem is also observed
with GCC 4.7.1.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #4 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:58:59 UTC
---
Created attachment 27796
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27796
Generated assembler code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #3 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:57:58 UTC
---
Created attachment 27795
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27795
Pre-processed source
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #2 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:56:55 UTC
---
Created attachment 27794
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27794
Sample portable source file
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
--- Comment #1 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:55:48 UTC
---
Created attachment 27793
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27793
Build log
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
Bug #: 53967
Summary: GCC produces slow code for convolution algorithm with
-mfpmath=sse (the AMD_64 default)
Classification: Unclassified
Product: gcc
Version: 4.6.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35531
bfriesen at simple dot dallas.tx.us changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
15 matches