[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-20 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #17 from bfriesen at simple dot dallas.tx.us 2012-07-21 01:04:55 UTC --- I discovered that GCC's __attribute__((__optimize__())) and optimization pragmas do not work for OpenMP code because OpenMP uses a different function nam
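
One possible workaround for the limitation described in comment #17 is sketched below. It assumes the truncated comment refers to GCC outlining the body of an OpenMP parallel region into a separately named, compiler-generated function, so an attribute on the enclosing function does not reach the hot loop. The sketch moves the hot loop into an ordinary named helper that carries the attribute itself. The names `blur_row`, `blur_image` and `HOT_KERNEL` are hypothetical and do not come from the bug report or from GraphicsMagick, and whether this restores performance for the reported case is untested.

```c
#include <stddef.h>

/* GCC-only: ask for -frename-registers on one function and keep it out of
   line, so it is not merged back into the outlined OpenMP body.
   Other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#  define HOT_KERNEL __attribute__((__noinline__, __optimize__("rename-registers")))
#else
#  define HOT_KERNEL
#endif

/* Hypothetical per-row worker: a 3-tap box filter. Edge samples are copied
   unchanged. Because this is an ordinary named function, the attribute
   attaches to it directly. */
HOT_KERNEL
static void blur_row(const double *in, double *out, size_t width)
{
    if (width == 0)
        return;
    out[0] = in[0];
    if (width == 1)
        return;
    out[width - 1] = in[width - 1];
    for (size_t x = 1; x + 1 < width; x++)
        out[x] = (in[x - 1] + in[x] + in[x + 1]) / 3.0;
}

/* Hypothetical driver: the parallel loop only dispatches rows, so the
   compiler-generated outlined function contains little more than the call. */
void blur_image(const double *in, double *out, size_t width, size_t height)
{
    long y; /* signed index, accepted by older OpenMP versions */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (y = 0; y < (long)height; y++)
        blur_row(in + (size_t)y * width, out + (size_t)y * width, width);
}
```

Built without `-fopenmp`, the pragma is skipped and the loop runs serially, which makes it easy to compare timings with and without the attribute.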

[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-19 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #16 from bfriesen at simple dot dallas.tx.us 2012-07-19 14:29:10 UTC --- Is there a way that I can selectively apply the -frename-registers fix to functions which benefit from it in order to work around the bug until the fix is widely
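
For reference, GCC's `optimize` function attribute is one way to request a single `-f` option for a single function, which is the kind of selective application comment #16 asks about. The sketch below is a minimal illustration, not code from the bug report: `convolve_1d` and `RENAME_REGISTERS` are hypothetical names, and the loop is a stand-in for the real convolution kernel. The attribute string `"rename-registers"` is treated by GCC as `-frename-registers`.

```c
#include <assert.h>
#include <stddef.h>

/* GCC-only: request -frename-registers for one function.
   Other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#  define RENAME_REGISTERS __attribute__((__optimize__("rename-registers")))
#else
#  define RENAME_REGISTERS
#endif

/* Hypothetical 1-D convolution over the "valid" region only:
   dst receives n - klen + 1 samples, and the kernel is applied flipped. */
RENAME_REGISTERS
void convolve_1d(const double *src, size_t n,
                 const double *kernel, size_t klen,
                 double *dst)
{
    assert(klen > 0 && klen <= n);
    for (size_t i = 0; i + klen <= n; i++) {
        double sum = 0.0;
        for (size_t k = 0; k < klen; k++)
            sum += src[i + k] * kernel[klen - 1 - k];
        dst[i] = sum;
    }
}
```

The rest of the translation unit keeps the command-line options, so only the annotated function is compiled differently. The GCC manual describes this attribute as intended mainly for debugging, so it is best treated as a stopgap.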

[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-18 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #15 from bfriesen at simple dot dallas.tx.us 2012-07-18 20:42:22 UTC --- Testing shows that using -m64 -march=native -O2 -mfpmath=sse -frename-registers is sufficient to restore good performance.

[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-18 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #14 from bfriesen at simple dot dallas.tx.us 2012-07-18 14:28:04 UTC --- With -m64 -mtune=generic -march=x86-64 -mfpmath=sse -O2 -funroll-loops -fschedule-insns I see a whole-program performance jump from 0.047 iter/s to 0.156 iter

[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-16 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #11 from bfriesen at simple dot dallas.tx.us 2012-07-16 15:41:08 UTC --- I just verified that -O3 produces similar timings to -O2 for both -mfpmath=387 and -mfpmath=sse

[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-16 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #10 from bfriesen at simple dot dallas.tx.us 2012-07-16 15:35:03 UTC --- This particular application test was done with these options (i.e. -O2): -m64 -mtune=generic -march=x86-64 -mfpmath=387 -O2 I have also tried -O3, with no

[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-16 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #8 from bfriesen at simple dot dallas.tx.us 2012-07-16 14:16:46 UTC --- I used -march=native in this case. It is interesting that this enabled AVX (this particular CPU does support it). To be clear, the problem also occurs with

[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-14 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #6 from bfriesen at simple dot dallas.tx.us 2012-07-14 21:42:38 UTC --- Created attachment 27797 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27797 Pre-processed GraphicsMagick source (effect.c). In case the small sam

[Bug target/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-14 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #5 from bfriesen at simple dot dallas.tx.us 2012-07-14 21:06:27 UTC --- Please note that while I mentioned GCC 4.6.2, the same problem is also observed with GCC 4.7.1.

[Bug c/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-14 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #4 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:58:59 UTC --- Created attachment 27796 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27796 Generated assembler code

[Bug c/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-14 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #3 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:57:58 UTC --- Created attachment 27795 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27795 Pre-processed source

[Bug c/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-14 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #2 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:56:55 UTC --- Created attachment 27794 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27794 Sample portable source file

[Bug c/53967] GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-14 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967 --- Comment #1 from bfriesen at simple dot dallas.tx.us 2012-07-14 20:55:48 UTC --- Created attachment 27793 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27793 Build log

[Bug c/53967] New: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)

2012-07-14 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
Bug #: 53967
Summary: GCC produces slow code for convolution algorithm with -mfpmath=sse (the AMD_64 default)
Classification: Unclassified
Product: gcc
Version: 4.6.2

[Bug bootstrap/35531] Assembler failure while compiling libgcc

2010-12-08 Thread bfriesen at simple dot dallas.tx.us
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35531
bfriesen at simple dot dallas.tx.us changed:

What    |Removed      |Added
Status  |UNCONFIRMED  |RESOLVED