http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000
Summary: Major performance regression in parallel SSE2 impl of
SHA256 hash algorithm
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: c
AssignedTo: [email protected]
ReportedBy: [email protected]
Created attachment 22805
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22805
4-way SHA256 implementation whose performance drops markedly from gcc 4.4.x to
gcc 4.5.x
OS: Fedora 14
My "cpuminer" open source project is -very- sensitive to performance of
generated code, and experiences a severe performance regression going from gcc
4.4.x to 4.5.x.
Our program core is essentially:

    for (n = 0; n < 0xffffff; n++)
        sha256( sha256( data ) );  /* one iteration of inner loop */

Building with gcc 4.4.5 -or- Fedora 13 gcc (4.4.x derivative), we achieve
1850.85 kilo-iterations per second
Building with gcc 4.5.1 -or- Fedora 14 gcc (4.5.x derivative), we achieve
1389.82 kilo-iterations per second
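
For readers who want to reproduce this kind of figure, here is a minimal
sketch of how a kilo-iterations-per-second number can be measured. It is not
the cpuminer code: fake_double_hash(), khash_per_sec() and time_loop() are
hypothetical names, and the "hash" is a cheap placeholder standing in for
sha256( sha256( data ) ).

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for sha256(sha256(data)). NOT a real hash;
 * it only gives the loop some work that depends on n. */
static uint32_t fake_double_hash(uint32_t nonce)
{
    uint32_t x = nonce * 2654435761u;
    x ^= x >> 15;
    return x * 2246822519u;
}

/* Convert an iteration count and elapsed CPU seconds into
 * kilo-iterations per second. */
static double khash_per_sec(uint64_t iterations, double seconds)
{
    assert(seconds > 0.0);
    return (double)iterations / seconds / 1000.0;
}

/* Run `count` iterations and return the elapsed CPU time in seconds.
 * The accumulated result is written to *sink so the compiler cannot
 * discard the loop as dead code. */
static double time_loop(uint32_t count, uint32_t *sink)
{
    uint32_t acc = 0;
    clock_t start = clock();
    for (uint32_t n = 0; n < count; n++)
        acc ^= fake_double_hash(n);
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop(0xffffff, &sink) and passing the result to khash_per_sec()
gives a number comparable in kind, though not in value, to the ones above.
Build both compilers with identical flags so that the compiler is the only
variable.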
This is a throughput decrease of roughly 25% (1850.85 -> 1389.82), and the only
variable is the compiler. I have presented x86_64 data below, but similar
slowdowns are seen on i686-mingw when comparing Fedora 13 (fast gcc 4.4.x)
with Fedora 14 (slow gcc 4.5.x).
This interesting variant of the standard SHA256 algorithm is implemented using
Intel/AMD SSE2-specific operations, effectively running four (4) SHA256
iterations in parallel, generating four (4) SHA256 hashes on four distinct
datasets.
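
As an illustration of the technique only (the actual code is in the
attachment), here is a minimal sketch of one SHA256 building block computed on
four 32-bit lanes at once with SSE2 intrinsics. The function names are
hypothetical and are not taken from the attachment. The rotation amounts
(2, 13, 22) are those of the standard SHA256 "big sigma 0" function. This
assumes an x86 target with SSE2 available.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference: rotate a 32-bit word right by n bits (0 < n < 32). */
static uint32_t rotr32(uint32_t x, int n)
{
    return (x >> n) | (x << (32 - n));
}

/* Scalar reference: SHA256 big sigma 0. */
static uint32_t big_sigma0(uint32_t x)
{
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}

/* SSE2 has no packed rotate, so build one from two shifts and an OR.
 * Each of the four 32-bit lanes is rotated right by n (0 < n < 32). */
static __m128i rotr32_x4(__m128i x, int n)
{
    __m128i right = _mm_srl_epi32(x, _mm_cvtsi32_si128(n));
    __m128i left  = _mm_sll_epi32(x, _mm_cvtsi32_si128(32 - n));
    return _mm_or_si128(right, left);
}

/* Big sigma 0 applied to four independent words in parallel. */
static __m128i big_sigma0_x4(__m128i x)
{
    return _mm_xor_si128(rotr32_x4(x, 2),
           _mm_xor_si128(rotr32_x4(x, 13), rotr32_x4(x, 22)));
}

/* Convenience wrapper: one word from each of four datasets in,
 * four results out. Unaligned loads/stores keep the sketch simple. */
static void big_sigma0_4way(const uint32_t in[4], uint32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, big_sigma0_x4(v));
}
```

Each lane holds the same working variable for a different dataset, so one
pass through the round function advances four hashes. Code of this shape is
dominated by shifts, XORs and register moves, which makes it sensitive to
register allocation and instruction scheduling in the compiler.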
See attachment sha256_4way.i.
--------------------------------------------------------------------------
fast, working gcc -v:
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../src/gcc-4.4.5/configure --prefix=/garz/gcc44
--enable-languages=c
Thread model: posix
gcc version 4.4.5 (GCC)
--------------------------------------------------------------------------
slow, broken gcc -v:
Using built-in specs.
COLLECT_GCC=/garz/gcc45/bin/gcc
COLLECT_LTO_WRAPPER=/garz/gcc45/libexec/gcc/x86_64-unknown-linux-gnu/4.5.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../src/gcc-4.5.1/configure --prefix=/garz/gcc45
--enable-languages=c
Thread model: posix
gcc version 4.5.1 (GCC)