http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53397
Bug #: 53397
Summary: SciMark performance drops by 10x when compiled with
-O3 -march=amdfam10 due to generation of more prefetches
Classification: Unclassified
Product: gcc
Version: tree-ssa
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: [email protected]
ReportedBy: [email protected]
With GCC 4.7 the benchmark score drops from ~400 Mflops to ~40 Mflops, almost a
10-fold drop.
Prefetch instructions are introduced in the innermost loops of
"FFT_transform_internal" (FFT.c) by GCC 4.7 but not by GCC 4.6, and they are
causing the slowdown.
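
To illustrate the kind of loop involved, here is a minimal sketch of a strided
butterfly pass, first plain and then with a hand-written prefetch hint in the
loop body. The names butterfly_pass, butterfly_pass_prefetch, and their
parameters are hypothetical stand-ins, not code from FFT.c, and the hint only
approximates what a compiler-inserted prefetch might look like. Each iteration
does only a few floating-point operations, so any extra per-iteration
instruction is a noticeable share of the work. Whether that is the actual
mechanism of the slowdown is not established here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a butterfly pass; NOT the SciMark source.
   data holds n complex values as interleaved (re, im) doubles, and
   dual is the distance, in complex elements, between butterfly partners. */
void butterfly_pass(double *data, size_t n, size_t dual)
{
    assert(dual > 0);                      /* a zero stride would never advance */
    for (size_t b = 0; b + dual < n; b += 2 * dual) {
        size_t i = 2 * b;
        size_t j = 2 * (b + dual);
        double wd_re = data[j];
        double wd_im = data[j + 1];
        data[j]     = data[i]     - wd_re;
        data[j + 1] = data[i + 1] - wd_im;
        data[i]     += wd_re;
        data[i + 1] += wd_im;
    }
}

/* Same arithmetic, plus an explicit hint for the data the next iteration
   will touch. __builtin_prefetch is a GCC builtin; its use here is an
   assumption made for illustration only. */
void butterfly_pass_prefetch(double *data, size_t n, size_t dual)
{
    assert(dual > 0);
    for (size_t b = 0; b + dual < n; b += 2 * dual) {
        size_t i = 2 * b;
        size_t j = 2 * (b + dual);
        size_t next = 2 * (b + 2 * dual);  /* first index the next iteration reads */
        if (next < 2 * n) {                /* keep the address inside the array */
            __builtin_prefetch(&data[next], 1, 3);  /* rw = 1, locality = 3 */
        }
        double wd_re = data[j];
        double wd_im = data[j + 1];
        data[j]     = data[i]     - wd_re;
        data[j + 1] = data[i + 1] - wd_im;
        data[i]     += wd_re;
        data[i + 1] += wd_im;
    }
}
```

Both functions compute the same result; only the instruction mix in the loop
body differs.
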
Compiling this function alone as a separate test case with
-fno-prefetch-loop-arrays brings back the original score.
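
As a possible alternative to splitting the function into its own file, the
sketch below disables loop-array prefetching for a single function through
GCC's optimize function attribute. The macro NO_LOOP_PREFETCH and the function
twiddle_pass are hypothetical names used for illustration; this was not tried
in this report. GCC's documentation describes the optimize attribute as a
debugging aid, so passing -fno-prefetch-loop-arrays on the command line
remains the more dependable route.

```c
#include <assert.h>
#include <stddef.h>

/* The optimize attribute is GCC-specific; expand to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_PREFETCH __attribute__((optimize("no-prefetch-loop-arrays")))
#else
#define NO_LOOP_PREFETCH
#endif

/* Hypothetical stand-in for the hot function; NOT FFT_transform_internal.
   Applies one twiddle factor (w_re, w_im) across a strided pass over n
   complex values stored as interleaved (re, im) doubles. */
NO_LOOP_PREFETCH
void twiddle_pass(double *data, size_t n, size_t dual, double w_re, double w_im)
{
    assert(dual > 0);                      /* a zero stride would never advance */
    for (size_t b = 0; b + dual < n; b += 2 * dual) {
        size_t i = 2 * b;
        size_t j = 2 * (b + dual);
        double z_re = data[j];
        double z_im = data[j + 1];
        /* complex multiply: wd = w * z */
        double wd_re = w_re * z_re - w_im * z_im;
        double wd_im = w_re * z_im + w_im * z_re;
        data[j]     = data[i]     - wd_re;
        data[j + 1] = data[i + 1] - wd_im;
        data[i]     += wd_re;
        data[i + 1] += wd_im;
    }
}
```

Comparing the generated assembly with and without the attribute (for example
by looking for prefetch instructions in the loop) would show whether the
attribute took effect on a given GCC version.
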
The problem is exposed by r175474:
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175474
With GCC r175473
--------------------------
gcc -O3 -march=amdfam10 *.c -o Scimark175473 -lm
vekumar@pcedinar5:/local/home/vekumar/SciMark2_bench/SciMark2> ./Scimark175473
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to [email protected]) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 99.67
FFT Mflops: 498.35 (N=1024)
With GCC r175474
-------------------------
gcc -O3 -march=amdfam10 *.c -o Scimark175474 -lm
vekumar@pcedinar5:/local/home/vekumar/SciMark2_bench/SciMark2> ./Scimark175474
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to [email protected]) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 7.73
FFT Mflops: 38.66 (N=1024)