On an AMD amdfam10 system, 434.zeusmp built with gcc 4.5 runs in 713s
versus 763s with gcc 4.6, i.e. gcc 4.6 is about 7% slower.
The settings are as follows:
4.6: gcc version 4.6.0 20100812 (experimental) (GCC)
FOPTIMIZE = -Ofast -funroll-all-loops -fno-tree-pre -mveclibabi=acml -m64 -march=amdfam10
EXTRA_LDFLAGS = -L$(ACML_DIR) -lacml_mv
4.5: gcc version 4.5.2 20100818 (prerelease) (GCC)
COPTIMIZE = -O3 -ffast-math -funroll-all-loops -fno-tree-pre
FOPTIMIZE = -O3 -ffast-math -funroll-all-loops -fno-tree-pre -mveclibabi=acml -m64 -march=amdfam10
EXTRA_LDFLAGS = -L$(ACML_DIR) -lacml_mv
Note that in gcc 4.6, "-Ofast" is equivalent to "-O3 -ffast-math", and
"-fprefetch-loop-arrays" is enabled at -O3.
ACML 4.4.0 is used for both tests.
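
For reference, the 7% figure follows directly from the two run times quoted above. A minimal sketch of the arithmetic (the function name slowdown_percent is only illustrative, not part of any benchmark tooling):

```python
def slowdown_percent(baseline_s: float, candidate_s: float) -> float:
    """Return how much longer the candidate run takes than the baseline, in percent."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


# Run times reported above for 434.zeusmp, in seconds.
gcc45_s = 713.0
gcc46_s = 763.0

# gcc 4.6 relative to gcc 4.5: (763 - 713) / 713 * 100
print(f"{slowdown_percent(gcc45_s, gcc46_s):.1f}%")  # prints "7.0%"
```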
--
Summary: CPU2006 434.zeusmp: gcc 4.6 7% regression from gcc 4.5
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: changpeng dot fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45390