Some GCC 4.1 benchmarks (Re: Thoughts on LLVM and LTO)

Jan Hubicka Tue, 22 Nov 2005 12:04:47 -0800

> 
> > Which is why I said "It's fine to say compile time performance of the
> > middle end portions we may replace should be same or better".
> > 
> > And if you were to look right now, it's actually significantly better in
> > some cases :(
> 
> Can you prove this assertion?
> 
> Here is some data:
> http://people.redhat.com/dnovillo/spec2000.i686/gcc/global-build-secs_elapsed.html
> 
> And some more
> http://llvm.cs.uiuc.edu/testresults/X86/2005-11-01.html
> 
> I'm not sure about accuracy, or versions of LLVM used, etc.
> 
> Although promising on some things (as Diego said), LLVM execution and
> compile performance is a mixed bag.
> 
> It would probably be interesting to run SPEC or something else with icc
> IPO enabled, LLVM IPO enabled, and whatever gcc IMA support is
> available, to do a true comparison of where things stand. More data
> would be interesting.


I might try to produce somewhat more useful charts, but I have recently
done some testing of GCC 4.1 on SPEC and on some C++ testcases, mostly
looking for regressions in the GCC 4.1 release.  I didn't test LLVM, but
I did some ICC comparisons and tested both with and without our current
IMA, so it gives a rough idea.

I should note that the comparison to ICC is not quite fair, since ICC
lacks tuning for the Opteron I tested on, but I would say that we are in
the same performance camp on SPECint with IMA (IMA contributes 3.3% to
the result) despite the fact that GCC's IMA and IPA are very primitive.
This may just be proof that SPECint is not the best testcase for
evaluating future IPA implementations.  I also collected some C++
results that are a lot wilder.  It would be really interesting to see
how much benefit one can get when compiling a full-blown application,
and how large a codebase one can hope to compile with LTO (i.e.
GCC/kernel/mozilla/OOo/... ;).

I am not quite sure how much of the SPECfp loss can be attributed to
IMA, since I would expect it to come more from Fortran tuning.  The only
regressing C benchmark is ART, which indeed needs whole-program
optimization to allow data structure layout changes.  Obviously we made
some notable progress on Fortran performance between 4.0 and 4.1, and
none of that is IPA related.

I am also adding some scores for C++ testcases: tramp3d, which is a
single file, and Gerald's application, which I didn't actually manage to
merge into a single file, but I combined the files that appear hot in
coverage.

Concerning compile time: at -O2, the hammer branch needs 185s, 4.0 needs
192s and 4.1 needs 205s.  With IPA and no FDO, 4.0 needs 193s when
patched with Andrew's faster type-merging patch, and 4.1 needs 218s.  I
didn't record ICC compilation times, but this clearly shows that overall
we are making the compile time problems worse again with 4.1.  It also
shows that IPA is cheap right now, but only because it is so primitive.
It is also cheap only as long as you fit in memory (you need over 512MB
of memory to build SPEC with IMA on GCC, which is far from acceptable).
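
To put the compile-time figures above in relative terms, here is a small
Python sketch.  The dictionary labels and the helper name `slowdown_pct`
are made up for illustration; only the second counts are taken from the
measurements quoted above.

```python
# Compile times in seconds, as quoted in the paragraph above.
# The labels and the helper name are illustrative assumptions.
O2_SECONDS = {"3.3-hammer": 185, "4.0": 192, "4.1": 205}
IPA_SECONDS = {"4.0": 193, "4.1": 218}  # with IPA, no FDO


def slowdown_pct(new, old):
    """Return how much slower `new` is than `old`, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    base = O2_SECONDS["3.3-hammer"]
    for name, secs in O2_SECONDS.items():
        print(f"-O2 {name}: {slowdown_pct(secs, base):+.1f}% vs 3.3-hammer")
    for name, secs in IPA_SECONDS.items():
        cost = slowdown_pct(secs, O2_SECONDS[name])
        print(f"IPA cost on {name}: {cost:+.1f}% over plain -O2")
```

On these numbers the IPA passes add roughly half a percent on 4.0 and
about 6% on 4.1, while 4.1 at plain -O2 is about 11% slower than the
hammer branch.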

Also note that eon and the Fortran files are not compiled with IMA in
the GCC tests.

-O2, no IMA on both compilers:
        GCC-3.3-hammer  GCC 4.0 GCC 4.1 ICC-9.0
gzip    1162            1181    1199    1151
vpr     859             853     824     854
gcc     1057            1035    1028    963
mcf     540             540     541     543
crafty  2100            2041    2025    2106
parser  776             790     783     778
eon     1793            1874    1952    (failed, substituted as 783 for geomavg)
perlbmk 1407            1453    1438    1503
gap     1095            1152    1156    1071
vortex  1689            1663    1666    1618
bzip2   1009            1011    1000    997
twolf   843             858     852     823
geomavg 1114.8          1124.95 1122.76 1102
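
For anyone who wants to recompute the geomavg row, the following is a
minimal Python sketch of a geometric mean.  The function name `geomean`
and the list name are my own; the scores are the GCC-3.3-hammer column
copied from the table above.

```python
import math

# SPECint scores for GCC-3.3-hammer at -O2, copied from the table above.
HAMMER_O2 = [1162, 859, 1057, 540, 2100, 776,
             1793, 1407, 1095, 1689, 1009, 843]


def geomean(scores):
    """Geometric mean, computed via logs to avoid a huge product."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


if __name__ == "__main__":
    # Should land close to the 1114.8 reported in the geomavg row.
    print(f"{geomean(HAMMER_O2):.1f}")
```

The same helper can be applied to any other column; for the ICC column
the table notes that a substituted value was used for the failed eon
run.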

        GCC-3.3-hammer  GCC 4.0 GCC 4.1 ICC-9.0
wupwise 1218            1079    1304    1278
swim    1038            1065    1070    1064
mgrid   784             728     906     909
applu   772             822     840     884
mesa    1536            1609    1536    1486
galgel                  803     830     
art     730             739     735     747
equake  1102            1085    1069    1055
facerec                 905     914     1393
ammp    967             993     1008    985
lucas                   1106    1113    1264
fma3d                   976     978     1154
sixtrac 582             591     618     647
apsi    810             922     1004    948
geomavg                 933     971     1016

-O2 -static --combine -fwhole-program  -fipa-cp
versus ICC -xW -O3 -ipo -vec_report3
profile feedback is used on both compilers.
        GCC-3.3-hammer  GCC 4.0 GCC-4.1 ICC-9.0
gzip    1269            1299    1264    1337
vpr     890             864     885     869
gcc     1112            1095    1175    1023
mcf     539             536     538     546
crafty  2055            2034    2236    2301
parser  960             975     993     851
eon     2081            1928    2192    2150
perlbmk 1621            1574    1697    1652
gap     1117            1181    1223    1224
vortex  1683            2038    2173    2421
bzip2   1058            1022    1085    1087
twolf   842             877     877     849
geomavg 1183.41         1195.84 1251.55 1232.97


        GCC-3.3-hammer  GCC 4.0 GCC 4.1 ICC-9.0
wupwise                 1305    1401    1678
swim                    1065    1293    1360
mgrid                   758     884     973
applu                   857     918     1060
mesa    1756            1751    1756    1759
galgel                  818     848     1790
art     724             734     735     1414
equake  1088            1101    1108    1308
facerec                 974     1110    1467
ammp    1008            1034    1063    967
lucas                   1111    1104    1261
fma3d                   976     1215    1238
sixtrac                 643     702     653
apsi                    940     988     958
geomavg                 973.82  1049.12 1234.02


Tramp3d, iterations per second with and without FDO.
GCC 3.3-hammer  0.36
GCC 4.0         0.45
GCC 4.1         0.56
GCC 4.1 flatten 0.62
GCC 4.1 profile 0.07
GCC 4.1 FDO     0.81
GCC 4.1 profile 0.08
4.1 FDO flatten 0.89
ICC 9.0         0.14


DLV, speedup in percent relative to GCC 3.3 hammer-branch
                GCC 4.0 GCC 4.1 GCC-4.1 profile ICC 9.0
STRATCOMP1-ALL  284     287.1   242.86          18.52
STRATCOMP-770.2-6.25    0       13.33           -10.53
2QBF1           -5.47   -5.87   6.83            -15.23
PRIMEIMPL2      3.09    5.26    12.36           -23.95
3COL-SIMPLEX1   -1.78   -7.78   2.47            9.21
3COL-RANDOM1    -3.88   -0.84   0.21            -20.84
HP-RANDOM1      -26.72  -13.83  -12.45          -9.94
HAMCYCLE-FREE   -1.89   -3.7    0               -17.46
DECOMP2         -6.84   -12.2   -12.35          -11.27
BW-P5-nopush    -6.29   -4.07   -2.75           -5.98
BW-P5-pushbin   -5.28   -1.95   -0.4            -13.75
BW-P5-nopushbin -6.49   -2.7    0               -8.86
HANOI-Towers    -6.79   -2.58   0               -21.35
RAMSEY          5.41    -3.7    9.86            -5.65
CRISTAL         -17.21  -20.12  -13.53          -8.91
21-QUEENS       -1.71   -2.55   4.24            -34.48
MSTDir[V=13]    2.06    0.2     6               -31.72
MSTDir[V=15]    1.84    1.01    6.87            -32.15
MSTUndir[V=13]  -4.08   -4.08   2.92            -29.5
TIMETABLING     2.65    0.74    7.97            -31.91
AVG             2.71    2.6     7.74            -16.31
