Performance comparison of gcc releases

Ronny Peine Thu, 15 Dec 2005 16:31:36 -0800

Hi,

I forgot to post the best CFLAGS for each gcc version and benchmark.
Here they are:


gcc-3.3.6:
nbench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe 
-fforce-addr -fsched-spec-load -fmove-all-movables -ffast-math -ftracer 
-funroll-loops -funroll-all-loops -mfpmath=sse -momit-leaf-frame-pointer

freebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fsched-spec-load -fmove-all-movables -freduce-all-givs -ftracer 
-funroll-all-loops -fprefetch-loop-arrays -mfpmath=sse 
-momit-leaf-frame-pointer

lamebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fsched-spec-load -fmove-all-movables -freduce-all-givs -funroll-loops 
-funroll-all-loops -mfpmath=sse -mfpmath=sse,387 -momit-leaf-frame-pointer


gcc-3.4.4:
nbench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fsched-spec-load -fsched2-use-superblocks -fsched2-use-superblocks 
-fsched2-use-traces -fmove-all-movables -ffast-math -funroll-loops 
-funroll-all-loops -fpeel-loops -fold-unroll-loops 
-fbranch-target-load-optimize2 -mfpmath=sse -mfpmath=sse,387

freebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fsched-spec-load -fsched2-use-superblocks -fsched2-use-superblocks 
-fsched2-use-traces -freduce-all-givs -ffast-math -ftracer -funroll-loops 
-funroll-all-loops -fpeel-loops -fold-unroll-loops -fold-unroll-all-loops 
-fbranch-target-load-optimize -fbranch-target-load-optimize2 -mfpmath=sse 
-mfpmath=sse,387 -momit-leaf-frame-pointer

lamebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fsched-spec-load -fsched2-use-superblocks -fsched2-use-superblocks 
-fsched2-use-traces -fmove-all-movables -freduce-all-givs -ftracer 
-funroll-loops -funroll-all-loops -fpeel-loops -fold-unroll-loops 
-fold-unroll-all-loops -fbranch-target-load-optimize 
-fbranch-target-load-optimize2 -mfpmath=sse -mfpmath=sse,387 
-momit-leaf-frame-pointer


gcc-4.0.2:
nbench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr 
-fmodulo-sched -fgcse-sm -fgcse-las -fsched-spec-load -ftree-vectorize 
-ftracer -funroll-loops -fvariable-expansion-in-unroller 
-fprefetch-loop-arrays -freorder-blocks-and-partition -fweb -ffast-math 
-fmove-loop-invariants -fbranch-target-load-optimize 
-fbranch-target-load-optimize2 -fbtr-bb-exclusive -momit-leaf-frame-pointer 
-D__NO_MATH_INLINES

freebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fmodulo-sched 
-fsched-spec-load -freschedule-modulo-scheduled-loops -ftree-vectorize 
-ftracer -funroll-loops -fvariable-expansion-in-unroller 
-fprefetch-loop-arrays -freorder-blocks-and-partition -fmove-loop-invariants 
-fbranch-target-load-optimize -fbranch-target-load-optimize2 
-fbtr-bb-exclusive -momit-leaf-frame-pointer -D__NO_MATH_INLINES

lamebench:
-s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fgcse-sm 
-fgcse-las -fsched-spec-load -fsched2-use-superblocks -fsched2-use-traces 
-freschedule-modulo-scheduled-loops -ftracer -funroll-loops 
-fvariable-expansion-in-unroller -freorder-blocks-and-partition -fweb 
-ffast-math -fpeel-loops -fmove-loop-invariants -fbranch-target-load-optimize 
-fbranch-target-load-optimize2 -fbtr-bb-exclusive -mfpmath=sse 
-mfpmath=sse,387 -momit-leaf-frame-pointer -D__NO_MATH_INLINES


A run for one benchmark and one compiler takes between 6 and 48 hours, 
depending heavily on the given testing flags (the algorithm used for 
flag filtering is O(n^2) in the number of flags).
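
To illustrate where a quadratic cost can come from, here is a minimal sketch 
of a greedy forward selection. This is an assumption for illustration only; 
the actual filtering algorithm is not described in this mail. The names 
filter_flags and score are hypothetical, and score stands in for "build with 
these flags, run the benchmark, return a number where higher is better".

```python
from typing import Callable, List, Sequence, Tuple


def filter_flags(
    base: Sequence[str],
    candidates: Sequence[str],
    score: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward selection over candidate flags (hypothetical scheme).

    Each round benchmarks every remaining candidate on top of the flags
    chosen so far and keeps the single best one, if it improves the score.
    With n candidates this needs at most n + (n-1) + ... + 1 trial runs,
    which is O(n^2).
    """
    chosen = list(base)
    remaining = list(candidates)
    best = score(chosen)  # baseline run with the base flags only
    while remaining:
        # One benchmark run per remaining candidate.
        trials = [(score(chosen + [flag]), flag) for flag in remaining]
        top_score, top_flag = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no remaining flag helps any more
        chosen.append(top_flag)
        remaining.remove(top_flag)
        best = top_score
    return chosen, best
```

Under this assumed scheme, a list of 12 candidates would need at most 
12 + 11 + ... + 1 = 78 trial builds plus one baseline run, so the total time 
grows quickly as candidates are added.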

The testing flags for each compiler are:

gcc-3.3.6:
TESTINGFLAGS="-fforce-addr|-fsched-spec-load|-fmove-all-movables|-freduce-all-givs|-ffast-math|
-ftracer|-funroll-loops|-funroll-all-loops|-fprefetch-loop-arrays|-mfpmath=sse|-mfpmath=sse,387|
-momit-leaf-frame-pointer"

gcc-3.4.4:
TESTINGFLAGS="-fforce-addr|-fsched-spec-load|-fsched2-use-superblocks|
-fsched2-use-superblocks -fsched2-use-traces|-fmove-all-movables|
-freduce-all-givs|-ffast-math|-ftracer|-funroll-loops|-funroll-all-loops|
-fpeel-loops|-fold-unroll-loops|-fold-unroll-all-loops|-fprefetch-loop-arrays|
-fbranch-target-load-optimize|-fbranch-target-load-optimize2|-mfpmath=sse|
-mfpmath=sse,387|-momit-leaf-frame-pointer"

gcc-4.0.2:
TESTINGFLAGS="-fforce-addr|-fmodulo-sched|-fgcse-sm|-fgcse-las|-fsched-spec-load|
-fsched2-use-superblocks -fsched2-use-traces|
-freschedule-modulo-scheduled-loops|-ftree-vectorize|
-ftracer|-funroll-loops|-fvariable-expansion-in-unroller|
-fprefetch-loop-arrays|-freorder-blocks-and-partition|-fweb|-ffast-math|-fpeel-loops|
-fmove-loop-invariants|-fbranch-target-load-optimize|-fbranch-target-load-optimize2|
-fbtr-bb-exclusive|-mfpmath=sse|-mfpmath=sse,387|-momit-leaf-frame-pointer|-D__NO_MATH_INLINES"
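
As a reading aid, the sketch below splits such a TESTINGFLAGS string into 
candidates. It assumes that "|" separates candidates and that an entry 
containing a space (such as "-fsched2-use-superblocks -fsched2-use-traces") 
is tested as one unit; that reading is inferred from the lists above, not 
stated anywhere. The function name parse_testing_flags is hypothetical.

```python
from typing import List


def parse_testing_flags(value: str) -> List[List[str]]:
    """Split a '|'-separated string into candidates.

    Each candidate is a list of one or more flags that are switched on
    together (assumed semantics).
    """
    candidates = []
    for entry in value.split("|"):
        # split() also discards line breaks and stray spaces from wrapping.
        flags = entry.split()
        if flags:
            candidates.append(flags)
    return candidates
```

If the assumption holds, it would explain why -fsched2-use-superblocks can 
show up twice in a result line: once as a candidate of its own and once as 
part of the combined superblocks/traces candidate.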

-ftree-loop-linear was removed from the testing flags for gcc-4.0.2 because 
it leads to an endless loop in the neural net test of nbench.
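
A harness could also catch such a hang automatically with a time limit 
instead of dropping the flag by hand. The following is only a sketch of that 
idea, not the setup used for these results; run_with_timeout and limit_s are 
hypothetical names.

```python
import subprocess
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], limit_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it ran past limit_s seconds."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit_s,  # the child is killed when the limit expires
        )
    except subprocess.TimeoutExpired:
        return None
    return done.returncode
```

A result of None would then mark the flag combination as unusable for that 
benchmark, and the search could continue with the next candidate.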
