Hi, I forgot to post the best CFLAGS for each gcc version and benchmark. Here are the results:

gcc-3.3.6:
nbench: -s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr -fsched-spec-load -fmove-all-movables -ffast-math -ftracer -funroll-loops -funroll-all-loops -mfpmath=sse -momit-leaf-frame-pointer
freebench: -s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr -fsched-spec-load -fmove-all-movables -freduce-all-givs -ftracer -funroll-all-loops -fprefetch-loop-arrays -mfpmath=sse -momit-leaf-frame-pointer
lamebench: -s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr -fsched-spec-load -fmove-all-movables -freduce-all-givs -funroll-loops -funroll-all-loops -mfpmath=sse -mfpmath=sse,387 -momit-leaf-frame-pointer

gcc-3.4.4:
nbench: -s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr -fsched-spec-load -fsched2-use-superblocks -fsched2-use-superblocks -fsched2-use-traces -fmove-all-movables -ffast-math -funroll-loops -funroll-all-loops -fpeel-loops -fold-unroll-loops -fbranch-target-load-optimize2 -mfpmath=sse -mfpmath=sse,387
freebench: -s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr -fsched-spec-load -fsched2-use-superblocks -fsched2-use-superblocks -fsched2-use-traces -freduce-all-givs -ffast-math -ftracer -funroll-loops -funroll-all-loops -fpeel-loops -fold-unroll-loops -fold-unroll-all-loops -fbranch-target-load-optimize -fbranch-target-load-optimize2 -mfpmath=sse -mfpmath=sse,387 -momit-leaf-frame-pointer
lamebench: -s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr -fsched-spec-load -fsched2-use-superblocks -fsched2-use-superblocks -fsched2-use-traces -fmove-all-movables -freduce-all-givs -ftracer -funroll-loops -funroll-all-loops -fpeel-loops -fold-unroll-loops -fold-unroll-all-loops -fbranch-target-load-optimize -fbranch-target-load-optimize2 -mfpmath=sse -mfpmath=sse,387 -momit-leaf-frame-pointer

gcc-4.0.2:
nbench: -s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fforce-addr -fmodulo-sched -fgcse-sm -fgcse-las -fsched-spec-load -ftree-vectorize -ftracer -funroll-loops -fvariable-expansion-in-unroller -fprefetch-loop-arrays -freorder-blocks-and-partition -fweb -ffast-math -fmove-loop-invariants -fbranch-target-load-optimize -fbranch-target-load-optimize2 -fbtr-bb-exclusive -momit-leaf-frame-pointer -D__NO_MATH_INLINES
freebench: -s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fmodulo-sched -fsched-spec-load -freschedule-modulo-scheduled-loops -ftree-vectorize -ftracer -funroll-loops -fvariable-expansion-in-unroller -fprefetch-loop-arrays -freorder-blocks-and-partition -fmove-loop-invariants -fbranch-target-load-optimize -fbranch-target-load-optimize2 -fbtr-bb-exclusive -momit-leaf-frame-pointer -D__NO_MATH_INLINES
lamebench: -s -static -O3 -march=athlon-xp -fomit-frame-pointer -pipe -fgcse-sm -fgcse-las -fsched-spec-load -fsched2-use-superblocks -fsched2-use-traces -freschedule-modulo-scheduled-loops -ftracer -funroll-loops -fvariable-expansion-in-unroller -freorder-blocks-and-partition -fweb -ffast-math -fpeel-loops -fmove-loop-invariants -fbranch-target-load-optimize -fbranch-target-load-optimize2 -fbtr-bb-exclusive -mfpmath=sse -mfpmath=sse,387 -momit-leaf-frame-pointer -D__NO_MATH_INLINES

A run for one benchmark and one compiler takes from 6 to 48 hours, depending heavily on the given testingflags (the algorithm used for flag filtering is O(n^2)).
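
To illustrate why a flag filter can end up O(n^2) in the number of testingflags, here is a minimal Python sketch of one possible approach, a greedy backward elimination. This is an assumption for illustration and not necessarily the algorithm the test script uses. The names filter_flags and toy_score are hypothetical, and toy_score is only a stand-in for a real benchmark run (which is where the hours go).

```python
def filter_flags(base, candidates, score):
    """Greedy backward elimination (illustrative, not the original script).

    Start with every candidate enabled, then repeatedly try dropping one
    flag at a time and keep the drop whenever the score improves.
    Returns the surviving flags, the best score and the number of runs.
    """
    kept = list(candidates)
    best = score(base + kept)
    runs = 1
    improved = True
    while improved:
        improved = False
        for flag in list(kept):
            trial = [f for f in kept if f != flag]
            result = score(base + trial)
            runs += 1
            if result > best:  # strictly better: drop the flag for good
                kept, best, improved = trial, result, True
    return kept, best, runs


def toy_score(flags):
    """Hypothetical stand-in for a benchmark run; higher is better."""
    helps = {"-ftracer", "-funroll-loops"}
    hurts = {"-ffast-math"}
    return sum(f in helps for f in flags) - sum(f in hurts for f in flags)


base = ["-O3", "-march=athlon-xp"]
candidates = ["-ffast-math", "-ftracer", "-funroll-loops", "-fweb"]
kept, best, runs = filter_flags(base, candidates, toy_score)
print(kept, best, runs)
```

With n candidates, each pass costs up to n scoring runs, and every pass that continues removes at least one flag, so there are at most n + 1 passes. That gives O(n^2) runs in the worst case, and each run here would be a full compile plus benchmark.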

The testingflags for each compiler are:

gcc-3.3.6:
TESTINGFLAGS="-fforce-addr|-fsched-spec-load|-fmove-all-movables|-freduce-all-givs|-ffast-math|
-ftracer|-funroll-loops|-funroll-all-loops|-fprefetch-loop-arrays|-mfpmath=sse|-mfpmath=sse,387|
-momit-leaf-frame-pointer"

gcc-3.4.4:
TESTINGFLAGS="-fforce-addr|-fsched-spec-load|-fsched2-use-superblocks|
-fsched2-use-superblocks -fsched2-use-traces|-fmove-all-movables|
-freduce-all-givs|-ffast-math|-ftracer|-funroll-loops|-funroll-all-loops|
-fpeel-loops|-fold-unroll-loops|-fold-unroll-all-loops|-fprefetch-loop-arrays|
-fbranch-target-load-optimize|-fbranch-target-load-optimize2|-mfpmath=sse|
-mfpmath=sse,387|-momit-leaf-frame-pointer"

gcc-4.0.2:
TESTINGFLAGS="-fforce-addr|-fmodulo-sched|-fgcse-sm|-fgcse-las|-fsched-spec-load|
-fsched2-use-superblocks -fsched2-use-traces|
-freschedule-modulo-scheduled-loops|
-ftree-vectorize|
-ftracer|-funroll-loops|-fvariable-expansion-in-unroller|
-fprefetch-loop-arrays|-freorder-blocks-and-partition|-fweb|-ffast-math|-fpeel-loops|
-fmove-loop-invariants|-fbranch-target-load-optimize|-fbranch-target-load-optimize2|
-fbtr-bb-exclusive|-mfpmath=sse|-mfpmath=sse,387|-momit-leaf-frame-pointer|-D__NO_MATH_INLINES"

-ftree-loop-linear was removed from the testingflags for gcc-4.0.2 because it leads to an endless loop in the neural net test of nbench.
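
For anyone who wants to reuse the TESTINGFLAGS values, here is a small Python sketch of how such a string could be split into individual candidates. It assumes that "|" separates the candidates and that a space inside one entry means those flags are tested together as a single unit; that is my reading of the strings, not something the script's source confirms. The helper name split_testingflags is hypothetical.

```python
def split_testingflags(value):
    """Split a TESTINGFLAGS-style string on '|'.

    Whitespace around an entry (for example from a wrapped line) is
    dropped; whitespace inside an entry is collapsed to one space so a
    multi-flag entry stays together as a single candidate.
    """
    return [" ".join(entry.split()) for entry in value.split("|") if entry.strip()]


# Beginning of the gcc-3.4.4 value, including the space left by a wrapped line.
TESTINGFLAGS = (
    "-fforce-addr|-fsched-spec-load|-fsched2-use-superblocks| "
    "-fsched2-use-superblocks -fsched2-use-traces|-fmove-all-movables"
)

entries = split_testingflags(TESTINGFLAGS)
for entry in entries:
    print(entry)
```

Under that reading, "-fsched2-use-superblocks -fsched2-use-traces" is one candidate, which would explain why -fsched2-use-superblocks shows up twice in some of the result lines.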