I've found a serious performance regression between GCC versions 3.4.6 and 4.2/4.3.

SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark
================================================================
GCC:               3.4.6        4.2.1        4.3.0 (20070907)
Composite:         6.05         5.01         4.82
FFT:               4.90         4.15         4.21
SOR:               10.10        8.36         7.64
MonteCarlo:        3.68         3.06         3.04
Sparse matmult:    5.45         4.45         4.03
LU:                6.10         5.03         5.18
================================================================

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
================================================================
GCC:               3.4.6        4.2.1        4.3.0 (20070907)
NUMERIC SORT:      35.459       32.2         29.327
STRING SORT:       0.5943       0.57604      0.8603
BITFIELD:          1.0585e+07   9.269e+06    9.4138e+06
FP EMULATION:      4.4944       4.6012       5.364
FOURIER:           272.28       241.34       259.12
ASSIGNMENT:        0.35997      0.38373      0.39683
IDEA:              124.11       95.057       100.07
HUFFMAN:           45.593       52.083       56.391
NEURAL NET:        0.36153      0.30922      0.31348
LU DECOMPOSITION:  11.331       9.4938       8.255
================================================================

The "real world application" shows a 20%-200% performance regression with GCC 4.x.

All tests were compiled with these arguments:

  -O3 -ffast-math -fomit-frame-pointer -funroll-loops -ftracer -funit-at-a-time -m4 -ml

These arguments were tuned for the best results under 3.4.6. I've played with various settings under 4.x, but couldn't achieve any performance improvement. I can rerun the tests with any combination of options you want.

These tests, which compile under Linux, can be downloaded from:
- scimark: http://oktetlabs.ru/~snob/scimark.tgz
- nbench: http://oktetlabs.ru/~snob/nbench.tgz

I can attach these files to the bug report if that is acceptable and will not pollute Bugzilla.

Our target hardware has an SH7750 processor running in little-endian mode under RTEMS. Unfortunately, there is no way to boot Linux there. Can I ask you to run these tests under linux-sh? At least the scimark one.

After lurking inside the backend sources, I found that m4 has several variants in GCC 4.x: m4-100, m4-200, etc. I've tried to compile these tests with the m4-200 switch, but it looks like m4-200 enforces big-endian.
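
To put the SciMark2 table above in relative terms, here is a minimal Python sketch that computes the percentage change of each 4.x score against the 3.4.6 baseline. The numbers are copied from the table; the helper names (`relative_change`, `summarize`) are made up for illustration, and the sketch assumes that a higher score is better.

```python
# Scores copied from the SciMark2 table in this report, in the order
# (GCC 3.4.6, GCC 4.2.1, GCC 4.3.0).
# Assumption: a higher score means better performance.
SCIMARK = {
    "Composite": (6.05, 5.01, 4.82),
    "FFT": (4.90, 4.15, 4.21),
    "SOR": (10.10, 8.36, 7.64),
    "MonteCarlo": (3.68, 3.06, 3.04),
    "Sparse matmult": (5.45, 4.45, 4.03),
    "LU": (6.10, 5.03, 5.18),
}


def relative_change(baseline, score):
    """Percentage change of score versus baseline (negative means slower)."""
    return (score - baseline) / baseline * 100.0


def summarize(table):
    """Map each benchmark name to its (4.2.1, 4.3.0) change versus 3.4.6."""
    return {
        name: (relative_change(base, v42), relative_change(base, v43))
        for name, (base, v42, v43) in table.items()
    }


if __name__ == "__main__":
    for name, (d42, d43) in summarize(SCIMARK).items():
        print(f"{name:<16} 4.2.1: {d42:+6.1f}%   4.3.0: {d43:+6.1f}%")
```

For the Composite row this works out to roughly a 17% drop with 4.2.1 and a 20% drop with 4.3.0. The same function can be applied to the BYTEmark rows, where some tests (for example HUFFMAN) actually improve under 4.x.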

The backend sources show that a lot of work is going on in the SH4 part of GCC. To understand what could cause such a performance regression, I also wrote some simple tests to compare code generation between compiler versions (I can mail or attach them, but they are really trivial). However, the generated assembler differs greatly across versions. I could find only two obvious things:
- GCC 4 has much more aggressive inlining and loop unrolling (dropping -funroll-loops from the compiler arguments gave no positive result).
- GCC 4 has different instruction scheduling, which probably contributes to the performance regression.

--
Summary: [SH4] performance regression between 3.4.6 and 4.x
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: nbkolchin at gmail dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: sh-unknown-rtemself

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33431