GCC 4.1.1 (and probably all 4.* versions; 4.3.0-svn was also tested) uses cmp/eq instead of "dt" in loops. This leads to a ~20% performance decrease.
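The attached source is not reproduced in this report. As a rough illustration only, the two loop shapes that correspond to the "dt" and cmp/eq listings could look like the C sketch below. The function names and the use of uint32_t are assumptions made for this sketch, not the attached test case, and a given GCC version may or may not pick "dt" for either form. For reference, the constants in the listings decode as follows: -1946157056 is 0x8C000000 (the start address), 1000000 is the word count, and -1942157056 is 0x8C000000 + 4 * 1000000 (the end address).

```c
#include <stddef.h>
#include <stdint.h>

/* Count-down form (hypothetical name): the remaining trip count is the
   loop variable, which is the shape a decrement-and-test instruction
   such as SH-4 "dt" fits directly. */
uint32_t sum_countdown(const uint32_t *p, size_t n)
{
    uint32_t sum = 0;
    while (n != 0) {
        sum += *p++;   /* load word, post-increment pointer */
        n--;           /* decrement counter, loop until zero */
    }
    return sum;
}

/* Pointer-compare form (hypothetical name): the counter is eliminated
   and the loop ends when the pointer reaches a precomputed end address,
   which needs an explicit compare such as cmp/eq. */
uint32_t sum_ptr_compare(const uint32_t *p, size_t n)
{
    const uint32_t *end = p + n;   /* precomputed end address */
    uint32_t sum = 0;
    while (p != end)
        sum += *p++;
    return sum;
}
```

Both functions return the same sum; the difference is only in which value controls loop termination, and that is the difference visible between the two assembly listings.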
Technically, the loop-processing algorithm is completely different between the two versions.

Example (sources in the attachment):

CFLAGS="-m4 -O3 -fomit-frame-pointer"

gcc 3.4.4:
----------------------------
.LFB2:
        mov.l   .L11,r3
        mov     #0,r0
        mov.l   .L12,r2
.L5:
        mov.l   @r3+,r1
! !!!
        dt      r2
! !!!
        bf/s    .L5
        add     r1,r0
        rts
        nop
.L13:
        .align 2
.L11:
        .long   -1946157056
.L12:
        .long   1000000
-----------------------------

gcc 4.1.1:
-----------------------------
.LFB2:
        mov.l   .L8,r2
        mov     #0,r0
        mov.l   .L9,r3
.L2:
        mov.l   @r2+,r1
! !!!
        cmp/eq  r3,r2
! !!!
        bf/s    .L2
        add     r1,r0
        rts
        nop
.L10:
        .align 2
.L8:
        .long   -1946157056
.L9:
        .long   -1942157056
-----------------------------

P.S. We are porting an application from gcc 3.4 to gcc 4.1 and see about a 60% performance decrease, so this is probably just the first report. :(

--
           Summary: [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
           Product: gcc
           Version: 4.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: nbkolchin at gmail dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: sh-rtemself

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29953