https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
--- Comment #6 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Another example on SH:

const char* test (const char* s0, int c, int* rout)
{
  int r = 0;

  for (int i = 0; i < c; ++i)
    r += s0[i];

  *rout = r;
  return s0;
}

compiled with -O2:

_test:
        cmp/pl  r5
        bf      .L4
        mov     #0,r1
        mov     r4,r2
        .align 2
.L3:
        mov.b   @r2+,r3
        dt      r5
        bf/s    .L3
        add     r3,r1
.L2:
        mov.l   r1,@r6
        rts
        mov     r4,r0
        .align 1
.L4:
        bra     .L2
        mov     #0,r1

compiled with -O2 -fno-reorder-blocks:

_test:
        cmp/pl  r5
        bf/s    .L4
        mov     #0,r1
        mov     r4,r2
        .align 2
.L3:
        mov.b   @r2+,r3
        dt      r5
        bf/s    .L3
        add     r3,r1
        bra     .L7
        mov.l   r1,@r6
        .align 1
.L4:
        mov.l   r1,@r6
.L7:
        rts
        mov     r4,r0

... which is better, except for the redundant stores.
Folding the two stores gives the minimal code:

_test:
        cmp/pl  r5
        bf/s    .L4
        mov     #0,r1
        mov     r4,r2
        .align 2
.L3:
        mov.b   @r2+,r3
        dt      r5
        bf/s    .L3
        add     r3,r1
.L4:
        mov.l   r1,@r6
        rts
        mov     r4,r0
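
For illustration only, the store folding can also be modelled at the C level. This is a hand-written sketch, not compiler output: the names test_split_stores, test_folded_store and check_equivalent are made up here, and the do/while loops are just an assumed rendering of the dt / bf/s loop in the listings above. The first function stores to *rout on each path, as the -fno-reorder-blocks code does. The second sinks the store to the join point, as the minimal code does.

```c
#include <assert.h>

/* Model of the -fno-reorder-blocks output: one store per path. */
const char* test_split_stores (const char* s0, int c, int* rout)
{
  int r = 0;

  if (c > 0)                    /* cmp/pl r5 */
    {
      const char* p = s0;
      int n = c;

      do
        r += *p++;              /* mov.b @r2+,r3 ; add r3,r1 */
      while (--n != 0);         /* dt r5 ; bf/s .L3 */

      *rout = r;                /* store on the loop exit path */
      return s0;
    }

  *rout = r;                    /* store on the c <= 0 path */
  return s0;
}

/* Model of the minimal code: both paths share a single store. */
const char* test_folded_store (const char* s0, int c, int* rout)
{
  int r = 0;

  if (c > 0)
    {
      const char* p = s0;
      int n = c;

      do
        r += *p++;
      while (--n != 0);
    }

  *rout = r;                    /* the one store, at the join point */
  return s0;
}

/* Check that both variants return s0 and store the same sum. */
void check_equivalent (const char* s0, int c)
{
  int a = -1;
  int b = -2;

  assert (test_split_stores (s0, c, &a) == s0);
  assert (test_folded_store (s0, c, &b) == s0);
  assert (a == b);
}
```

Calling check_equivalent with a few inputs, including c <= 0, confirms that both variants store the same value on each path. That is why the two stores can be merged without changing behaviour.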