http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236
Bug #: 54236
Summary: [SH] Improve addc and subc insn utilization
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: olege...@gcc.gnu.org
ReportedBy: olege...@gcc.gnu.org
Target: sh*-*-*

There are currently a couple of cases where it would be better if addc or subc
insns were used.  For example:

int test00 (int a, int b)
{
  return a + b + 1;
}

gets compiled to:

        mov     r4,r0   ! MT
        add     r5,r0   ! EX
        rts
        add     #1,r0   ! EX

could be better as:

        mov     r4,r0   ! MT
        sett            ! MT (SH4)
        rts
        addc    r5,r0   ! EX

As a proof of concept, I've applied the following to handle the above case:

Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md (revision 190326)
+++ gcc/config/sh/sh.md (working copy)
@@ -1465,7 +1465,7 @@
 
 (define_insn "addc"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-        (plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "0")
+        (plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "%0")
                           (match_operand:SI 2 "arith_reg_operand" "r"))
                  (reg:SI T_REG)))
    (set (reg:SI T_REG)
@@ -1516,6 +1516,24 @@
   "add %2,%0"
   [(set_attr "type" "arith")])
 
+(define_insn_and_split "*addsi3_compact"
+  [(set (match_operand:SI 0 "arith_reg_dest" "")
+        (plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand" "")
+                          (match_operand:SI 2 "arith_reg_operand" ""))
+                 (const_int 1)))
+   (clobber (reg:SI T_REG))]
+  "TARGET_SH1"
+  "#"
+  "&& 1"
+  [(set (reg:SI T_REG) (const_int 1))
+   (parallel [(set (match_dup 0)
+                   (plus:SI (plus:SI (match_dup 1)
+                                     (match_dup 2))
+                            (reg:SI T_REG)))
+              (set (reg:SI T_REG)
+                   (ltu:SI (plus:SI (match_dup 1) (match_dup 2))
+                           (match_dup 1)))])])
+
 ;; -------------------------------------------------------------------------
 ;; Subtraction instructions
 ;; -------------------------------------------------------------------------

... and observed some code from the CSiBE set for -O2 -m4-single -ml
-mpretend-cmove.
It doesn't affect code size that much (some incs/decs here and there), but
more importantly it does this (libmpeg2/motion_comp.c):

_MC_avg_o_16_c:                 -->
        mov.b   @r5,r1                  mov.b   @r5,r2
.L16:                           .L16:
        mov.b   @r4,r2                  sett
        extu.b  r1,r1                   mov.b   @r4,r1
        extu.b  r2,r2                   extu.b  r2,r2
        add     r2,r1                   extu.b  r1,r1
        add     #1,r1                   addc    r2,r1
        shar    r1                      shar    r1
        mov.b   r1,@r4                  mov.b   r1,@r4
        mov.b   @(1,r5),r0              sett
        extu.b  r0,r1                   mov.b   @(1,r5),r0
        mov.b   @(1,r4),r0              extu.b  r0,r1
        extu.b  r0,r0                   mov.b   @(1,r4),r0
        add     r0,r1                   extu.b  r0,r0
        add     #1,r1                   addc    r1,r0
        shar    r1                      shar    r0
        mov     r1,r0                   mov.b   r0,@(1,r4)
        mov.b   r0,@(1,r4)

In such cases, the sett,addc sequence can be scheduled much better and in most
cases the sett insn can be executed in parallel with some other insn.
Unfortunately, on SH4A the sett insn has been moved from MT group to EX group,
but still it seems beneficial.

I've also seen a couple of places, where sett-subc sequences would be better.
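
For readers who want to convince themselves that the sett,addc sequence is
equivalent to the add,add #1 sequence, here is a rough sketch in C that models
the T bit in software.  The names sh_state, model_sett, model_addc, model_subc,
add_plus_one, sub_minus_one, avg_round and check_model are made up for this
illustration and are not part of GCC.  The subc case (a - b - 1) is only an
assumed analogue, since the report does not say which subc cases were seen, and
avg_round reflects a reading of the motion_comp.c assembly above (add, add one,
shift right by one), not the libmpeg2 source itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the SH T bit; not part of GCC. */
typedef struct {
    uint32_t t;                 /* always 0 or 1 */
} sh_state;

/* sett: set the T bit to 1. */
static void model_sett(sh_state *s)
{
    s->t = 1;
}

/* addc Rm,Rn: Rn + Rm + T -> Rn, carry out -> T. */
static uint32_t model_addc(uint32_t rn, uint32_t rm, sh_state *s)
{
    uint64_t wide = (uint64_t)rn + rm + s->t;
    s->t = (uint32_t)(wide >> 32);
    return (uint32_t)wide;
}

/* subc Rm,Rn: Rn - Rm - T -> Rn, borrow -> T. */
static uint32_t model_subc(uint32_t rn, uint32_t rm, sh_state *s)
{
    uint64_t wide = (uint64_t)rn - rm - s->t;
    s->t = (uint32_t)((wide >> 32) & 1);
    return (uint32_t)wide;
}

/* a + b + 1 done as "sett; addc" instead of "add; add #1". */
static uint32_t add_plus_one(uint32_t a, uint32_t b)
{
    sh_state s = { 0 };
    model_sett(&s);
    return model_addc(a, b, &s);
}

/* Assumed subc analogue: a - b - 1 done as "sett; subc". */
static uint32_t sub_minus_one(uint32_t a, uint32_t b)
{
    sh_state s = { 0 };
    model_sett(&s);
    return model_subc(a, b, &s);
}

/* Byte average with rounding: "sett; addc; shar" on zero-extended bytes. */
static uint8_t avg_round(uint8_t a, uint8_t b)
{
    return (uint8_t)(add_plus_one(a, b) >> 1);
}

/* Compare the modeled sequences against plain C arithmetic
   (unsigned, so wrap-around is well defined). */
static void check_model(void)
{
    for (uint32_t a = 0; a < 256; a++) {
        for (uint32_t b = 0; b < 256; b++) {
            assert(add_plus_one(a, b) == a + b + 1u);
            assert(sub_minus_one(a, b) == a - b - 1u);
            assert(avg_round((uint8_t)a, (uint8_t)b) == (a + b + 1u) / 2u);
        }
    }
    /* Wrap-around case: 0xFFFFFFFF + 0 + 1 wraps to 0. */
    assert(add_plus_one(0xFFFFFFFFu, 0u) == 0u);
}
```

Calling check_model() from a small main runs the comparison over all byte-sized
operand pairs.  The model only covers the arithmetic result; it says nothing
about insn groups or scheduling, which is where the actual benefit is expected.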