http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54236
--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> --- Created attachment 31141 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31141&action=edit widening unsigned addition The addc insn can also be used to implement widening 32 bit -> 64 bit unsigned addition, and to do some sort of unsigned overflow checks. For example: unsigned long long test_1 (unsigned int a, unsigned int b) { // 1x addc, 1x clrt, 1x movt, no tst return (unsigned long long)a + (unsigned long long)b; } minimal code: mov r5,r1 clrt addc r4,r1 rts movt r0 unsigned int test_2 (unsigned int a, unsigned int b) { // 1x addc, 1x clrt, no movt, no tst unsigned long long x = (unsigned long long)a + (unsigned long long)b; if (x >> 32) return 0xFFFFFFFF; else return (unsigned int)x; } minimal code (branchy, generic): clrt addc r4,r5 bt/s .L6 mov #-1,r0 mov r5,r0 .L6: rts nop minimal code (branchless, special case for constant 0xFFFFFFFF): clrt addc r4,r5 subc r1,r1 // T = 0: r1 = 0x00000000 // T = 1: r1 = 0xFFFFFFFF or r1,r5 rts mov r5,r0 The attached patch makes the compiler output the minimal code sequences, except for the branchless version of test_2. It would require folding basic blocks which combine won't do. The patch enables the "zero_extendsidi2" insn on non-SHMedia. Without that insn, the following is expanded: (insn 22 4 7 2 (set (reg:DI 167 [ D.1680 ]) (const_int 0 [0])) sh_tmp.cpp:22 -1 (nil)) (insn 7 22 8 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 4) (reg:SI 4 r4 [ a ])) sh_tmp.cpp:22 257 {movsi_ie} (expr_list:REG_DEAD (reg:SI 4 r4 [ a ]) (nil))) (insn 8 7 23 2 (set (subreg:SI (reg:DI 167 [ D.1680 ]) 0) (const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie} (nil)) (insn 23 8 9 2 (set (reg:DI 168 [ D.1680 ]) (const_int 0 [0])) sh_tmp.cpp:22 -1 (nil)) (insn 9 23 10 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 4) (reg:SI 5 r5 [ b ])) sh_tmp.cpp:22 257 {movsi_ie} (expr_list:REG_DEAD (reg:SI 5 r5 [ b ]) (nil))) (insn 10 9 11 2 (set (subreg:SI (reg:DI 168 [ D.1680 ]) 0) (const_int 0 [0])) sh_tmp.cpp:22 257 {movsi_ie} (nil)) ... and combine doesn't seem to understand this and thus will never try out things such as (plus:DI (zero_extend:DI (match_operand:SI 1 "arith_reg_operand")) (zero_extend:DI (match_operand:SI 2 "arith_reg_operand")))) The patch has a downside, though. It makes 64 bit arithmetic slightly worse in some cases (according to CSiBE). As far as I can see it, there are mainly two issues: - loads of constant #0 are not hoisted/CSE'd - some 64 bit highword optimizations are not applied anymore, in particular comparisons of zero extended unsigned SImode and unsigned DImode. The constant #0 issue can be addressed by making the "zero_extendsidi2" pattern 'reserve' the constant #0 load, as it is done for some other insns: -(define_insn "zero_extendsidi2" +(define_expand "zero_extendsidi2" + [(set (match_operand:DI 0 "arith_reg_dest") + (zero_extend:DI (match_operand:SI 1 "extend_reg_operand")))] + "TARGET_SHMEDIA || TARGET_SH1" +{ + if (TARGET_SH1) + { + if (can_create_pseudo_p ()) + { + rtx zero_val = force_reg (SImode, gen_int_mode (0, SImode)); + emit_insn (gen_zero_extendsidi2_compact (operands[0], operands[1], + zero_val)); + DONE; + } + else + FAIL; + } +}) + +(define_insn "*zero_extendsidi2" [(set (match_operand:DI 0 "arith_reg_dest" "=r") (zero_extend:DI (match_operand:SI 1 "extend_reg_operand" "r")))] "TARGET_SHMEDIA" @@ -5905,6 +5943,20 @@ [(set_attr "type" "arith_media") (set_attr "highpart" "extend")]) +(define_insn_and_split "zero_extendsidi2_compact" + [(set (match_operand:DI 0 "arith_reg_dest") + (zero_extend:DI (match_operand:SI 1 "extend_reg_operand"))) + (use (match_operand:SI 2 "arith_reg_operand"))] + "TARGET_SH1" + "#" + "&& can_create_pseudo_p ()" + [(const_int 0)] +{ + emit_move_insn (gen_lowpart (SImode, operands[0]), operands[1]); + emit_move_insn (gen_highpart (SImode, operands[0]), operands[2]); + DONE; +}) + However, maybe it'd be better to have a more generic solution to the constant load/sharing problem after combine (see PR 51708 comment #3). As for the comparisons, additional patterns would be required, which could be part of PR 52898.