http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244
--- Comment #1 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-11-22 22:33:43 UTC ---
>   return (a != b || a != c) ? b : c;

The test_func_0_NG and test_func_1_NG cases are related to the target
implementation of cstoresi4.  The middle end expands a complex conditional
jump into cstores and simple conditional jumps.  For the expression a != b,
SH's cstoresi4 implementation uses sh.c:sh_emit_compare_and_set, which
generates cmp/eq and a movnegt insn, because we have no cmp/ne insn.
Then we get the sequence

        mov     #-1,rn
        negc    rn,rm
        tst     #255,rm

which is essentially T_reg = T_reg.  Usually combine catches such a
situation, but negc might be too complex for combine.  For this case, by
replacing the current movnegt expander with an insn, splitter and peephole,
something like

(define_insn "movnegt"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
        (plus:SI (reg:SI T_REG) (const_int -1)))
   (clobber (match_scratch:SI 1 "=&r"))
   (clobber (reg:SI T_REG))]
  ""
  "#"
  [(set_attr "length" "4")])

(define_split
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
        (plus:SI (reg:SI T_REG) (const_int -1)))
   (clobber (match_scratch:SI 1 "=&r"))
   (clobber (reg:SI T_REG))]
  "reload_completed"
  [(set (match_dup 1) (const_int -1))
   (parallel [(set (match_dup 0)
                   (neg:SI (plus:SI (reg:SI T_REG)
                                    (match_dup 1))))
              (set (reg:SI T_REG)
                   (ne:SI (ior:SI (reg:SI T_REG) (match_dup 1))
                          (const_int 0)))])]
  "")

(define_peephole2
  [(set (match_operand:SI 1 "" "") (const_int -1))
   (parallel [(set (match_operand:SI 0 "" "")
                   (neg:SI (plus:SI (reg:SI T_REG)
                                    (match_dup 1))))
              (set (reg:SI T_REG)
                   (ne:SI (ior:SI (reg:SI T_REG) (match_dup 1))
                          (const_int 0)))])
   (set (reg:SI T_REG)
        (eq:SI (match_operand:QI 3 "" "") (const_int 0)))]
  "REGNO (operands[3]) == REGNO (operands[0])
   && peep2_reg_dead_p (3, operands[0])
   && peep2_reg_dead_p (3, operands[1])"
  [(const_int 0)]
  "")

the above useless sequence could be removed, though we will miss the chance
that the -1 can be CSE-ed when the cstore value is used.
This will cause slightly worse code for a loop like

int
foo (int *a, int x, int n)
{
  int i;
  int count = 0;

  for (i = 0; i < n; i++)
    count += (*(a + i) != x);

  return count;
}

though it may be relatively rare.
BTW, OT, (a != b || a != c) ? b : c could be reduced to b, I think.

>   return a >= 0 && b >= 0 ? c : d;

x >= 0 is expanded to a sequence like

        ra = not x
        rb = -31
        rc = ra >> (neg rb)
        T = (rc == 0)
        conditional jump

and combine tries to simplify it.  combine successfully simplifies b >= 0
into shll and bt, but fails to simplify a >= 0.  It seems that combine
doesn't do constant propagation well and misses the constant -31.
In this case, a peephole like

(define_peephole2
  [(set (match_operand:SI 0 "arith_reg_dest" "")
        (not:SI (match_operand:SI 1 "arith_reg_operand" "")))
   (set (match_operand:SI 2 "arith_reg_dest" "")
        (const_int -31))
   (set (match_operand:SI 3 "arith_reg_dest" "")
        (lshiftrt:SI (match_dup 0) (neg:SI (match_dup 2))))
   (set (reg:SI T_REG)
        (eq:SI (match_operand:QI 4 "arith_reg_operand" "")
               (const_int 0)))
   (set (pc)
        (if_then_else (match_operator 5 "comparison_operator"
                        [(reg:SI T_REG) (const_int 0)])
                      (label_ref (match_operand 6 "" ""))
                      (pc)))]
  "REGNO (operands[3]) == REGNO (operands[4])
   && peep2_reg_dead_p (4, operands[0])
   && (peep2_reg_dead_p (4, operands[3])
       || rtx_equal_p (operands[2], operands[3]))
   && peep2_regno_dead_p (5, T_REG)"
  [(set (match_dup 2) (const_int -31))
   (set (reg:SI T_REG)
        (ge:SI (match_dup 1) (const_int 0)))
   (set (pc)
        (if_then_else (match_op_dup 7 [(reg:SI T_REG) (const_int 0)])
                      (label_ref (match_dup 6))
                      (pc)))]
  "
{
  operands[7] = gen_rtx_fmt_ee (reverse_condition (GET_CODE (operands[5])),
                                GET_MODE (operands[5]),
                                XEXP (operands[5], 0),
                                XEXP (operands[5], 1));
}")

will be a workaround.  It isn't ideal, but better than nothing.

>   return a == b ? test_sub0 (a, b) : test_sub1 (a, b);
>   return a != b ? test_sub0 (a, b) : test_sub1 (a, b);

This case is interesting.  At -Os, two calls are converted into one
computed goto.
A bit surprisingly, the conversion is done as a side effect of the
combine-stack-adjustments pass.  That pass calls

  cleanup_cfg (flag_crossjumping ? CLEANUP_CROSSJUMP : 0);

and the cross jumping optimization merges the two calls.
With -Os -fno-delayed-branch, the OK case is compiled to

test_func_3_OK:
        mov     r4,r1
        cmp/eq  r5,r1
        mov.l   .L4,r0
        bf      .L3
        mov     r1,r5
        mov.l   .L5,r0
        bra     .L3
        nop
.L3:
        jmp     @r0
        nop

and the NG case to

test_func_3_NG:
        mov     r4,r1
        cmp/eq  r5,r1
        bt      .L2
        mov.l   .L4,r0
        bra     .L3
        nop
.L2:
        mov.l   .L5,r0
        mov     r1,r5
.L3:
        jmp     @r0
        nop

Yep, the former is lucky.  I guess that the latter requires basic block
reordering for further simplification, though I've found the comment

  /* Don't reorder blocks when optimizing for size because extra jump insns may
     be created; also barrier may create extra padding.

     More correctly we should have a block reordering mode that tried to
     minimize the combined size of all the jumps.  This would more or less
     automatically remove extra jumps, but would also try to use more short
     jumps instead of long jumps.  */
  if (!optimize_function_for_speed_p (cfun))
    return false;

in bb-reorder.c.
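
For reference, the three small identities this comment relies on can be checked with a C sketch like the one below. It is only a model, not anything taken from the SH backend. The names sh_negc, t_after_movnegt_tst, select_original, t_from_expanded_ge and check_all are made up for illustration. The negc model assumes the usual SH semantics (Rn = 0 - Rm - T, with T set to the borrow), and tst #255 is modelled as T = ((reg & 255) == 0).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one negc result: destination register and new T bit.  */
struct negc_result { uint32_t rn; unsigned t; };

/* Assumed semantics of "negc Rm,Rn": Rn = 0 - Rm - T, T = borrow.  */
static struct negc_result
sh_negc (uint32_t rm, unsigned t)
{
  struct negc_result r;
  uint32_t temp = 0u - rm;

  r.rn = temp - t;
  r.t = (0u < temp) || (temp < r.rn);
  return r;
}

/* T bit after "mov #-1,rn; negc rn,rm; tst #255,rm", given T on entry.  */
static unsigned
t_after_movnegt_tst (unsigned t_in)
{
  struct negc_result r = sh_negc (0xFFFFFFFFu, t_in);  /* rm = 1 - T */

  return (r.rn & 255u) == 0;                           /* tst #255,rm */
}

/* The expression from the first test case, written out as is.  */
static int
select_original (int a, int b, int c)
{
  return (a != b || a != c) ? b : c;
}

/* T bit produced by the expanded form of x >= 0:
   ra = not x; rc = ra >> 31 (logical); T = (rc == 0).  */
static unsigned
t_from_expanded_ge (int32_t x)
{
  uint32_t ra = ~(uint32_t) x;
  uint32_t rc = ra >> 31;

  return rc == 0;
}

/* Check all three identities over a few representative inputs.  */
static void
check_all (void)
{
  static const int32_t xs[] = { INT32_MIN, -1, 0, 1, INT32_MAX };
  unsigned t, i;
  int a, b, c;

  /* The movnegt + tst sequence leaves T unchanged.  */
  for (t = 0; t <= 1; t++)
    assert (t_after_movnegt_tst (t) == t);

  /* (a != b || a != c) ? b : c always yields b.  */
  for (a = -2; a <= 2; a++)
    for (b = -2; b <= 2; b++)
      for (c = -2; c <= 2; c++)
        assert (select_original (a, b, c) == b);

  /* The expanded sequence sets T exactly when x < 0.  */
  for (i = 0; i < sizeof xs / sizeof xs[0]; i++)
    assert (t_from_expanded_ge (xs[i]) == (unsigned) (xs[i] < 0));
}
```

Calling check_all () from a main function runs the checks. The last one shows that the expanded sequence leaves T set for x < 0, which matches the peephole above replacing it with a ge test and a reversed branch condition.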