> But I can also hide the cfun->function_frequency trick in
> DEFAULT_BRANCH_COST macro if it seems to help.  (in longer term I hope
> they will all go away as expansion needs to be aware of hotness info
> anyway)
Well, it definitely helps.  I originally hoped there would be fewer places
querying BRANCH_COST without profile info.  I am testing the updated patch.

Honza

	* optabs.c (expand_abs_nojump): Update BRANCH_COST call.
	* fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT, fold_truthop): Likewise.
	* dojump.c (do_jump): Likewise.
	* ifcvt.c (MAX_CONDITIONAL_EXECUTE): Likewise.
	(struct noce_if_info): Add branch_cost.
	(noce_try_store_flag_constants, noce_try_addcc,
	noce_try_store_flag_mask, noce_try_cmove_arith,
	noce_find_if_block, find_if_case_1, find_if_case_2): Use computed
	branch cost.
	* expr.h (BRANCH_COST): Update default.
	(DEFAULT_BRANCH_COST): Define.
	* predict.c (predictable_edge_p): New function.
	* expmed.c (expand_smod_pow2, expand_sdiv_pow2, emit_store_flag):
	Update BRANCH_COST call.
	* basic-block.h (predictable_edge_p): Declare.
	* config/alpha/alpha.h (BRANCH_COST): Update.
	* config/frv/frv.h (BRANCH_COST): Update.
	* config/s390/s390.h (BRANCH_COST): Update.
	* config/spu/spu.h (BRANCH_COST): Update.
	* config/sparc/sparc.h (BRANCH_COST): Update.
	* config/m32r/m32r.h (BRANCH_COST): Update.
	* config/i386/i386.h (BRANCH_COST): Update.
	* config/i386/i386.c (ix86_expand_int_movcc): Update use of
	BRANCH_COST.
	* config/sh/sh.h (BRANCH_COST): Update.
	* config/pdp11/pdp11.h (BRANCH_COST): Update.
	* config/avr/avr.h (BRANCH_COST): Update.
	* config/crx/crx.h (BRANCH_COST): Update.
	* config/xtensa/xtensa.h (BRANCH_COST): Update.
	* config/stormy16/stormy16.h (BRANCH_COST): Update.
	* config/m68hc11/m68hc11.h (BRANCH_COST): Update.
	* config/iq2000/iq2000.h (BRANCH_COST): Update.
	* config/ia64/ia64.h (BRANCH_COST): Update.
	* config/rs6000/rs6000.h (BRANCH_COST): Update.
	* config/arc/arc.h (BRANCH_COST): Update.
	* config/score/score.h (BRANCH_COST): Update.
	* config/arm/arm.h (BRANCH_COST): Update.
	* config/pa/pa.h (BRANCH_COST): Update.
	* config/mips/mips.h (BRANCH_COST): Update.
	* config/vax/vax.h (BRANCH_COST): Update.
	* config/h8300/h8300.h (BRANCH_COST): Update.
	* params.def (PARAM_PREDICTABLE_BRANCH_OUTCOME): New.
	* doc/invoke.texi (predictable-branch-cost-outcome): Document.
	* doc/tm.texi (BRANCH_COST): Update.

Index: doc/tm.texi
===================================================================
*** doc/tm.texi	(revision 132800)
--- doc/tm.texi	(working copy)
*************** value to the result of that function.  T
*** 5828,5836 ****
  are the same as to this macro.
  @end defmac
  
! @defmac BRANCH_COST
! A C expression for the cost of a branch instruction.  A value of 1 is
! the default; other values are interpreted relative to that.
  @end defmac
  
  Here are additional macros which do not specify precise relative costs,
--- 5828,5841 ----
  are the same as to this macro.
  @end defmac
  
! @defmac BRANCH_COST (@var{hot_p}, @var{predictable_p})
! A C expression for the cost of a branch instruction.  A value of 1 is the
! default; other values are interpreted relative to that.  Parameter @var{hot_p}
! is true when the branch in question might be hot in the compiled program.  When
! it is false, @code{BRANCH_COST} should return a value optimal for code size
! rather than for performance.  @var{predictable_p} is true for well-predictable
! branches; on many architectures @code{BRANCH_COST} can then be reduced.
  @end defmac
  
  Here are additional macros which do not specify precise relative costs,
Index: doc/invoke.texi
===================================================================
*** doc/invoke.texi	(revision 132800)
--- doc/invoke.texi	(working copy)
*************** to the hottest structure frequency in th
*** 6807,6812 ****
--- 6807,6816 ----
  parameter, then structure reorganization is not applied to this structure.
  The default is 10.
  
+ @item predictable-branch-cost-outcome
+ When a branch is predicted to be taken with probability lower than this
+ threshold (in percent), it is considered well predictable.  The default is 10.
+ 
  @item max-crossjump-edges
  The maximum number of incoming edges to consider for crossjumping.
  The algorithm used by @option{-fcrossjumping} is @math{O(N^2)} in
Index: optabs.c
===================================================================
*** optabs.c	(revision 132800)
--- optabs.c	(working copy)
*************** expand_abs_nojump (enum machine_mode mod
*** 3425,3431 ****
     value of X as (((signed) x >> (W-1)) ^ x) - ((signed) x >> (W-1)),
     where W is the width of MODE.  */
  
!   if (GET_MODE_CLASS (mode) == MODE_INT && BRANCH_COST >= 2)
      {
        rtx extended = expand_shift (RSHIFT_EXPR, mode, op0,
  				   size_int (GET_MODE_BITSIZE (mode) - 1),
--- 3425,3432 ----
     value of X as (((signed) x >> (W-1)) ^ x) - ((signed) x >> (W-1)),
     where W is the width of MODE.  */
  
!   if (GET_MODE_CLASS (mode) == MODE_INT
!       && DEFAULT_BRANCH_COST)
      {
        rtx extended = expand_shift (RSHIFT_EXPR, mode, op0,
  				   size_int (GET_MODE_BITSIZE (mode) - 1),
Index: fold-const.c
===================================================================
*** fold-const.c	(revision 132800)
--- fold-const.c	(working copy)
*************** fold_cond_expr_with_comparison (tree typ
*** 5317,5323 ****
  
  #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
! #define LOGICAL_OP_NON_SHORT_CIRCUIT (BRANCH_COST >= 2)
  #endif
  
  /* EXP is some logical combination of boolean tests.  See if we can
--- 5317,5323 ----
  
  #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
! #define LOGICAL_OP_NON_SHORT_CIRCUIT (DEFAULT_BRANCH_COST >= 2)
  #endif
  
  /* EXP is some logical combination of boolean tests.  See if we can
*************** fold_truthop (enum tree_code code, tree
*** 5565,5571 ****
       that can be merged.  Avoid doing this if the RHS is a floating-point
       comparison since those can trap.  */
  
!   if (BRANCH_COST >= 2
        && ! FLOAT_TYPE_P (TREE_TYPE (rl_arg))
        && simple_operand_p (rl_arg)
        && simple_operand_p (rr_arg))
--- 5565,5571 ----
       that can be merged.  Avoid doing this if the RHS is a floating-point
       comparison since those can trap.  */
  
!   if (DEFAULT_BRANCH_COST >= 2
!       && ! FLOAT_TYPE_P (TREE_TYPE (rl_arg))
        && simple_operand_p (rl_arg)
        && simple_operand_p (rr_arg))
Index: dojump.c
===================================================================
*** dojump.c	(revision 132800)
--- dojump.c	(working copy)
*************** do_jump (tree exp, rtx if_false_label, r
*** 515,521 ****
  	 /* High branch cost, expand as the bitwise AND of the conditions.
  	    Do the same if the RHS has side effects, because we're effectively
  	    turning a TRUTH_AND_EXPR into a TRUTH_ANDIF_EXPR.  */
! 	 if (BRANCH_COST >= 4 || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
  	   goto normal;
  
  	 if (if_false_label == NULL_RTX)
--- 515,522 ----
  	 /* High branch cost, expand as the bitwise AND of the conditions.
  	    Do the same if the RHS has side effects, because we're effectively
  	    turning a TRUTH_AND_EXPR into a TRUTH_ANDIF_EXPR.  */
! 	 if (DEFAULT_BRANCH_COST >= 4
! 	     || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
  	   goto normal;
  
  	 if (if_false_label == NULL_RTX)
*************** do_jump (tree exp, rtx if_false_label, r
*** 535,541 ****
  	 /* High branch cost, expand as the bitwise OR of the conditions.
  	    Do the same if the RHS has side effects, because we're effectively
  	    turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR.  */
! 	 if (BRANCH_COST >= 4 || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
  	   goto normal;
  
  	 if (if_true_label == NULL_RTX)
--- 536,543 ----
  	 /* High branch cost, expand as the bitwise OR of the conditions.
  	    Do the same if the RHS has side effects, because we're effectively
  	    turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR.  */
! 	 if (DEFAULT_BRANCH_COST >= 4
! 	     || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
  	   goto normal;
  
  	 if (if_true_label == NULL_RTX)
Index: ipa-inline.c
===================================================================
*** ipa-inline.c	(revision 132800)
--- ipa-inline.c	(working copy)
*************** cgraph_decide_inlining_of_small_function
*** 925,931 ****
  	not_good = N_("function not declared inline and code size would grow");
        if (optimize_size)
  	not_good = N_("optimizing for size and code size would grow");
!       if (not_good && growth > 0)
  	{
  	  if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
  					    &edge->inline_failed))
--- 925,931 ----
  	not_good = N_("function not declared inline and code size would grow");
        if (optimize_size)
  	not_good = N_("optimizing for size and code size would grow");
!       if (not_good && growth > 0 && cgraph_estimate_growth (edge->callee))
  	{
  	  if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
  					    &edge->inline_failed))
Index: ifcvt.c
===================================================================
*** ifcvt.c	(revision 132800)
--- ifcvt.c	(working copy)
***************
*** 67,73 ****
  #endif
  
  #ifndef MAX_CONDITIONAL_EXECUTE
! #define MAX_CONDITIONAL_EXECUTE   (BRANCH_COST + 1)
  #endif
  
  #define IFCVT_MULTIPLE_DUMPS 1
--- 67,73 ----
  #endif
  
  #ifndef MAX_CONDITIONAL_EXECUTE
! #define MAX_CONDITIONAL_EXECUTE   (DEFAULT_BRANCH_COST + 1)
  #endif
  
  #define IFCVT_MULTIPLE_DUMPS 1
*************** struct noce_if_info
*** 626,631 ****
--- 626,634 ----
       from TEST_BB.  For the noce transformations, we allow the symmetric
       form as well.  */
    bool then_else_reversed;
+ 
+   /* Estimated cost of the particular branch instruction.  */
+   int branch_cost;
  };
  
  static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
*************** noce_try_store_flag_constants (struct no
*** 963,982 ****
  	normalize = 0;
        else if (ifalse == 0 && exact_log2 (itrue) >= 0
  	       && (STORE_FLAG_VALUE == 1
! 		   || BRANCH_COST >= 2))
  	normalize = 1;
        else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
! 	       && (STORE_FLAG_VALUE == 1 || BRANCH_COST >= 2))
  	normalize = 1, reversep = 1;
        else if (itrue == -1
  	       && (STORE_FLAG_VALUE == -1
! 		   || BRANCH_COST >= 2))
  	normalize = -1;
        else if (ifalse == -1 && can_reverse
! 	       && (STORE_FLAG_VALUE == -1 || BRANCH_COST >= 2))
  	normalize = -1, reversep = 1;
!       else if ((BRANCH_COST >= 2 && STORE_FLAG_VALUE == -1)
! 	       || BRANCH_COST >= 3)
  	normalize = -1;
        else
  	return FALSE;
--- 966,985 ----
  	normalize = 0;
        else if (ifalse == 0 && exact_log2 (itrue) >= 0
  	       && (STORE_FLAG_VALUE == 1
! 		   || if_info->branch_cost >= 2))
  	normalize = 1;
        else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
! 	       && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
  	normalize = 1, reversep = 1;
        else if (itrue == -1
  	       && (STORE_FLAG_VALUE == -1
! 		   || if_info->branch_cost >= 2))
  	normalize = -1;
        else if (ifalse == -1 && can_reverse
! 	       && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2))
  	normalize = -1, reversep = 1;
!       else if ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1)
! 	       || if_info->branch_cost >= 3)
  	normalize = -1;
        else
  	return FALSE;
*************** noce_try_addcc (struct noce_if_info *if_
*** 1107,1113 ****
  
        /* If that fails, construct conditional increment or decrement using
  	 setcc.  */
!       if (BRANCH_COST >= 2
  	  && (XEXP (if_info->a, 1) == const1_rtx
  	      || XEXP (if_info->a, 1) == constm1_rtx))
          {
--- 1110,1116 ----
  
        /* If that fails, construct conditional increment or decrement using
  	 setcc.  */
!       if (if_info->branch_cost >= 2
  	  && (XEXP (if_info->a, 1) == const1_rtx
  	      || XEXP (if_info->a, 1) == constm1_rtx))
          {
*************** noce_try_store_flag_mask (struct noce_if
*** 1158,1164 ****
    int reversep;
  
    reversep = 0;
!   if ((BRANCH_COST >= 2
         || STORE_FLAG_VALUE == -1)
        && ((if_info->a == const0_rtx
  	   && rtx_equal_p (if_info->b, if_info->x))
--- 1161,1167 ----
    int reversep;
  
    reversep = 0;
!   if ((if_info->branch_cost >= 2
         || STORE_FLAG_VALUE == -1)
        && ((if_info->a == const0_rtx
  	   && rtx_equal_p (if_info->b, if_info->x))
*************** noce_try_cmove_arith (struct noce_if_inf
*** 1317,1323 ****
    /* ??? FIXME: Magic number 5.  */
    if (cse_not_expected
        && MEM_P (a) && MEM_P (b)
!       && BRANCH_COST >= 5)
      {
        a = XEXP (a, 0);
        b = XEXP (b, 0);
--- 1320,1326 ----
    /* ??? FIXME: Magic number 5.  */
    if (cse_not_expected
        && MEM_P (a) && MEM_P (b)
!       && if_info->branch_cost >= 5)
      {
        a = XEXP (a, 0);
        b = XEXP (b, 0);
*************** noce_try_cmove_arith (struct noce_if_inf
*** 1347,1353 ****
    if (insn_a)
      {
        insn_cost = insn_rtx_cost (PATTERN (insn_a));
!       if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (BRANCH_COST))
  	return FALSE;
      }
    else
--- 1350,1356 ----
    if (insn_a)
      {
        insn_cost = insn_rtx_cost (PATTERN (insn_a));
!       if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
  	return FALSE;
      }
    else
*************** noce_try_cmove_arith (struct noce_if_inf
*** 1356,1362 ****
    if (insn_b)
      {
        insn_cost += insn_rtx_cost (PATTERN (insn_b));
!       if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (BRANCH_COST))
          return FALSE;
      }
  
--- 1359,1365 ----
    if (insn_b)
      {
        insn_cost += insn_rtx_cost (PATTERN (insn_b));
!       if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
          return FALSE;
      }
  
*************** noce_find_if_block (basic_block test_bb,
*** 2803,2808 ****
--- 2806,2813 ----
    if_info.cond_earliest = cond_earliest;
    if_info.jump = jump;
    if_info.then_else_reversed = then_else_reversed;
+   if_info.branch_cost = BRANCH_COST (maybe_hot_bb_p (test_bb),
+ 				     predictable_edge_p (then_edge));
  
    /* Do the real work.  */
*************** find_if_case_1 (basic_block test_bb, edg
*** 3569,3575 ****
  	     test_bb->index, then_bb->index);
  
    /* THEN is small.  */
!   if (! cheap_bb_rtx_cost_p (then_bb, COSTS_N_INSNS (BRANCH_COST)))
      return FALSE;
  
    /* Registers set are dead, or are predicable.  */
--- 3574,3582 ----
  	     test_bb->index, then_bb->index);
  
    /* THEN is small.  */
!   if (! cheap_bb_rtx_cost_p (then_bb,
! 	COSTS_N_INSNS (BRANCH_COST (maybe_hot_bb_p (then_edge->src),
! 				    predictable_edge_p (then_edge)))))
      return FALSE;
  
    /* Registers set are dead, or are predicable.  */
*************** find_if_case_2 (basic_block test_bb, edg
*** 3683,3689 ****
  	     test_bb->index, else_bb->index);
  
    /* ELSE is small.  */
!   if (! cheap_bb_rtx_cost_p (else_bb, COSTS_N_INSNS (BRANCH_COST)))
      return FALSE;
  
    /* Registers set are dead, or are predicable.  */
--- 3690,3698 ----
  	     test_bb->index, else_bb->index);
  
    /* ELSE is small.  */
!   if (! cheap_bb_rtx_cost_p (else_bb,
! 	COSTS_N_INSNS (BRANCH_COST (maybe_hot_bb_p (else_edge->src),
! 				    predictable_edge_p (else_edge)))))
      return FALSE;
  
    /* Registers set are dead, or are predicable.  */
Index: expr.h
===================================================================
*** expr.h	(revision 132800)
--- expr.h	(working copy)
*************** along with GCC; see the file COPYING3.
*** 36,44 ****
  
  /* The default branch cost is 1.  */
  #ifndef BRANCH_COST
! #define BRANCH_COST 1
  #endif
  
  /* This is the 4th arg to `expand_expr'.
     EXPAND_STACK_PARM means we are possibly expanding a call param onto
     the stack.
--- 36,52 ----
  
  /* The default branch cost is 1.  */
  #ifndef BRANCH_COST
! #define BRANCH_COST(hot_p, predictable_p) 1
  #endif
  
+ /* When profile information is not known, make conservative assumptions.  Use
+    of this macro should be avoided in favour of BRANCH_COST.  */
+ #define DEFAULT_BRANCH_COST \
+    BRANCH_COST (optimize_size \
+ 		? 0 \
+ 		: !cfun || cfun->function_frequency > FUNCTION_FREQUENCY_NORMAL,\
+ 		false)
+ 
  /* This is the 4th arg to `expand_expr'.
     EXPAND_STACK_PARM means we are possibly expanding a call param onto
     the stack.
Index: predict.c
===================================================================
*** predict.c	(revision 132800)
--- predict.c	(working copy)
*************** gate_estimate_probability (void)
*** 1915,1920 ****
--- 1923,1944 ----
    return flag_guess_branch_prob;
  }
  
+ /* Return true when edge E is likely to be well predictable by branch
+    predictor.  */
+ 
+ bool
+ predictable_edge_p (edge e)
+ {
+   if (profile_status == PROFILE_ABSENT)
+     return false;
+   if ((e->probability
+        <= PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME) * REG_BR_PROB_BASE / 100)
+       || (REG_BR_PROB_BASE - e->probability
+ 	  <= PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME) * REG_BR_PROB_BASE / 100))
+     return true;
+   return false;
+ }
+ 
  struct tree_opt_pass pass_profile =
  {
    "profile",				/* name */
Index: expmed.c
===================================================================
*** expmed.c	(revision 132800)
--- expmed.c	(working copy)
*************** expand_smod_pow2 (enum machine_mode mode
*** 3560,3566 ****
        result = gen_reg_rtx (mode);
  
        /* Avoid conditional branches when they're expensive.  */
!       if (BRANCH_COST >= 2
  	  && !optimize_size)
  	{
  	  rtx signmask = emit_store_flag (result, LT, op0, const0_rtx,
--- 3560,3566 ----
        result = gen_reg_rtx (mode);
  
        /* Avoid conditional branches when they're expensive.  */
!       if (DEFAULT_BRANCH_COST >= 2
  	  && !optimize_size)
  	{
  	  rtx signmask = emit_store_flag (result, LT, op0, const0_rtx,
*************** expand_sdiv_pow2 (enum machine_mode mode
*** 3660,3666 ****
    logd = floor_log2 (d);
    shift = build_int_cst (NULL_TREE, logd);
  
!   if (d == 2 && BRANCH_COST >= 1)
      {
        temp = gen_reg_rtx (mode);
        temp = emit_store_flag (temp, LT, op0, const0_rtx, mode, 0, 1);
--- 3660,3667 ----
    logd = floor_log2 (d);
    shift = build_int_cst (NULL_TREE, logd);
  
!   if (d == 2
!       && DEFAULT_BRANCH_COST >= 1)
      {
        temp = gen_reg_rtx (mode);
        temp = emit_store_flag (temp, LT, op0, const0_rtx, mode, 0, 1);
*************** expand_sdiv_pow2 (enum machine_mode mode
*** 3670,3676 ****
      }
  
  #ifdef HAVE_conditional_move
!   if (BRANCH_COST >= 2)
      {
        rtx temp2;
  
--- 3671,3677 ----
      }
  
  #ifdef HAVE_conditional_move
!   if (DEFAULT_BRANCH_COST >= 2)
      {
        rtx temp2;
  
*************** expand_sdiv_pow2 (enum machine_mode mode
*** 3699,3705 ****
      }
  #endif
  
!   if (BRANCH_COST >= 2)
      {
        int ushift = GET_MODE_BITSIZE (mode) - logd;
  
--- 3700,3706 ----
      }
  #endif
  
!   if (DEFAULT_BRANCH_COST >= 2)
      {
        int ushift = GET_MODE_BITSIZE (mode) - logd;
  
*************** emit_store_flag (rtx target, enum rtx_co
*** 5413,5419 ****
       comparison with zero.  Don't do any of these cases if branches are
       very cheap.  */
  
!   if (BRANCH_COST > 0
        && GET_MODE_CLASS (mode) == MODE_INT && (code == EQ || code == NE)
        && op1 != const0_rtx)
      {
--- 5414,5420 ----
       comparison with zero.  Don't do any of these cases if branches are
       very cheap.  */
  
!   if (DEFAULT_BRANCH_COST > 0
        && GET_MODE_CLASS (mode) == MODE_INT && (code == EQ || code == NE)
        && op1 != const0_rtx)
      {
*************** emit_store_flag (rtx target, enum rtx_co
*** 5436,5445 ****
       do LE and GT if branches are expensive since they are expensive on
       2-operand machines.  */
  
!   if (BRANCH_COST == 0
        || GET_MODE_CLASS (mode) != MODE_INT || op1 != const0_rtx
        || (code != EQ && code != NE
! 	  && (BRANCH_COST <= 1 || (code != LE && code != GT))))
      return 0;
  
    /* See what we need to return.  We can only return a 1, -1, or the
--- 5437,5446 ----
       do LE and GT if branches are expensive since they are expensive on
       2-operand machines.  */
  
!   if (DEFAULT_BRANCH_COST == 0
        || GET_MODE_CLASS (mode) != MODE_INT || op1 != const0_rtx
        || (code != EQ && code != NE
! 	  && (DEFAULT_BRANCH_COST <= 1 || (code != LE && code != GT))))
      return 0;
  
    /* See what we need to return.  We can only return a 1, -1, or the
*************** emit_store_flag (rtx target, enum rtx_co
*** 5535,5541 ****
  	 that "or", which is an extra insn, so we only handle EQ if branches
  	 are expensive.  */
  
!       if (tem == 0 && (code == NE || BRANCH_COST > 1))
  	{
  	  if (rtx_equal_p (subtarget, op0))
  	    subtarget = 0;
--- 5536,5544 ----
  	 that "or", which is an extra insn, so we only handle EQ if branches
  	 are expensive.  */
  
!       if (tem == 0
! 	  && (code == NE
! 	      || DEFAULT_BRANCH_COST > 1))
  	{
  	  if (rtx_equal_p (subtarget, op0))
  	    subtarget = 0;
Index: basic-block.h
===================================================================
*** basic-block.h	(revision 132800)
--- basic-block.h	(working copy)
*************** extern void guess_outgoing_edge_probabil
*** 839,844 ****
--- 839,845 ----
  extern void remove_predictions_associated_with_edge (edge);
  extern bool edge_probability_reliable_p (const_edge);
  extern bool br_prob_note_reliable_p (const_rtx);
+ extern bool predictable_edge_p (edge);
  
  /* In cfg.c */
  extern void dump_regset (regset, FILE *);
Index: config/alpha/alpha.h
===================================================================
*** config/alpha/alpha.h	(revision 132800)
--- config/alpha/alpha.h	(working copy)
*************** extern int alpha_memory_latency;
*** 631,637 ****
  #define MEMORY_MOVE_COST(MODE,CLASS,IN)  (2*alpha_memory_latency)
  
  /* Provide the cost of a branch.  Exact meaning under development.  */
! #define BRANCH_COST 5
  
  /* Stack layout; function entry, exit and calling.  */
--- 631,637 ----
  #define MEMORY_MOVE_COST(MODE,CLASS,IN)  (2*alpha_memory_latency)
  
  /* Provide the cost of a branch.  Exact meaning under development.  */
! #define BRANCH_COST(hot_p, predictable_p) 5
  
  /* Stack layout; function entry, exit and calling.  */
Index: config/frv/frv.h
===================================================================
*** config/frv/frv.h	(revision 132800)
--- config/frv/frv.h	(working copy)
*************** do {                                    
*** 2193,2199 ****
  
  /* A C expression for the cost of a branch instruction.  A value of 1 is the
     default; other values are interpreted relative to that.  */
! #define BRANCH_COST frv_branch_cost_int
  
  /* Define this macro as a C expression which is nonzero if accessing less than
     a word of memory (i.e. a `char' or a `short') is no faster than accessing a
--- 2193,2199 ----
  
  /* A C expression for the cost of a branch instruction.  A value of 1 is the
     default; other values are interpreted relative to that.  */
! #define BRANCH_COST(hot_p, predictable_p) frv_branch_cost_int
  
  /* Define this macro as a C expression which is nonzero if accessing less than
     a word of memory (i.e. a `char' or a `short') is no faster than accessing a
Index: config/s390/s390.h
===================================================================
*** config/s390/s390.h	(revision 132800)
--- config/s390/s390.h	(working copy)
*************** extern struct rtx_def *s390_compare_op0,
*** 780,786 ****
  
  /* A C expression for the cost of a branch instruction.  A value of 1 is the
     default; other values are interpreted relative to that.  */
! #define BRANCH_COST 1
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS 1
--- 780,786 ----
  
  /* A C expression for the cost of a branch instruction.  A value of 1 is the
     default; other values are interpreted relative to that.  */
! #define BRANCH_COST(hot_p, predictable_p) 1
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS 1
Index: config/spu/spu.h
===================================================================
*** config/spu/spu.h	(revision 132800)
--- config/spu/spu.h	(working copy)
*************** targetm.resolve_overloaded_builtin = spu
*** 456,462 ****
  
  /* Costs */
  
! #define BRANCH_COST spu_branch_cost
  
  #define SLOW_BYTE_ACCESS 0
  
--- 456,462 ----
  
  /* Costs */
  
! #define BRANCH_COST(hot_p, predictable_p) spu_branch_cost
  
  #define SLOW_BYTE_ACCESS 0
  
Index: config/sparc/sparc.h
===================================================================
*** config/sparc/sparc.h	(revision 132800)
--- config/sparc/sparc.h	(working copy)
*************** do {                                    
*** 2180,2186 ****
     On Niagara-2, a not-taken branch costs 1 cycle whereas a taken branch
     costs 6 cycles.  */
  
! #define BRANCH_COST \
  	((sparc_cpu == PROCESSOR_V9 \
  	  || sparc_cpu == PROCESSOR_ULTRASPARC) \
  	 ? 7 \
--- 2180,2186 ----
     On Niagara-2, a not-taken branch costs 1 cycle whereas a taken branch
     costs 6 cycles.  */
  
! #define BRANCH_COST(hot_p, predictable_p) \
  	((sparc_cpu == PROCESSOR_V9 \
  	  || sparc_cpu == PROCESSOR_ULTRASPARC) \
  	 ? 7 \
Index: config/m32r/m32r.h
===================================================================
*** config/m32r/m32r.h	(revision 132800)
--- config/m32r/m32r.h	(working copy)
*************** L2:     .word STATIC
*** 1219,1225 ****
  /* A value of 2 here causes GCC to avoid using branches in comparisons like
     while (a < N && a).  Branches aren't that expensive on the M32R so we
     define this as 1.  Defining it as 2 had a heavy hit in fp-bit.c.  */
! #define BRANCH_COST ((TARGET_BRANCH_COST) ? 2 : 1)
  
  /* Nonzero if access to memory by bytes is slow and undesirable.
     For RISC chips, it means that access to memory by bytes is no
--- 1219,1225 ----
  /* A value of 2 here causes GCC to avoid using branches in comparisons like
     while (a < N && a).  Branches aren't that expensive on the M32R so we
     define this as 1.  Defining it as 2 had a heavy hit in fp-bit.c.  */
! #define BRANCH_COST(hot_p, predictable_p) ((TARGET_BRANCH_COST) ? 2 : 1)
  
  /* Nonzero if access to memory by bytes is slow and undesirable.
     For RISC chips, it means that access to memory by bytes is no
Index: config/i386/i386.h
===================================================================
*** config/i386/i386.h	(revision 132800)
--- config/i386/i386.h	(working copy)
*************** do {                                    
*** 2052,2058 ****
  
  /* A C expression for the cost of a branch instruction.  A value of 1
     is the default; other values are interpreted relative to that.  */
! #define BRANCH_COST ix86_branch_cost
  
  /* Define this macro as a C expression which is nonzero if accessing
     less than a word of memory (i.e. a `char' or a `short') is no
--- 2052,2059 ----
  
  /* A C expression for the cost of a branch instruction.  A value of 1
     is the default; other values are interpreted relative to that.  */
! #define BRANCH_COST(hot_p, predictable_p) \
!   (!(hot_p) ? 2 : (predictable_p) ? 0 : ix86_branch_cost)
  
  /* Define this macro as a C expression which is nonzero if accessing
     less than a word of memory (i.e. a `char' or a `short') is no
Index: config/i386/i386.c
===================================================================
*** config/i386/i386.c	(revision 132800)
--- config/i386/i386.c	(working copy)
*************** ix86_expand_int_movcc (rtx operands[])
*** 12819,12825 ****
         */
  
        if ((!TARGET_CMOVE || (mode == QImode && TARGET_PARTIAL_REG_STALL))
! 	  && BRANCH_COST >= 2)
  	{
  	  if (cf == 0)
  	    {
--- 12819,12826 ----
         */
  
        if ((!TARGET_CMOVE || (mode == QImode && TARGET_PARTIAL_REG_STALL))
! 	  && BRANCH_COST (cfun->function_frequency >= FUNCTION_FREQUENCY_NORMAL,
! 			  false) >= 2)
  	{
  	  if (cf == 0)
  	    {
*************** ix86_expand_int_movcc (rtx operands[])
*** 12904,12910 ****
        optab op;
        rtx var, orig_out, out, tmp;
  
!       if (BRANCH_COST <= 2)
  	return 0; /* FAIL */
  
        /* If one of the two operands is an interesting constant, load a
--- 12905,12912 ----
        optab op;
        rtx var, orig_out, out, tmp;
  
!       if (BRANCH_COST (cfun->function_frequency >= FUNCTION_FREQUENCY_NORMAL,
! 		       false) <= 2)
  	return 0; /* FAIL */
  
        /* If one of the two operands is an interesting constant, load a
Index: config/sh/sh.h
===================================================================
*** config/sh/sh.h	(revision 132800)
--- config/sh/sh.h	(working copy)
*************** struct sh_args {
*** 2822,2828 ****
     The SH1 does not have delay slots, hence we get a pipeline stall
     at every branch.  The SH4 is superscalar, so the single delay slot
     is not sufficient to keep both pipelines filled.  */
! #define BRANCH_COST (TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)
  
  /* Assembler output control.  */
  
--- 2822,2829 ----
     The SH1 does not have delay slots, hence we get a pipeline stall
     at every branch.  The SH4 is superscalar, so the single delay slot
     is not sufficient to keep both pipelines filled.  */
! #define BRANCH_COST(hot_p, predictable_p) \
! 	(TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)
  
  /* Assembler output control.  */
  
Index: config/pdp11/pdp11.h
===================================================================
*** config/pdp11/pdp11.h	(revision 132800)
--- config/pdp11/pdp11.h	(working copy)
*************** JMP	FUNCTION	0x0058  0x0000 <- FUNCTION
*** 1059,1065 ****
  /* there is no point in avoiding branches on a pdp,
     since branches are really cheap - I just want to find out
     how much difference the BRANCH_COST macro makes in code */
! #define BRANCH_COST (TARGET_BRANCH_CHEAP ? 0 : 1)
  
  #define COMPARE_FLAG_MODE HImode
  
--- 1059,1065 ----
  /* there is no point in avoiding branches on a pdp,
     since branches are really cheap - I just want to find out
     how much difference the BRANCH_COST macro makes in code */
! #define BRANCH_COST(hot_p, predictable_p) (TARGET_BRANCH_CHEAP ? 0 : 1)
  
  #define COMPARE_FLAG_MODE HImode
  
Index: config/avr/avr.h
===================================================================
*** config/avr/avr.h	(revision 132800)
--- config/avr/avr.h	(working copy)
*************** do {                                    
*** 481,487 ****
  	(MODE)==SImode ? 8 :               \
  	(MODE)==SFmode ? 8 : 16)
  
! #define BRANCH_COST 0
  
  #define SLOW_BYTE_ACCESS 0
  
--- 481,487 ----
  	(MODE)==SImode ? 8 :               \
  	(MODE)==SFmode ? 8 : 16)
  
! #define BRANCH_COST(hot_p, predictable_p) 0
  
  #define SLOW_BYTE_ACCESS 0
  
Index: config/crx/crx.h
===================================================================
*** config/crx/crx.h	(revision 132800)
--- config/crx/crx.h	(working copy)
*************** struct cumulative_args
*** 420,426 ****
  /* Moving to processor register flushes pipeline - thus asymmetric */
  #define REGISTER_MOVE_COST(MODE, FROM, TO) ((TO != GENERAL_REGS) ? 8 : 2)
  /* Assume best case (branch predicted) */
! #define BRANCH_COST 2
  
  #define SLOW_BYTE_ACCESS  1
  
--- 420,426 ----
  /* Moving to processor register flushes pipeline - thus asymmetric */
  #define REGISTER_MOVE_COST(MODE, FROM, TO) ((TO != GENERAL_REGS) ? 8 : 2)
  /* Assume best case (branch predicted) */
! #define BRANCH_COST(hot_p, predictable_p) 2
  
  #define SLOW_BYTE_ACCESS  1
  
Index: config/xtensa/xtensa.h
===================================================================
*** config/xtensa/xtensa.h	(revision 132800)
--- config/xtensa/xtensa.h	(working copy)
*************** typedef struct xtensa_args
*** 898,904 ****
  
  #define MEMORY_MOVE_COST(MODE, CLASS, IN) 4
  
! #define BRANCH_COST 3
  
  /* How to refer to registers in assembler output.
     This sequence is indexed by compiler's hard-register-number (see above).  */
--- 898,904 ----
  
  #define MEMORY_MOVE_COST(MODE, CLASS, IN) 4
  
! #define BRANCH_COST(hot_p, predictable_p) 3
  
  /* How to refer to registers in assembler output.
     This sequence is indexed by compiler's hard-register-number (see above).  */
Index: config/stormy16/stormy16.h
===================================================================
*** config/stormy16/stormy16.h	(revision 132800)
--- config/stormy16/stormy16.h	(working copy)
*************** do {                                    
*** 582,588 ****
  
  #define MEMORY_MOVE_COST(M,C,I) (5 + memory_move_secondary_cost (M, C, I))
  
! #define BRANCH_COST 5
  
  #define SLOW_BYTE_ACCESS 0
  
--- 582,588 ----
  
  #define MEMORY_MOVE_COST(M,C,I) (5 + memory_move_secondary_cost (M, C, I))
  
! #define BRANCH_COST(hot_p, predictable_p) 5
  
  #define SLOW_BYTE_ACCESS 0
  
Index: config/m68hc11/m68hc11.h
===================================================================
*** config/m68hc11/m68hc11.h	(revision 132800)
--- config/m68hc11/m68hc11.h	(working copy)
*************** extern unsigned char m68hc11_reg_valid_f
*** 1266,1272 ****
     Pretend branches are cheap because GCC generates sub-optimal code
     for the default value.  */
! #define BRANCH_COST 0
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS 0
  
--- 1266,1272 ----
     Pretend branches are cheap because GCC generates sub-optimal code
     for the default value.  */
! #define BRANCH_COST(hot_p, predictable_p) 0
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS 0
  
Index: config/iq2000/iq2000.h
===================================================================
*** config/iq2000/iq2000.h	(revision 132800)
--- config/iq2000/iq2000.h	(working copy)
*************** typedef struct iq2000_args
*** 620,626 ****
  #define MEMORY_MOVE_COST(MODE,CLASS,TO_P)	\
    (TO_P ? 2 : 16)
  
! #define BRANCH_COST 2
  
  #define SLOW_BYTE_ACCESS 1
  
--- 620,626 ----
  #define MEMORY_MOVE_COST(MODE,CLASS,TO_P)	\
    (TO_P ? 2 : 16)
  
! #define BRANCH_COST(hot_p, predictable_p) 2
  
  #define SLOW_BYTE_ACCESS 1
  
Index: config/ia64/ia64.h
===================================================================
*** config/ia64/ia64.h	(revision 132800)
--- config/ia64/ia64.h	(working copy)
*************** do {                                    
*** 1371,1377 ****
     many additional insn groups we run into, vs how good the dynamic
     branch predictor is.  */
  
! #define BRANCH_COST 6
  
  /* Define this macro as a C expression which is nonzero if accessing less than
     a word of memory (i.e. a `char' or a `short') is no faster than accessing a
--- 1371,1377 ----
     many additional insn groups we run into, vs how good the dynamic
     branch predictor is.  */
  
! #define BRANCH_COST(hot_p, predictable_p) 6
  
  /* Define this macro as a C expression which is nonzero if accessing less than
     a word of memory (i.e. a `char' or a `short') is no faster than accessing a
Index: config/rs6000/rs6000.h
===================================================================
*** config/rs6000/rs6000.h	(revision 132800)
--- config/rs6000/rs6000.h	(working copy)
*************** extern enum rs6000_nop_insertion rs6000_
*** 950,956 ****
     Set this to 3 on the RS/6000 since that is roughly the average cost of an
     unscheduled conditional branch.  */
  
! #define BRANCH_COST 3
  
  /* Override BRANCH_COST heuristic which empirically produces worse
     performance for removing short circuiting from the logical ops.  */
--- 950,956 ----
     Set this to 3 on the RS/6000 since that is roughly the average cost of an
     unscheduled conditional branch.  */
  
! #define BRANCH_COST(hot_p, predictable_p) 3
  
  /* Override BRANCH_COST heuristic which empirically produces worse
     performance for removing short circuiting from the logical ops.  */
Index: config/arc/arc.h
===================================================================
*** config/arc/arc.h	(revision 132800)
--- config/arc/arc.h	(working copy)
*************** arc_select_cc_mode (OP, X, Y)
*** 824,830 ****
  /* The cost of a branch insn.  */
  /* ??? What's the right value here?  Branches are certainly more
     expensive than reg->reg moves.  */
! #define BRANCH_COST 2
  
  /* Nonzero if access to memory by bytes is slow and undesirable.
     For RISC chips, it means that access to memory by bytes is no
--- 824,830 ----
  /* The cost of a branch insn.  */
  /* ??? What's the right value here?  Branches are certainly more
     expensive than reg->reg moves.  */
! #define BRANCH_COST(hot_p, predictable_p) 2
  
  /* Nonzero if access to memory by bytes is slow and undesirable.
     For RISC chips, it means that access to memory by bytes is no
Index: config/score/score.h
===================================================================
*** config/score/score.h	(revision 132800)
--- config/score/score.h	(working copy)
*************** typedef struct score_args
*** 795,801 ****
    (4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P)))
  
  /* Try to generate sequences that don't involve branches.  */
! #define BRANCH_COST 2
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS 1
  
--- 795,801 ----
    (4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P)))
  
  /* Try to generate sequences that don't involve branches.  */
! #define BRANCH_COST(hot_p, predictable_p) 2
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS 1
  
Index: config/arm/arm.h
===================================================================
*** config/arm/arm.h	(revision 132800)
--- config/arm/arm.h	(working copy)
*************** do {                                    
*** 2271,2277 ****
  
  /* Try to generate sequences that don't involve branches, we can then use
     conditional instructions */
! #define BRANCH_COST \
    (TARGET_32BIT ? 4 : (optimize > 0 ? 2 : 0))
  
  /* Position Independent Code.  */
--- 2271,2277 ----
  
  /* Try to generate sequences that don't involve branches, we can then use
     conditional instructions */
! #define BRANCH_COST(hot_p, predictable_p) \
    (TARGET_32BIT ? 4 : (optimize > 0 ? 2 : 0))
  
  /* Position Independent Code.  */
Index: config/pa/pa.h
===================================================================
*** config/pa/pa.h	(revision 132800)
--- config/pa/pa.h	(working copy)
*************** do {                                    
*** 1569,1575 ****
     : 2)
  
  /* Adjust the cost of branches.  */
! #define BRANCH_COST (pa_cpu == PROCESSOR_8000 ? 2 : 1)
  
  /* Handling the special cases is going to get too complicated for a macro,
     just call `pa_adjust_insn_length' to do the real work.  */
--- 1569,1575 ----
     : 2)
  
  /* Adjust the cost of branches.  */
! #define BRANCH_COST(hot_p, predictable_p) (pa_cpu == PROCESSOR_8000 ? 2 : 1)
  
  /* Handling the special cases is going to get too complicated for a macro,
     just call `pa_adjust_insn_length' to do the real work.  */
Index: config/mips/mips.h
===================================================================
*** config/mips/mips.h	(revision 132800)
--- config/mips/mips.h	(working copy)
*************** typedef struct mips_args {
*** 2415,2421 ****
  /* A C expression for the cost of a branch instruction.  A value of
     1 is the default; other values are interpreted relative to that.  */
  
! #define BRANCH_COST mips_branch_cost
  #define LOGICAL_OP_NON_SHORT_CIRCUIT 0
  
  /* If defined, modifies the length assigned to instruction INSN as a
--- 2415,2421 ----
  /* A C expression for the cost of a branch instruction.
A value of 1 is the default; other values are interpreted relative to that. */ ! #define BRANCH_COST(hot_p, predictable_p) mips_branch_cost #define LOGICAL_OP_NON_SHORT_CIRCUIT 0 /* If defined, modifies the length assigned to instruction INSN as a Index: config/vax/vax.h =================================================================== *** config/vax/vax.h (revision 132800) --- config/vax/vax.h (working copy) *************** enum reg_class { NO_REGS, ALL_REGS, LIM_ *** 652,658 **** Branches are extremely cheap on the VAX while the shift insns often used to replace branches can be expensive. */ ! #define BRANCH_COST 0 /* Tell final.c how to eliminate redundant test instructions. */ --- 652,658 ---- Branches are extremely cheap on the VAX while the shift insns often used to replace branches can be expensive. */ ! #define BRANCH_COST(hot_p, predictable_p) 0 /* Tell final.c how to eliminate redundant test instructions. */ Index: config/h8300/h8300.h =================================================================== *** config/h8300/h8300.h (revision 132800) --- config/h8300/h8300.h (working copy) *************** struct cum_arg *** 1004,1010 **** #define DELAY_SLOT_LENGTH(JUMP) \ (NEXT_INSN (PREV_INSN (JUMP)) == JUMP ? 0 : 2) ! #define BRANCH_COST 0 /* Tell final.c how to eliminate redundant test instructions. */ --- 1004,1010 ---- #define DELAY_SLOT_LENGTH(JUMP) \ (NEXT_INSN (PREV_INSN (JUMP)) == JUMP ? 0 : 2) ! #define BRANCH_COST(hot_p, predictable_p) 0 /* Tell final.c how to eliminate redundant test instructions. 
*/ Index: params.def =================================================================== *** params.def (revision 132800) --- params.def (working copy) *************** DEFPARAM (PARAM_STRUCT_REORG_COLD_STRUCT *** 93,98 **** --- 93,105 ---- "The threshold ratio between current and hottest structure counts", 10, 0, 100) + /* When branch is predicted to be taken with probability lower than this + threshold (in percent), then it is considered well predictable. */ + DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME, + "predictable-branch-outcome", + "Maximal esitmated outcome of branch considered predictable", + 2, 0, 50) + /* The single function inlining limit. This is the maximum size of a function counted in internal gcc instructions (not in real machine instructions) that is eligible for inlining