Hi Bin, on 2019/8/22 下午1:46, Bin.Cheng wrote: > On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin <li...@linux.ibm.com> wrote: >> >> Hi Bin, >> >> Thanks for your time! >> >> on 2019/8/21 下午8:32, Bin.Cheng wrote: >>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin <li...@linux.ibm.com> wrote: >>>> >>>> Hi! >>>> >>>> Comparing to the previous versions of implementation mainly based on the >>>> existing IV cands but zeroing the related group/use cost, this new one is >>>> based >>>> on Richard and Segher's suggestion introducing one doloop dedicated IV >>>> cand. >>>> >>>> Some key points are listed below: >>>> 1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV >>>> cand. >>>> 2) Special name "doloop" assigned. >>>> 3) Doloop IV cand with form (niter+1, +, -1) >>>> 4) For doloop IV cand, no extra one cost like BIV, assign zero cost for >>>> step. >>>> 5) Support may_be_zero (regressed PR is in this case), the base of >>>> doloop IV >>>> can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv. >>>> 6) Add more expr support in force_expr_to_var_cost for reasonable cost >>>> calculation on the IV base with may_be_zero (like COND_EXPR). >>>> 7) Set zero cost when using doloop IV cand for doloop use. >>>> 8) Add three hooks (should we merge _generic and _address?). >>>> *) have_count_reg_decr_p, is to indicate the target has special >>>> hardware >>>> count register, we shouldn't consider the impact of doloop IV when >>>> calculating register pressures. >>>> *) doloop_cost_for_generic, is the extra cost when using doloop IV >>>> cand for >>>> generic type IV use. >>>> *) doloop_cost_for_address, is the extra cost when using doloop IV >>>> cand for >>>> address type IV use. >>> What will happen if doloop IV cand be used for generic/address type iv >>> use? Can RTL doloop can still perform doloop optimization in this >>> case? >>> >> >> On Power, we put the iteration count into hardware count register, it takes >> very >> high cost to move the count to GPR, so the cost is set as INF to make it >> impossible >> to use it for generic/address type iv use. But as some discussion before, >> on some >> targets using GPR instead of hardware count register, they probably want to >> use this >> doloop iv used for other uses if profitable. These two hooks offer the >> possibility. >> In that case, I think RTL doloop can still perform since it can still get the >> pattern and transform. The generic/address uses can still use it. >>>> >>>> Bootstrapped on powerpc64le-linux-gnu and regression testing passed >>>> excepting >>>> for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is >>>> tracked >>>> by PR89983. >>>> >>>> Any comments and suggestions are highly appreciated. Thanks! >>> Not sure if I understand the patch correctly, some comments embedded. >>> >>> + /* The number of doloop candidate in the set. */ >>> + unsigned n_doloop_cands; >>> + >>> This is unnecessary. See below comments. >>> >>> - add_candidate_1 (data, base, step, important, >>> - IP_NORMAL, use, NULL, orig_iv); >>> + add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, >>> doloop, >>> + orig_iv); >>> if (ip_end_pos (data->current_loop) >>> && allow_ip_end_pos_p (data->current_loop)) >>> - add_candidate_1 (data, base, step, important, IP_END, use, NULL, >>> orig_iv); >>> + add_candidate_1 (data, base, step, important, IP_END, use, NULL, >>> doloop, >>> + orig_iv); >>> Do we need to skip ip_end_pos case for doloop candidate? Because the >>> candidate increment will be inserted in latch, i.e, increment position >>> is after exit condition. >>> >> >> Yes, we should skip it. Currently function find_doloop_use has the check on >> an >> empty latch and gimple_cond to latch, partially excluding it. But it's >> still good >> to guard it directly here. >> >>> - tree_to_aff_combination (iv->base, type, val); >>> + tree base = iv->base; >>> + /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to >>> extract >>> + the value under !may_be_zero to get the compact bound which also well >>> fits >>> + for may_be_zero since we ensure the value for it is const one. */ >>> + if (cand->doloop_p && desc->may_be_zero && !integer_zerop >>> (desc->may_be_zero)) >>> + base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), >>> + unshare_expr (rewrite_to_non_trapping_overflow >>> (niter)), >>> + build_int_cst (TREE_TYPE (niter), 1)); >>> + tree_to_aff_combination (base, type, val); >>> I don't quite follow here. The iv->base is computed from niter, I >>> suppose compact bound is for cheaper candidate initialization? Why >>> it's possible to extract !may_be_zero niter for may_be_zero here? The >>> niter under !may_be_zero has no indication about the real niter under >>> may_be_zero. >>> >> >> As you note below, the cand_value for doloop would be zero, but for the case >> may_be_zero set, the current calculation would take care of the whole niter >> expression including the cond_expr introduced by may_be_zero check, it's >> unexpected. The purpose is to use the value under condition !may_be_zero >> for the calculation, and yes, to get expected zero finally. >> >>> - cand_value_at (loop, cand, use->stmt, desc->niter, &bnd); >>> + cand_value_at (loop, cand, use->stmt, desc, &bnd); >>> If I understand correctly, doloop use/cand will only be >>> identified/added for single exit loop, and there will be only one >>> cond(doloop) iv_use and only one doloop cand for doloop loop. So the >>> cand_value at niter at use position would be 0. If that's the case, >>> we can skip calling cand_value_at here for doloop cand. The change to >>> cand_value_at would be unnecessary neither. >>> >> >> Exactly, I'll add the early return with zero bound for doloop. >> >>> - expensive. */ >>> - if (!integer_zerop (desc->may_be_zero)) >>> + expensive. >>> + >>> + For doloop candidate, we have considered MAY_BE_ZERO for IV base, >>> need to >>> + support MAY_BE_ZERO ? 0 : NITER, so simply bypass this check. */ >>> + if (!integer_zerop (desc->may_be_zero) && !cand->doloop_p) >>> return iv_elimination_compare_lt (data, cand, comp, desc); >>> And we can early return before this? >>> >> >> OK. >> >>> + if (may_be_zero) >>> + { >>> + if (COMPARISON_CLASS_P (may_be_zero)) >>> + { >>> + niter = fold_build3 (COND_EXPR, ntype, may_be_zero, >>> + build_int_cst (ntype, 0), >>> + rewrite_to_non_trapping_overflow (niter)); >>> + } >>> + /* Don't try to obtain the iteration count expression when >>> may_be_zero is >>> + integer_nonzerop (actually iteration count is one) or else. */ >>> + else >>> + return; >>> + } >>> + >>> + tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter), >>> + build_int_cst (ntype, 1)); >>> niter is the number of latch executions, so niter + 1 could wrap here, >>> but guess it's not a problem the similar issue is not handled in >>> vectorizer neither. >>> >> >> OK. >> >>> + unsigned n_old = data->regs_used, n_spr_for_doloop = 0; >>> + /* If target supports count register for doloop, it doesn't take GPR. */ >>> + if (targetm.have_count_reg_decr_p) >>> + n_spr_for_doloop = n_doloop_cands; >>> + unsigned n_new = n_invs + n_cands - n_spr_for_doloop; >>> Not necessary. See below. >> >>> - cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands); >>> + cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands, >>> + ivs->n_doloop_cands); >>> Also. >>> >>> ivs->n_cands--; >>> + if (cp->cand->doloop_p) >>> + ivs->n_doloop_cands--; >>> >>> ivs->n_cands++; >>> + if (cp->cand->doloop_p) >>> + ivs->n_doloop_cands++; >>> You can just book n_cands under condition !cp->cand->doloop_p. >> >> If my understanding is correct, you are suggesting the code like: >> >> if (!cp->cand->doloop_p) >> ivs->n_cands++; >> >> But I'm afraid that it can NOT satisfy the need in function >> ivopts_estimate_reg_pressure. As the comments, "if target supports >> count register for doloop it doesn't take GPR.". If we make doloop >> cand invisible in n_cands, it's fine for target with count register, >> but we may miss to count them on targets without count register. > Why not one more step do checks: > if (!cp->cand->doloop_p || !targetm.have_count_reg_decr_p) > ivs->n_cands++; >
Yes, it works. Thanks! The new patch addressing the comments is attached. Could you please have a look again? Thanks in advance! Kewen --------- gcc/ChangeLog 2019-08-22 Kewen Lin <li...@gcc.gnu.org> PR middle-end/80791 * config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro. (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. * target.def (have_count_reg_decr_p): New hook. (doloop_cost_for_generic): Likewise. (doloop_cost_for_address): Likewise. * doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise. (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. * doc/tm.texi: Regenerate. * tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite cost addend. (record_group): Init doloop_p. (add_candidate_1): Add optional argument doloop, change the handlings accordingly. (add_candidate): Likewise. (add_iv_candidate_for_biv): Update the call to add_candidate. (generic_predict_doloop_p): Update attribute. (force_expr_to_var_cost): Add costing for expressions COND_EXPR/LT_EXPR/ LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EXPR/ UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_EXPR/ MIN_EXPR. (determine_group_iv_cost_generic): Update for doloop IV cand. (determine_group_iv_cost_address): Likewise. (determine_group_iv_cost_cond): Likewise. (determine_iv_cost): Likewise. (ivopts_estimate_reg_pressure): Likewise. (may_eliminate_iv): Likewise. (add_iv_candidate_for_doloop): New function. (find_iv_candidates): Call function add_iv_candidate_for_doloop. (iv_ca_set_no_cp): Update for doloop IV cand. (iv_ca_set_cp): Likewise. (iv_ca_dump): Dump register cost. (find_doloop_use): New function. (predict_and_process_doloop): Likewise. (tree_ssa_iv_optimize_loop): Call function predict_and_process_doloop. gcc/testsuite/ChangeLog 2019-08-22 Kewen Lin <li...@gcc.gnu.org> PR middle-end/80791 * gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change. * gcc.dg/tree-ssa/ivopts-lt.c: Likewise. * gcc.dg/tree-ssa/pr32044.c: Likewise.
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 6667cd0..5eccbdc 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1912,6 +1912,16 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_PREDICT_DOLOOP_P #define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p +#undef TARGET_HAVE_COUNT_REG_DECR_P +#define TARGET_HAVE_COUNT_REG_DECR_P true + +/* 1000000000 is infinite cost in IVOPTs. */ +#undef TARGET_DOLOOP_COST_FOR_GENERIC +#define TARGET_DOLOOP_COST_FOR_GENERIC 1000000000 + +#undef TARGET_DOLOOP_COST_FOR_ADDRESS +#define TARGET_DOLOOP_COST_FOR_ADDRESS 1000000000 + #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index c2aa4d0..9f3a08a 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11618,6 +11618,29 @@ loops, and will help ivopts to make some decisions. The default version of this hook returns false. @end deftypefn +@deftypevr {Target Hook} bool TARGET_HAVE_COUNT_REG_DECR_P +Return true if the target supports hardware count register for decrement +and branch. This count register can't be used as general register since +moving to/from a general register from/to it is very expensive. +The default value is false. +@end deftypevr + +@deftypevr {Target Hook} int64_t TARGET_DOLOOP_COST_FOR_GENERIC +IVOPTs introduces one doloop dedicated IV candidate, this hook offers + target owner a way to adjust cost when selecting doloop IV candidate for a + generic IV use. At calcuation, this value will be added on normal cost + already calculated by current implementation. +The default value is zero. +@end deftypevr + +@deftypevr {Target Hook} int64_t TARGET_DOLOOP_COST_FOR_ADDRESS +IVOPTs introduces one doloop dedicated IV candidate, this hook offers + target owner a way to adjust cost when selecting doloop IV candidate for an + address IV use. At calcuation, this value will be added on normal cost + already calculated by current implementation. +The default value is zero. +@end deftypevr + @deftypefn {Target Hook} bool TARGET_CAN_USE_DOLOOP_P (const widest_int @var{&iterations}, const widest_int @var{&iterations_max}, unsigned int @var{loop_depth}, bool @var{entered_at_top}) Return true if it is possible to use low-overhead loops (@code{doloop_end} and @code{doloop_begin}) for a particular loop. @var{iterations} gives the diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index b4d57b8..4346773 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -7946,6 +7946,12 @@ to by @var{ce_info}. @hook TARGET_PREDICT_DOLOOP_P +@hook TARGET_HAVE_COUNT_REG_DECR_P + +@hook TARGET_DOLOOP_COST_FOR_GENERIC + +@hook TARGET_DOLOOP_COST_FOR_ADDRESS + @hook TARGET_CAN_USE_DOLOOP_P @hook TARGET_INVALID_WITHIN_DOLOOP diff --git a/gcc/target.def b/gcc/target.def index 71b6972..69e2844 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -4246,6 +4246,32 @@ The default version of this hook returns false.", bool, (struct loop *loop), default_predict_doloop_p) +DEFHOOKPOD +(have_count_reg_decr_p, + "Return true if the target supports hardware count register for decrement\n\ +and branch. This count register can't be used as general register since\n\ +moving to/from a general register from/to it is very expensive.\n\ +The default value is false.", + bool, false) + +DEFHOOKPOD +(doloop_cost_for_generic, + "IVOPTs introduces one doloop dedicated IV candidate, this hook offers\n\ + target owner a way to adjust cost when selecting doloop IV candidate for a\n\ + generic IV use. At calcuation, this value will be added on normal cost\n\ + already calculated by current implementation.\n\ +The default value is zero.", + int64_t, 0) + +DEFHOOKPOD +(doloop_cost_for_address, + "IVOPTs introduces one doloop dedicated IV candidate, this hook offers\n\ + target owner a way to adjust cost when selecting doloop IV candidate for an\n\ + address IV use. At calcuation, this value will be added on normal cost\n\ + already calculated by current implementation.\n\ +The default value is zero.", + int64_t, 0) + DEFHOOK (can_use_doloop_p, "Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c index 214e6a7..ce4b1d0 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c @@ -10,4 +10,6 @@ int main (void) f2 (); } -/* { dg-final { scan-tree-dump-times "!= 0" 5 "ivopts" } } */ +/* { dg-final { scan-tree-dump-times "!= 0" 5 "ivopts" { target { ! powerpc*-*-* } } } } */ +/* More debug information emitted for doloop on powerpc. */ +/* { dg-final { scan-tree-dump-times "!= 0" 6 "ivopts" { target { powerpc*-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c index 7d5859b..71d7f67 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c @@ -17,6 +17,7 @@ f1 (char *p, uintptr_t i, uintptr_t n) while (i < n); } -/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" } } */ -/* { dg-final { scan-tree-dump-times "PHI <p_" 1 "ivopts"} } */ -/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 1 "ivopts" } } */ +/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" { target { ! powerpc*-*-* } } } } */ +/* { dg-final { scan-tree-dump-times "PHI" 2 "ivopts" { target { powerpc*-*-* } } } } */ +/* { dg-final { scan-tree-dump-times "PHI <p_" 1 "ivopts" { target { ! powerpc*-*-* } } } } */ +/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 1 "ivopts" { target { ! powerpc*-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr32044.c b/gcc/testsuite/gcc.dg/tree-ssa/pr32044.c index 8a8977a..06c27b0 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr32044.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr32044.c @@ -1,6 +1,10 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-optimized" } */ +/* For powerpc, disable doloop IV cand generation in IVOPTs to avoid unexpected + division operation for its base setup. */ +/* { dg-additional-options "-fno-branch-count-reg" { target { powerpc*-*-* } } } */ + int foo (int n) { while (n >= 45) diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 530ea4a..be3b0b5 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -275,6 +275,9 @@ comp_cost::operator+= (comp_cost cost) comp_cost comp_cost::operator+= (HOST_WIDE_INT c) { + if (c >= INFTY) + this->cost = INFTY; + if (infinite_cost_p ()) return *this; @@ -399,6 +402,8 @@ struct iv_group struct cost_pair *cost_map; /* The selected candidate for the group. */ struct iv_cand *selected; + /* To indicate this is a doloop use group. */ + bool doloop_p; /* Uses in the group. */ vec<struct iv_use *> vuses; }; @@ -439,6 +444,7 @@ struct iv_cand be hoisted out of loop. */ struct iv *orig_iv; /* The original iv if this cand is added from biv with smaller type. */ + bool doloop_p; /* Whether this is a doloop candidate. */ }; /* Hashtable entry for common candidate derived from iv uses. */ @@ -612,6 +618,9 @@ struct ivopts_data /* Whether the loop body can only be exited via single exit. */ bool loop_single_exit_p; + + /* Whether the loop has doloop comparison use. */ + bool doloop_use_p; }; /* An assignment of iv candidates to uses. */ @@ -1528,6 +1537,7 @@ record_group (struct ivopts_data *data, enum use_type type) group->type = type; group->related_cands = BITMAP_ALLOC (NULL); group->vuses.create (1); + group->doloop_p = false; data->vgroups.safe_push (group); return group; @@ -3017,9 +3027,9 @@ get_loop_invariant_expr (struct ivopts_data *data, tree inv_expr) replacement of the final value of the iv by a direct computation. */ static struct iv_cand * -add_candidate_1 (struct ivopts_data *data, - tree base, tree step, bool important, enum iv_position pos, - struct iv_use *use, gimple *incremented_at, +add_candidate_1 (struct ivopts_data *data, tree base, tree step, bool important, + enum iv_position pos, struct iv_use *use, + gimple *incremented_at, bool doloop = false, struct iv *orig_iv = NULL) { unsigned i; @@ -3079,11 +3089,15 @@ add_candidate_1 (struct ivopts_data *data, cand->pos = pos; if (pos != IP_ORIGINAL) { - cand->var_before = create_tmp_var_raw (TREE_TYPE (base), "ivtmp"); + if (doloop) + cand->var_before = create_tmp_var_raw (TREE_TYPE (base), "doloop"); + else + cand->var_before = create_tmp_var_raw (TREE_TYPE (base), "ivtmp"); cand->var_after = cand->var_before; } cand->important = important; cand->incremented_at = incremented_at; + cand->doloop_p = doloop; data->vcands.safe_push (cand); if (!poly_int_tree_p (step)) @@ -3116,6 +3130,7 @@ add_candidate_1 (struct ivopts_data *data, } cand->important |= important; + cand->doloop_p |= doloop; /* Relate candidate to the group for which it is added. */ if (use) @@ -3209,16 +3224,17 @@ add_autoinc_candidates (struct ivopts_data *data, tree base, tree step, the end of loop. */ static void -add_candidate (struct ivopts_data *data, - tree base, tree step, bool important, struct iv_use *use, +add_candidate (struct ivopts_data *data, tree base, tree step, bool important, + struct iv_use *use, bool doloop = false, struct iv *orig_iv = NULL) { if (ip_normal_pos (data->current_loop)) - add_candidate_1 (data, base, step, important, - IP_NORMAL, use, NULL, orig_iv); - if (ip_end_pos (data->current_loop) + add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, doloop, + orig_iv); + if (!doloop && ip_end_pos (data->current_loop) && allow_ip_end_pos_p (data->current_loop)) - add_candidate_1 (data, base, step, important, IP_END, use, NULL, orig_iv); + add_candidate_1 (data, base, step, important, IP_END, use, NULL, doloop, + orig_iv); } /* Adds standard iv candidates. */ @@ -3262,7 +3278,7 @@ add_iv_candidate_for_biv (struct ivopts_data *data, struct iv *iv) tree step = fold_convert (sizetype, iv->step); /* Add iv cand of same precision as index part in TARGET_MEM_REF. */ - add_candidate (data, base, step, true, NULL, iv); + add_candidate (data, base, step, true, NULL, false, iv); /* Add iv cand of the original type only if it has nonlinear use. */ if (iv->nonlin_use) add_candidate (data, iv->base, iv->step, true, NULL); @@ -3724,7 +3740,7 @@ prepare_decl_rtl (tree *expr_p, int *ws, void *data) Some RTL specific checks seems unable to be checked in gimple, if any new checks or easy checks _are_ missing here, please add them. */ -static bool ATTRIBUTE_UNUSED +static bool generic_predict_doloop_p (struct ivopts_data *data) { struct loop *loop = data->current_loop; @@ -4177,6 +4193,36 @@ force_expr_to_var_cost (tree expr, bool speed) STRIP_NOPS (op0); op1 = NULL_TREE; break; + /* See add_iv_candidate_for_doloop, for doloop may_be_zero case, we + introduce COND_EXPR for IV base, need to support better cost estimation + for this COND_EXPR and tcc_comparison. */ + case COND_EXPR: + op0 = TREE_OPERAND (expr, 1); + STRIP_NOPS (op0); + op1 = TREE_OPERAND (expr, 2); + STRIP_NOPS (op1); + break; + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + case EQ_EXPR: + case NE_EXPR: + case UNORDERED_EXPR: + case ORDERED_EXPR: + case UNLT_EXPR: + case UNLE_EXPR: + case UNGT_EXPR: + case UNGE_EXPR: + case UNEQ_EXPR: + case LTGT_EXPR: + case MAX_EXPR: + case MIN_EXPR: + op0 = TREE_OPERAND (expr, 0); + STRIP_NOPS (op0); + op1 = TREE_OPERAND (expr, 1); + STRIP_NOPS (op1); + break; default: /* Just an arbitrary value, FIXME. */ @@ -4258,6 +4304,35 @@ force_expr_to_var_cost (tree expr, bool speed) case RSHIFT_EXPR: cost = comp_cost (add_cost (speed, mode), 0); break; + case COND_EXPR: + op0 = TREE_OPERAND (expr, 0); + STRIP_NOPS (op0); + if (op0 == NULL_TREE || TREE_CODE (op0) == SSA_NAME + || CONSTANT_CLASS_P (op0)) + cost = no_cost; + else + cost = force_expr_to_var_cost (op0, speed); + break; + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + case EQ_EXPR: + case NE_EXPR: + case UNORDERED_EXPR: + case ORDERED_EXPR: + case UNLT_EXPR: + case UNLE_EXPR: + case UNGT_EXPR: + case UNGE_EXPR: + case UNEQ_EXPR: + case LTGT_EXPR: + case MAX_EXPR: + case MIN_EXPR: + /* Simply use 1.5 * add cost for now, FIXME if there is some more accurate + cost evaluation way. */ + cost = comp_cost (1.5 * add_cost (speed, mode), 0); + break; default: gcc_unreachable (); @@ -4706,8 +4781,12 @@ determine_group_iv_cost_generic (struct ivopts_data *data, if (cand->pos == IP_ORIGINAL && cand->incremented_at == use->stmt) cost = no_cost; else - cost = get_computation_cost (data, use, cand, false, - &inv_vars, NULL, &inv_expr); + { + cost = get_computation_cost (data, use, cand, false, &inv_vars, NULL, + &inv_expr); + if (cand->doloop_p) + cost += targetm.doloop_cost_for_generic; + } if (inv_expr) { @@ -4735,6 +4814,9 @@ determine_group_iv_cost_address (struct ivopts_data *data, cost = get_computation_cost (data, use, cand, true, &inv_vars, &can_autoinc, &inv_expr); + if (cand->doloop_p) + cost += targetm.doloop_cost_for_address; + if (inv_expr) { inv_exprs = BITMAP_ALLOC (NULL); @@ -5142,6 +5224,15 @@ may_eliminate_iv (struct ivopts_data *data, } } + /* For doloop IV cand, the bound would be zero. It's safe whether + may_be_zero set or not. */ + if (cand->doloop_p) + { + *bound = build_int_cst (TREE_TYPE (cand->iv->base), 0); + *comp = iv_elimination_compare (data, use); + return true; + } + cand_value_at (loop, cand, use->stmt, desc->niter, &bnd); *bound = fold_convert (TREE_TYPE (cand->iv->base), @@ -5264,6 +5355,9 @@ determine_group_iv_cost_cond (struct ivopts_data *data, inv_vars = inv_vars_elim; inv_vars_elim = NULL; inv_expr = inv_expr_elim; + /* For doloop candidate/use pair, adjust to zero cost. */ + if (group->doloop_p && cand->doloop_p) + cost = no_cost; } else { @@ -5390,6 +5484,42 @@ relate_compare_use_with_all_cands (struct ivopts_data *data) } } +/* Add one doloop dedicated IV candidate: + - Base is (may_be_zero ? 1 : (niter + 1)). + - Step is -1. */ + +static void +add_iv_candidate_for_doloop (struct ivopts_data *data) +{ + tree_niter_desc *niter_desc = niter_for_single_dom_exit (data); + gcc_assert (niter_desc && niter_desc->assumptions); + + tree niter = niter_desc->niter; + tree ntype = TREE_TYPE (niter); + gcc_assert (TREE_CODE (ntype) == INTEGER_TYPE); + + tree may_be_zero = niter_desc->may_be_zero; + if (may_be_zero && integer_zerop (may_be_zero)) + may_be_zero = NULL_TREE; + if (may_be_zero) + { + if (COMPARISON_CLASS_P (may_be_zero)) + { + niter = fold_build3 (COND_EXPR, ntype, may_be_zero, + build_int_cst (ntype, 0), + rewrite_to_non_trapping_overflow (niter)); + } + /* Don't try to obtain the iteration count expression when may_be_zero is + integer_nonzerop (actually iteration count is one) or else. */ + else + return; + } + + tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter), + build_int_cst (ntype, 1)); + add_candidate (data, base, build_int_cst (ntype, -1), true, NULL, true); +} + /* Finds the candidates for the induction variables. */ static void @@ -5398,6 +5528,10 @@ find_iv_candidates (struct ivopts_data *data) /* Add commonly used ivs. */ add_standard_iv_candidates (data); + /* Add doloop dedicate ivs. */ + if (data->doloop_use_p) + add_iv_candidate_for_doloop (data); + /* Add old induction variables. */ add_iv_candidate_for_bivs (data); @@ -5578,16 +5712,21 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand) or a const set. */ if (cost_base.cost == 0) cost_base.cost = COSTS_N_INSNS (1); - cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base))); - + /* Doloop decrement should be considered as zero cost. */ + if (cand->doloop_p) + cost_step = 0; + else + cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base))); cost = cost_step + adjust_setup_cost (data, cost_base.cost); /* Prefer the original ivs unless we may gain something by replacing it. The reason is to make debugging simpler; so this is not relevant for artificial ivs created by other optimization passes. */ - if (cand->pos != IP_ORIGINAL - || !SSA_NAME_VAR (cand->var_before) - || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before))) + if ((cand->pos != IP_ORIGINAL + || !SSA_NAME_VAR (cand->var_before) + || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before))) + /* Prefer doloop as well. */ + && !cand->doloop_p) cost++; /* Prefer not to insert statements into latch unless there are some @@ -5832,7 +5971,8 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs, if (ivs->n_cand_uses[cid] == 0) { bitmap_clear_bit (ivs->cands, cid); - ivs->n_cands--; + if (!cp->cand->doloop_p || !targetm.have_count_reg_decr_p) + ivs->n_cands--; ivs->cand_cost -= cp->cand->cost; iv_ca_set_remove_invs (ivs, cp->cand->inv_vars, ivs->n_inv_var_uses); iv_ca_set_remove_invs (ivs, cp->cand->inv_exprs, ivs->n_inv_expr_uses); @@ -5889,7 +6029,8 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs, if (ivs->n_cand_uses[cid] == 1) { bitmap_set_bit (ivs->cands, cid); - ivs->n_cands++; + if (!cp->cand->doloop_p || !targetm.have_count_reg_decr_p) + ivs->n_cands++; ivs->cand_cost += cp->cand->cost; iv_ca_set_add_invs (ivs, cp->cand->inv_vars, ivs->n_inv_var_uses); iv_ca_set_add_invs (ivs, cp->cand->inv_exprs, ivs->n_inv_expr_uses); @@ -6134,6 +6275,8 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs) fprintf (file, " cost: %" PRId64 " (complexity %d)\n", cost.cost, cost.complexity); + fprintf (file, " reg_cost: %d\n", + ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands)); fprintf (file, " cand_cost: %" PRId64 "\n cand_group_cost: " "%" PRId64 " (complexity %d)\n", ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity); @@ -7568,6 +7711,75 @@ determine_scaling_factor (struct ivopts_data *data, basic_block *body) } } +/* Find doloop comparison use and set its doloop_p on if found. */ + +static bool +find_doloop_use (struct ivopts_data *data) +{ + struct loop *loop = data->current_loop; + + for (unsigned i = 0; i < data->vgroups.length (); i++) + { + struct iv_group *group = data->vgroups[i]; + if (group->type == USE_COMPARE) + { + gcc_assert (group->vuses.length () == 1); + struct iv_use *use = group->vuses[0]; + gimple *stmt = use->stmt; + if (gimple_code (stmt) == GIMPLE_COND) + { + basic_block bb = gimple_bb (stmt); + edge true_edge, false_edge; + extract_true_false_edges_from_block (bb, &true_edge, &false_edge); + /* This comparison is used for loop latch. Require latch is empty + for now. */ + if ((loop->latch == true_edge->dest + || loop->latch == false_edge->dest) + && empty_block_p (loop->latch)) + { + group->doloop_p = true; + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Doloop cmp iv use: "); + print_gimple_stmt (dump_file, stmt, TDF_DETAILS); + } + return true; + } + } + } + } + + return false; +} + +/* For the targets which support doloop, to predict whether later RTL doloop + transformation will perform on this loop, further detect the doloop use and + mark the flag doloop_use_p if predicted. */ + +void +predict_and_process_doloop (struct ivopts_data *data) +{ + if (!flag_branch_on_count_reg) + return; + + if (!generic_predict_doloop_p (data)) + return; + + if (find_doloop_use (data)) + { + data->doloop_use_p = true; + if (dump_file && (dump_flags & TDF_DETAILS)) + { + struct loop *loop = data->current_loop; + fprintf (dump_file, + "Predict loop %d can perform" + " doloop optimization later.\n", + loop->num); + flow_loop_dump (loop, dump_file, NULL, 1); + } + } +} + /* Optimizes the LOOP. Returns true if anything changed. */ static bool @@ -7580,6 +7792,7 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop, basic_block *body; gcc_assert (!data->niters); + data->doloop_use_p = false; data->current_loop = loop; data->loop_loc = find_loop_location (loop).get_location_t (); data->speed = optimize_loop_for_speed_p (loop); @@ -7622,6 +7835,9 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop, /* Determine cost scaling factor for basic blocks in loop. */ determine_scaling_factor (data, body); + /* Predict doloop and find the doloop use if predicted. */ + predict_and_process_doloop (data); + /* Finds candidates for the induction variables (item 2). */ find_iv_candidates (data);