On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin <li...@linux.ibm.com> wrote: > > Hi! > > Comparing to the previous versions of implementation mainly based on the > existing IV cands but zeroing the related group/use cost, this new one is > based > on Richard and Segher's suggestion introducing one doloop dedicated IV cand. > > Some key points are listed below: > 1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV > cand. > 2) Special name "doloop" assigned. > 3) Doloop IV cand with form (niter+1, +, -1) > 4) For doloop IV cand, no extra one cost like BIV, assign zero cost for > step. > 5) Support may_be_zero (regressed PR is in this case), the base of doloop IV > can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv. > 6) Add more expr support in force_expr_to_var_cost for reasonable cost > calculation on the IV base with may_be_zero (like COND_EXPR). > 7) Set zero cost when using doloop IV cand for doloop use. > 8) Add three hooks (should we merge _generic and _address?). > *) have_count_reg_decr_p, is to indicate the target has special hardware > count register, we shouldn't consider the impact of doloop IV when > calculating register pressures. > *) doloop_cost_for_generic, is the extra cost when using doloop IV cand > for > generic type IV use. > *) doloop_cost_for_address, is the extra cost when using doloop IV cand > for > address type IV use. What will happen if doloop IV cand be used for generic/address type iv use? Can RTL doloop can still perform doloop optimization in this case?
> > Bootstrapped on powerpc64le-linux-gnu and regression testing passed excepting > for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is > tracked > by PR89983. > > Any comments and suggestions are highly appreciated. Thanks! Not sure if I understand the patch correctly, some comments embedded. + /* The number of doloop candidate in the set. */ + unsigned n_doloop_cands; + This is unnecessary. See below comments. - add_candidate_1 (data, base, step, important, - IP_NORMAL, use, NULL, orig_iv); + add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, doloop, + orig_iv); if (ip_end_pos (data->current_loop) && allow_ip_end_pos_p (data->current_loop)) - add_candidate_1 (data, base, step, important, IP_END, use, NULL, orig_iv); + add_candidate_1 (data, base, step, important, IP_END, use, NULL, doloop, + orig_iv); Do we need to skip ip_end_pos case for doloop candidate? Because the candidate increment will be inserted in latch, i.e, increment position is after exit condition. - tree_to_aff_combination (iv->base, type, val); + tree base = iv->base; + /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to extract + the value under !may_be_zero to get the compact bound which also well fits + for may_be_zero since we ensure the value for it is const one. */ + if (cand->doloop_p && desc->may_be_zero && !integer_zerop (desc->may_be_zero)) + base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), + unshare_expr (rewrite_to_non_trapping_overflow (niter)), + build_int_cst (TREE_TYPE (niter), 1)); + tree_to_aff_combination (base, type, val); I don't quite follow here. The iv->base is computed from niter, I suppose compact bound is for cheaper candidate initialization? Why it's possible to extract !may_be_zero niter for may_be_zero here? The niter under !may_be_zero has no indication about the real niter under may_be_zero. - cand_value_at (loop, cand, use->stmt, desc->niter, &bnd); + cand_value_at (loop, cand, use->stmt, desc, &bnd); If I understand correctly, doloop use/cand will only be identified/added for single exit loop, and there will be only one cond(doloop) iv_use and only one doloop cand for doloop loop. So the cand_value at niter at use position would be 0. If that's the case, we can skip calling cand_value_at here for doloop cand. The change to cand_value_at would be unnecessary neither. - expensive. */ - if (!integer_zerop (desc->may_be_zero)) + expensive. + + For doloop candidate, we have considered MAY_BE_ZERO for IV base, need to + support MAY_BE_ZERO ? 0 : NITER, so simply bypass this check. */ + if (!integer_zerop (desc->may_be_zero) && !cand->doloop_p) return iv_elimination_compare_lt (data, cand, comp, desc); And we can early return before this? + if (may_be_zero) + { + if (COMPARISON_CLASS_P (may_be_zero)) + { + niter = fold_build3 (COND_EXPR, ntype, may_be_zero, + build_int_cst (ntype, 0), + rewrite_to_non_trapping_overflow (niter)); + } + /* Don't try to obtain the iteration count expression when may_be_zero is + integer_nonzerop (actually iteration count is one) or else. */ + else + return; + } + + tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter), + build_int_cst (ntype, 1)); niter is the number of latch executions, so niter + 1 could wrap here, but guess it's not a problem the similar issue is not handled in vectorizer neither. + unsigned n_old = data->regs_used, n_spr_for_doloop = 0; + /* If target supports count register for doloop, it doesn't take GPR. */ + if (targetm.have_count_reg_decr_p) + n_spr_for_doloop = n_doloop_cands; + unsigned n_new = n_invs + n_cands - n_spr_for_doloop; Not necessary. See below. - cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands); + cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands, + ivs->n_doloop_cands); Also. ivs->n_cands--; + if (cp->cand->doloop_p) + ivs->n_doloop_cands--; ivs->n_cands++; + if (cp->cand->doloop_p) + ivs->n_doloop_cands++; You can just book n_cands under condition !cp->cand->doloop_p. + if (flag_branch_on_count_reg && generic_predict_doloop_p (data)) + { + if (find_doloop_use (data)) + { + data->doloop_use_p = true; + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, + "Predict loop %d can perform" + " doloop optimization later.\n", + loop->num); + flow_loop_dump (loop, dump_file, NULL, 1); + } + } + } + Please factor this into a function to keep caller short. Thanks, bin > > Kewen > > --------- > > gcc/ChangeLog > > 2019-08-14 Kewen Lin <li...@gcc.gnu.org> > > PR middle-end/80791 > * config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro. > (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. > (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. > * target.def (have_count_reg_decr_p): New hook. > (doloop_cost_for_generic): Likewise. > (doloop_cost_for_address): Likewise. > * doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise. > (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. > (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. > * doc/tm.texi: Regenerate. > * tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite > cost > addend. > (record_group): Init doloop_p. > (add_candidate_1): Add optional argument doloop, change the handlings > accordingly. > (add_candidate): Likewise. > (add_iv_candidate_for_biv): Update the call to add_candidate. > (generic_predict_doloop_p): Update attribute. > (force_expr_to_var_cost): Add costing for expressions > COND_EXPR/LT_EXPR/ > LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EXPR/ > UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_EXPR/ > MIN_EXPR. > (determine_group_iv_cost_generic): Update for doloop IV cand. > (determine_group_iv_cost_address): Likewise. > (determine_group_iv_cost_cond): Likewise. > (determine_iv_cost): Likewise. > (ivopts_estimate_reg_pressure): Likewise. > (cand_value_at): Update argument niter type to struct > tree_niter_desc*, > consider doloop IV cand and may_be_zero. > (may_eliminate_iv): Update the call to cand_value_at, consider doloop > IV cand and may_be_zero. > (add_iv_candidate_for_doloop): New function. > (find_iv_candidates): Call function add_iv_candidate_for_doloop. > (determine_set_costs): Update the call to > ivopts_estimate_reg_pressure. > (iv_ca_recount_cost): Likewise. > (iv_ca_new): Init n_doloop_cands. > (iv_ca_set_no_cp): Update n_doloop_cands. > (iv_ca_set_cp): Likewise. > (iv_ca_dump): Dump register cost. > (find_doloop_use): Likewise. > (tree_ssa_iv_optimize_loop): Call function generic_predict_doloop_p > and > find_doloop_use. > > gcc/testsuite/ChangeLog > > 2019-08-14 Kewen Lin <li...@gcc.gnu.org> > > PR middle-end/80791 > * gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change. > * gcc.dg/tree-ssa/ivopts-lt.c: Likewise. > * gcc.dg/tree-ssa/pr32044.c: Likewise. >