Re: [PATCH v3 3/3] PR80791 Consider doloop cmp use in ivopts

Bin.Cheng Sat, 20 Jul 2019 20:08:50 -0700

On Wed, Jun 19, 2019 at 7:47 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> Hi all,
>
> This is the follow-up patch to
> https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00910.html
>
> Main steps:
>   1) Identify the doloop cmp type iv use and record its bind_cand (explained
>      later).
>   2) Set zero cost for pairs between this use and any iv cand.
>   3) IV cand set selecting algorithm runs as usual.
>   4) Fix up the selected iv cand for doloop use if needed.
>
> It only targets architectures such as Power that have a specific count
> register.  A target hook, have_count_reg_decr_p, is proposed for this.
>
> Some notes:
>
> *) Why do we need zero cost?  Why not just decrease the cost for the pair
>    between doloop use and its original iv cand?  Or just decrease the cost
>    for the pair between doloop use and one selected iv cand?
>
>    Since some targets support a hardware count register for decrement and
>    branch, they don't need the general instruction sequence for decr, cmp and
>    branch in general registers.  The cost of moving the count register to a
>    GPR is generally high, so the count register is standalone and can't be
>    shared with other iv uses.  It means IVOPTs can treat the doloop use as
>    invisible (zero cost).
>
>    Let's take a look at PR80791 for example.
>
>                             original biv (cand 4)  use derived iv (cand 6)
>      generic use:                   4                  0
>      comp use (doloop use):         0                 infinite
>
>     For iv cost, the original biv has cost 4 while the use derived iv has
>     cost 5.  When IVOPTs considers the doloop use, the optimal cost is 8
>     (original biv iv cost 4 + use cost 4).  Unfortunately it's not actually
>     optimal: since the later doloop transformation updates loop closing by
>     the count register, the original biv (and its update) won't be needed in
>     loop closing any more.  The generic use becomes the only use for the
>     original biv.  That means, if we know the doloop transformation will be
>     performed later, we shouldn't consider the doloop use when determining
>     the IV set.  If we don't consider it, the algorithm will choose iv cand 6
>     with total cost 5 (iv cost 5 + use cost 0).
>
>     From the above, we can see that decreasing the cost for the pair between
>     doloop use and original biv doesn't work.  Meanwhile it's hard to predict
>     one good iv cand in the final optimal set here and pre-update the cost
>     between it and the doloop use.  The analysis would be heavy and imperfect.
>
> *) Why do we need bind_cand?
>
>     As above, we assign zero cost for pairs between doloop use and each iv
>     cand.  It's possible that the doloop use gets assigned an iv cand which
>     is invalid to be used during later rewrite.  Then we have to fix it up
>     with the iv cand originally used for it.  It's fine whether or not this
>     iv cand exists in the final iv cand set: even if it's not in the set, it
>     will be eliminated in doloop transformation.
>
> By the way, I was wondering whether we can replace the hook
> have_count_reg_decr_p with flag_branch_on_count_reg.  The description of the
> "no-" option says "Disable the optimization pass that scans for opportunities
> to use 'decrement and branch' instructions on a count register instead of
> instruction sequences that decrement a register, compare it against zero, and
> then branch based upon the result.", which implicitly says the target has
> count register support.  But I noticed that the gate of doloop_optimize checks
> this flag and, from what I gathered in the previous discussions, some targets
> which can perform doloop_optimize don't have a specific count register, so it
> sounds like we can't make use of the flag.  Is that correct?
>
> Bootstrapped and regression tested on powerpcle.  There is one failure,
> gcc.target/powerpc/20050830-1.c, which is exposed by this patch; its root
> cause is a duplicate of PR62147.
>
> Is it OK for trunk?
Sorry for the delay.


I am not much in favor of this approach.  When rewriting the pass
last time, we tried to reuse as much code as possible between cost
computation and iv_use rewriting.  We also followed the guideline that
when a finite cost is computed for a cand/use pair, the use can be
rewritten using the cand successfully.  However, the patch adjusts
infinite cost to zero cost, so a selected cand may be unable to rewrite
its iv_use; this is a backward step IMHO.
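
To make the concern concrete, here is a toy C model of the PR80791
numbers quoted above.  It is a sketch only: the toy_* names, the
two-candidate table, and the single-candidate search are assumptions made
for illustration, not ivopts code, and the real pass searches over sets of
candidates with a richer cost model.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not ivopts code.  */
#define TOY_INFINITE INT_MAX

enum { TOY_GENERIC_USE, TOY_DOLOOP_USE, TOY_N_USES };
enum { TOY_CAND_BIV, TOY_CAND_DERIVED, TOY_N_CANDS };  /* cand 4, cand 6 */

/* IV costs quoted above: original biv 4, use derived iv 5.  */
static const int toy_iv_cost[TOY_N_CANDS] = { 4, 5 };

/* Use/cand pair costs from the quoted table, indexed [use][cand].  */
static const int toy_use_cost[TOY_N_USES][TOY_N_CANDS] = {
  { 4, 0 },             /* generic use */
  { 0, TOY_INFINITE },  /* comp use (doloop use) */
};

/* Total cost of expressing every use with the single candidate CAND.
   If ZERO_DOLOOP is true, every doloop pair is treated as free, even
   one whose real cost is infinite.  */
static int
toy_total_cost (int cand, bool zero_doloop)
{
  assert (cand >= 0 && cand < TOY_N_CANDS);
  int total = toy_iv_cost[cand];
  for (int use = 0; use < TOY_N_USES; use++)
    {
      int c = toy_use_cost[use][cand];
      if (use == TOY_DOLOOP_USE && zero_doloop)
        c = 0;
      if (c == TOY_INFINITE)
        return TOY_INFINITE;
      total += c;
    }
  return total;
}

/* Cheapest single candidate under the chosen costing.  */
static int
toy_best_cand (bool zero_doloop)
{
  int best = 0;
  for (int cand = 1; cand < TOY_N_CANDS; cand++)
    if (toy_total_cost (cand, zero_doloop)
        < toy_total_cost (best, zero_doloop))
      best = cand;
  return best;
}

/* The guideline: a use can be rewritten with CAND only if the real
   (unadjusted) pair cost is finite.  */
static bool
toy_can_rewrite (int use, int cand)
{
  return toy_use_cost[use][cand] != TOY_INFINITE;
}
```

With the real costs, toy_best_cand (false) picks the original biv at a
total of 8.  With every doloop pair forced to zero, toy_best_cand (true)
picks the use derived iv at a total of 5, yet
toy_can_rewrite (TOY_DOLOOP_USE, TOY_CAND_DERIVED) is false, which is
exactly the case where the selected cand cannot rewrite the iv_use.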

I am not sure if this is only useful for doloop cases, or for general cases?

The comment mentions that the point is to give more chances to consider
other IV cands instead of the BIV.  If the current algorithm relies on
zeroing the cost of an impossible cand/use pair to select the optimal
result, I suspect it's a bug which should be fixed in the candidate
selection algorithm.  Do you have a test case showing the issue?  We
should fix it as a standalone problem, whereas this approach covers up
the problem and is not that sound.

However, I think the patch can be changed so that only finite costs are
adjusted to zero, thus guaranteeing that any cand selected is valid to
rewrite the iv_use.
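
A minimal sketch of that adjustment in C, assuming a plain int cost with
INT_MAX standing in for "infinite" (the sketch_* names and this
representation are hypothetical, not the types used in
tree-ssa-loop-ivopts.c):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for an infinite use/cand cost.  */
#define SKETCH_INFINITE INT_MAX

/* True if COST describes a pair that can actually be rewritten.  */
static bool
sketch_cost_finite_p (int cost)
{
  return cost != SKETCH_INFINITE;
}

/* Adjust the cost of a doloop use/cand pair: a finite cost becomes
   zero, an infinite cost stays infinite, so an impossible pair can
   never be selected.  */
static int
sketch_adjust_doloop_cost (int cost)
{
  assert (cost >= 0);
  return sketch_cost_finite_p (cost) ? 0 : SKETCH_INFINITE;
}
```

Because an infinite cost is never turned into a finite one, whatever
cand the selection ends up with for the doloop use still had a finite
real cost.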

Thanks,
bin
>
> --------------
>
> gcc/ChangeLog
>
> 2019-06-19  Kewen Lin  <li...@gcc.gnu.org>
>
>         PR middle-end/80791
>         * target.def (have_count_reg_decr_p): New hook.
>         * doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): New hook.
>         * doc/tm.texi: Regenerate.
>         * config/rs6000/rs6000.c (rs6000_have_count_reg_decr_p): New function.
>         (TARGET_HAVE_COUNT_REG_DECR_P): New macro.
>         * tree-ssa-loop-ivopts.c (adjust_group_iv_cost_for_doloop): New
>         function.
>         (fixup_doloop_groups): Likewise.
>         (find_doloop_use_and_its_bind): Likewise.
>         (record_group): Init bind_cand.
>         (determine_group_iv_cost): Call adjust_group_iv_cost_for_doloop.
>         (find_optimal_iv_set): Call fixup_doloop_groups.
>         (tree_ssa_iv_optimize_loop): Call function have_count_reg_decr_p,
>         generic_predict_doloop_p and find_doloop_use_and_its_bind.
>         (generic_predict_doloop_p): Update attribute.
>
> gcc/testsuite/ChangeLog
>
> 2019-06-19  Kewen Lin  <li...@gcc.gnu.org>
>
>         PR middle-end/80791
>         * gcc.dg/tree-ssa/ivopts-lt.c: Adjust.
>
>
