On Fri, Jul 17, 2015 at 8:27 AM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> This patch is to fix PR66388.  It's an old issue but recently became worse
> after my scev overflow change.  IVOPT now can only compute iv use with
> candidate which has at least same type precision.  See below code:
>
>   if (TYPE_PRECISION (utype) > TYPE_PRECISION (ctype))
>     {
>       /* We do not have a precision to express the values of use.  */
>       return infinite_cost;
>     }
>
> This is not always true.  It's possible to compute with a candidate of
> smaller precision if it has enough stepping periods to express the iv use.
> Just as code in iv_elimination.  Well, since now we have iv no_overflow
> information, we can use that to prove it's safe.  Actually I am thinking
> about improving iv elimination with overflow information too.  So this patch
> relaxes the constraint to allow computation of uses with smaller precision
> candidates.
>
> Benchmark data shows several cases in spec2k6 are obviously improved on
> aarch64:
> 400.perlbench                2.32%
> 445.gobmk                    0.86%
> 456.hmmer                    11.72%
> 464.h264ref                  1.93%
> 473.astar                    0.75%
> 433.milc                     -1.49%
> 436.cactusADM                6.61%
> 444.namd                     -0.76%
>
> I looked into assembly code of 456.hmmer&436.cactusADM, and can confirm hot
> loops are reduced.  Also perf data could confirm the improvement in
> 456.hmmer.
> I looked into 433.milc and found most hot functions are not affected by this
> patch.  But I do observe two kinds of regressions described as below:
> A)  For some loops, auto-increment addressing mode is generated before this
> patch, but "base + index<<scale" is generated after. I don't worry about
> this too much because auto-increment support in IVO hasn't been enabled on
> AArch64 yet. On the contrary, we should worry that auto-increment support is
> too aggressive in IVO, resulting in auto-increment addressing mode generated
> where it shouldn't. I suspect the regression we monitored before is caused
> by such kind of reason.
> B) This patch enables computation of 64 bits address iv use with 32 bits biv
> candidate.  So there will be a sign extension before the candidate can be
> used in memory reference as an index. I already increased the cost by 2 for
> such biv candidates but there still be some peculiar cases... Decreasing
> cost in determine_iv_cost for biv candidates makes this worse.  It does that
> to make debugging simpler, nothing to do with performance.
>
> Bootstrap and test on x86_64.  It fixes failure of pr49781-1.c.
> Unfortunately, it introduces new failure of
> g++.dg/debug/dwarf2/deallocator.C.  I looked into the test and found with
> this patch, the loop is transformed into a shape that can be later
> eliminated(because it can be proved never loop back?).  We can further
> discuss if it's this patch's problem or the case should be tuned.
> Also bootstrap and test on aarch64.
>
> So what's your opinion?

Looks sensible, but the deallocator.C fail looks odd.  I presume that
i + j is simplified in a way that either the first or the second iteration
must exit the loop via the return and thus the scan for deallocator.C:34
fails?  How does this happen - I can only see this happen if we unroll
the loop and then run into VRP.  So does IVOPTs now affect non-loop
code as well?  Ah, at the moment we use an IV that runs backward.

Still curious if this isn't a wrong-code issue...

Richard.

> Thanks,
> bin
>
>
> 2015-07-16  Bin Cheng  <bin.ch...@arm.com>
>
>         PR tree-optimization/66388
>         * tree-ssa-loop-ivopts.c (dump_iv): Dump no_overflow info.
>         (add_candidate_1): New parameter.  Use unsigned type when iv
>         overflows.  Pass no_overflow to alloc_iv.
>         (add_autoinc_candidates, add_candidate): New parameter.
>         Pass no_overflow to add_candidate_1.
>         (add_candidate): Ditto.
>         (add_iv_candidate_for_biv, add_iv_candidate_for_use): Pass iv's
>         no_overflow info to add_candidate and add_candidate_1.
>         (get_computation_aff, get_computation_cost_at): Handle candidate
>         with smaller precision than iv use.
>
> gcc/testsuite/ChangeLog
> 2015-07-16  Bin Cheng  <bin.ch...@arm.com>
>
>         PR tree-optimization/66388
>         * gcc.dg/tree-ssa/pr66388.c: New test.
>

Reply via email to