On Fri, Jul 17, 2015 at 8:27 AM, Bin Cheng <bin.ch...@arm.com> wrote: > Hi, > This patch is to fix PR66388. It's an old issue but recently became worse > after my scev overflow change. IVOPT now can only compute iv use with > candidate which has at least same type precision. See below code: > > if (TYPE_PRECISION (utype) > TYPE_PRECISION (ctype)) > { > /* We do not have a precision to express the values of use. */ > return infinite_cost; > } > > This is not always true. It's possible to compute with a candidate of > smaller precision if it has enough stepping periods to express the iv use. > Just as code in iv_elimination. Well, since now we have iv no_overflow > information, we can use that to prove it's safe. Actually I am thinking > about improving iv elimination with overflow information too. So this patch > relaxes the constraint to allow computation of uses with smaller precision > candidates. > > Benchmark data shows several cases in spec2k6 are obviously improved on > aarch64: > 400.perlbench 2.32% > 445.gobmk 0.86% > 456.hmmer 11.72% > 464.h264ref 1.93% > 473.astar 0.75% > 433.milc -1.49% > 436.cactusADM 6.61% > 444.namd -0.76% > > I looked into assembly code of 456.hmmer&436.cactusADM, and can confirm hot > loops are reduced. Also perf data could confirm the improvement in > 456.hmmer. > I looked into 433.milc and found most hot functions are not affected by this > patch. But I do observe two kinds of regressions described as below: > A) For some loops, auto-increment addressing mode is generated before this > patch, but "base + index<<scale" is generated after. I don't worry about > this too much because auto-increment support in IVO hasn't been enabled on > AArch64 yet. On the contrary, we should worry that auto-increment support is > too aggressive in IVO, resulting in auto-increment addressing mode generated > where it shouldn't. I suspect the regression we monitored before is caused > by such kind of reason. > B) This patch enables computation of 64 bits address iv use with 32 bits biv > candidate. So there will be a sign extension before the candidate can be > used in memory reference as an index. I already increased the cost by 2 for > such biv candidates but there still be some peculiar cases... Decreasing > cost in determine_iv_cost for biv candidates makes this worse. It does that > to make debugging simpler, nothing to do with performance. > > Bootstrap and test on x86_64. It fixes failure of pr49781-1.c. > Unfortunately, it introduces new failure of > g++.dg/debug/dwarf2/deallocator.C. I looked into the test and found with > this patch, the loop is transformed into a shape that can be later > eliminated(because it can be proved never loop back?). We can further > discuss if it's this patch's problem or the case should be tuned. > Also bootstrap and test on aarch64. > > So what's your opinion?
Looks sensible, but the deallocator.C fail looks odd. I presume that i + j is simplified in a way that either the first or the second iteration must exit the loop via the return and thus the scan for deallocator.C:34 fails? How does this happen - I can only see this happen if we unroll the loop and then run into VRP. So does IVOPTs now affect non-loop code as well? Ah, at the moment we use an IV that runs backward. Still curious if this isn't a wrong-code issue... Richard. > Thanks, > bin > > > 2015-07-16 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/66388 > * tree-ssa-loop-ivopts.c (dump_iv): Dump no_overflow info. > (add_candidate_1): New parameter. Use unsigned type when iv > overflows. Pass no_overflow to alloc_iv. > (add_autoinc_candidates, add_candidate): New parameter. > Pass no_overflow to add_candidate_1. > (add_candidate): Ditto. > (add_iv_candidate_for_biv, add_iv_candidate_for_use): Pass iv's > no_overflow info to add_candidate and add_candidate_1. > (get_computation_aff, get_computation_cost_at): Handle candidate > with smaller precision than iv use. > > gcc/testsuite/ChangeLog > 2015-07-16 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/66388 > * gcc.dg/tree-ssa/pr66388.c: New test. >