On Tue, Jun 2, 2015 at 11:37 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Tue, May 26, 2015 at 5:04 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Sun, May 24, 2015 at 8:47 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Fri, May 22, 2015 at 7:45 PM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Wed, May 20, 2015 at 11:41 AM, Bin Cheng <bin.ch...@arm.com> wrote: >>>>> Hi, >>>>> As we know, GCC is too conservative when checking overflow behavior in >>>>> SCEV >>>>> and loop related optimizers. Result is some variable can't be recognized >>>>> as >>>>> scalar evolution and thus optimizations are missed. To be specific, >>>>> optimizers like ivopts and vectorizer are affected. >>>>> This issue is more severe on 64 bit platforms, for example, PR62173 is >>>>> failed on aarch64; scev-3.c and scev-4.c were marked as XFAIL on lp64 >>>>> platforms. >>>>> >>>>> As the first part to improve overflow checking in GCC, this patch does >>>>> below >>>>> improvements: >>>>> 1) Ideally, chrec_convert should be responsible to convert scev like >>>>> "(type){base, step}" to scev like "{(type)base, (type)step} when the >>>>> result >>>>> scev doesn't overflow; chrec_convert_aggressive should do the conversion >>>>> if >>>>> the result scev could overflow/wrap. Unfortunately, current >>>>> implementation >>>>> may use chrec_convert_aggressive to return a scev that won't overflow. >>>>> This >>>>> is because of a) the static parameter "fold_conversions" for >>>>> instantiate_scev_convert can only tracks whether chrec_convert_aggressive >>>>> may be called, rather than if it does some overflow conversion or not; b) >>>>> the implementation of instantiate_scev_convert sometimes shortcuts the >>>>> call >>>>> to chrec_convert and misses conversion opportunities. This patch improves >>>>> this. >>>>> 2) iv->no_overflow computed in simple_iv is too conservative. With 1) >>>>> fixed, iv->no_overflow should reflects whether chrec_convert_aggressive >>>>> does >>>>> return an overflow scev. This patch improves this. >>>>> 3) chrec_convert should be able to prove the resulting scev won't >>>>> overflow >>>>> with loop niter information. This patch doesn't finish this, but it >>>>> factored a new interface out of scev_probably_wraps_p for future >>>>> improvement. And that will be the part II patch. >>>>> >>>>> With the improvements in SCEV, this patch also improves optimizer(IVOPT) >>>>> that uses scev information like below: >>>>> For array reference in the form of arr[IV], GCC tries to derive new >>>>> address iv {arr+iv.base, iv.step*elem_size} from IV. If IV overflow wrto >>>>> a >>>>> type that is narrower than address space, this derivation is not true >>>>> because &arr[IV] isn't a scev. Root cause why scev-*.c are failed now is >>>>> the overflow information of IV is too conservative. IVOPT has to be >>>>> conservative to reject &arr[IV] as a scev. With more accurate overflow >>>>> information, IVOPT can be improved too. So this patch fixes the mentioned >>>>> long standing issues. >>>>> >>>>> Bootstrap and test on x86_64, x86 and aarch64. >>>>> BTW, test gcc.target/i386/pr49781-1.c failed on x86_64, but I can >>>>> confirmed >>>>> it's not this patch's fault. >>>>> >>>>> So what's your opinion on this?. >>>> >>>> I maybe mixing things up but does >>>> >>>> +chrec_convert_aggressive (tree type, tree chrec, bool *fold_conversions) >>>> { >>>> ... >>>> + if (evolution_function_is_affine_p (chrec)) >>>> + { >>>> + tree base, step; >>>> + struct loop *loop; >>>> + >>>> + loop = get_chrec_loop (chrec); >>>> + base = CHREC_LEFT (chrec); >>>> + step = CHREC_RIGHT (chrec); >>>> + if (convert_affine_scev (loop, type, &base, &step, NULL, true)) >>>> + return build_polynomial_chrec (loop->num, base, step); >>>> >>>> ^^^ not forget to set *fold_conversions to true? Or we need to use >>>> convert_affine_scev (..., false)? >>> >>> Nice catch. It's supposed to be called only if source scev has no >>> overflow behavior introduced by previous call to >>> chrec_convert_aggressive. In other words, it should be guarded by >>> "!*fold_conversions" like below: >>> >>> + >>> + if (!*fold_conversions && evolution_function_is_affine_p (chrec)) >>> + { >>> + tree base, step; >>> + struct loop *loop; >>> + >>> + loop = get_chrec_loop (chrec); >>> + base = CHREC_LEFT (chrec); >>> + step = CHREC_RIGHT (chrec); >>> + if (convert_affine_scev (loop, type, &base, &step, NULL, true)) >>> + return build_polynomial_chrec (loop->num, base, step); >>> + } >>> >>> The scenario is rare that didn't exposed in either bootstrap or reg-test. >>> >>> Here is the updated patch without any other difference. Bootstrap and >>> test on x86_64 & AArch64. >> >> Ok. >> >> Thanks, >> Richard. >> >>> Thanks, >>> bin >>>> >>>> + } >>>> >>>> (bah, and the diff somehow messes up -p context :/ which is why I like >>>> context diffs more) >>>> >>>> Other from the above the patch looks good to me. >>>> >>>> Thanks, >>>> Richard. >>>> >>>>> Thanks, >>>>> bin >>>>> >>>>> 2015-05-20 Bin Cheng <bin.ch...@arm.com> >>>>> >>>>> PR tree-optimization/62173 >>>>> * tree-ssa-loop-ivopts.c (struct iv): New field. Reorder fields. >>>>> (alloc_iv, set_iv): New parameter. >>>>> (determine_biv_step): Delete. >>>>> (find_bivs): Inline original determine_biv_step. Pass new >>>>> argument to set_iv. >>>>> (idx_find_step): Use no_overflow information for conversion. >>>>> * tree-scalar-evolution.c (analyze_scalar_evolution_in_loop): Let >>>>> resolve_mixers handle folded_casts. >>>>> (instantiate_scev_name): Change bool parameter to bool pointer. >>>>> (instantiate_scev_poly, instantiate_scev_binary): Ditto. >>>>> (instantiate_array_ref, instantiate_scev_not): Ditto. >>>>> (instantiate_scev_3, instantiate_scev_2): Ditto. >>>>> (instantiate_scev_1, instantiate_scev_r): Ditto. >>>>> (instantiate_scev_convert, ): Change parameter. Pass argument >>>>> to chrec_convert_aggressive. >>>>> (instantiate_scev): Change argument. >>>>> (resolve_mixers): New parameter and set it. >>>>> (scev_const_prop): New argument. >>>>> * tree-scalar-evolution.h (resolve_mixers): New parameter. >>>>> * tree-chrec.c (convert_affine_scev): Call chrec_convert instead >>>>> of chrec_conert_1. >>>>> (chrec_convert): New parameter. Move definition below. >>>>> (chrec_convert_aggressive): New parameter and set it. Call >>>>> convert_affine_scev. >>>>> * tree-chrec.h (chrec_convert): New parameter. >>>>> (chrec_convert_aggressive): Ditto. >>>>> * tree-ssa-loop-niter.c (loop_exits_before_overflow): New >>>>> function. >>>>> (scev_probably_wraps_p): Factor loop niter related code into >>>>> loop_exits_before_overflow. >>>>> >>>>> gcc/testsuite/ChangeLog >>>>> 2015-05-20 Bin Cheng <bin.ch...@arm.com> >>>>> >>>>> PR tree-optimization/62173 >>>>> * gcc.dg/tree-ssa/scev-3.c: Remove xfail. >>>>> * gcc.dg/tree-ssa/scev-4.c: Ditto. >>>>> * gcc.dg/tree-ssa/scev-8.c: New. > > Attached patch applied as revision 224009. I removed changes in > tree-ssa-loop-niter.c (also the new test) in this patch because it's > just code refactoring and will be covered by the second patch.
As noticed, gcc.target/i386/pr49781-1.c failed because of this patch. Seems IVO made sub-optimal choice with even more IVs recognized. Filed PR66388 for tracking and will investigate the failure. Thanks, bin