On Tue, Sep 8, 2015 at 6:06 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Wed, Sep 2, 2015 at 10:12 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.ch...@arm.com> wrote: >>> Hi, >>> This patch is a new approach to fix PR66388. IVO today computes iv_use with >>> iv_cand which has at least same type precision as the use. On 64bit >>> platforms like AArch64, this results in different iv_cand created for each >>> address type iv_use, and register pressure increased. As a matter of fact, >>> the BIV should be used for all iv_uses in some of these cases. It is a >>> latent bug but recently getting worse because of overflow changes. >>> >>> The original approach at >>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the issue >>> except it conflict with IV elimination. Seems to me it is impossible to >>> mitigate the contradiction. >>> >>> This new approach fixes the issue by adding sizetype iv_cand for BIVs >>> directly. In cases if the original BIV is preferred, the sizetype iv_cand >>> will be chosen. As for code generation, the sizetype iv_cand has the same >>> effect as the original BIV. Actually, it's better because BIV needs to be >>> explicitly extended to sizetype to be used in address expression on most >>> targets. >>> >>> One shortage of this approach is it may introduce more iv candidates. To >>> minimize the impact, this patch does sophisticated code analysis and adds >>> sizetype candidate for BIV only if it is used as index. Moreover, it avoids >>> to add candidate of the original type if the BIV is only used as index. >>> Statistics for compiling spec2k6 shows increase of candidate number is >>> modest and can be ignored. >>> >>> There are two more patches following to fix corner cases revealed by this >>> one. In together they bring obvious perf improvement for spec26k/int on >>> aarch64. >>> Spec2k6/int >>> 400.perlbench 3.44% >>> 445.gobmk -0.86% >>> 456.hmmer 14.83% >>> 458.sjeng 2.49% >>> 462.libquantum -0.79% >>> GEOMEAN 1.68% >>> >>> There is also about 0.36% improvement for spec2k6/fp, mostly because of case >>> 436.cactusADM. I believe it can be further improved, but that should be >>> another patch. >>> >>> I also collected benchmark data for x86_64. Spec2k6/fp is not affected. As >>> for spec2k6/int, though the geomean is improved slightly, 400.perlbench is >>> regressed by ~3%. I can see BIVs are chosen for some loops instead of >>> address candidates. Generally, the loop header will be simplified because >>> iv elimination with BIV is simpler; the number of instructions in loop body >>> isn't changed. I suspect the regression comes from different addressing >>> modes. With BIV, complex addressing mode like [base + index << scale + >>> disp] is used, rather than [base + disp]. I guess the former has more >>> micro-ops, thus more expensive. This guess can be confirmed by manually >>> suppressing the complex addressing mode with higher address cost. >>> Now the problem becomes why overall cost of BIV is computed lower while the >>> actual cost is higher. I noticed for most affected loops, loop header is >>> bloated because of iv elimination using the old address candidate. The >>> bloated loop header results in much higher cost than BIV. As a result, BIV >>> is preferred. I also noticed the bloated loop header generally can be >>> simplified (I have a following patch for this). After applying the local >>> patch, the old address candidate is chosen, and most of regression is >>> recovered. >>> Conclusion is I think loop header bloated issue should be blamed for the >>> regression, and it can be resolved. >>> >>> Bootstrap and test on x64_64 and aarch64. It fixes failure of >>> gcc.target/i386/pr49781-1.c, without new breakage. >>> >>> So what do you think? >> >> The data above looks ok to me. >> >> +static struct iv * >> +find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv) >> +{ >> + aff_tree aff; >> + struct expand_data exp_data; >> + >> + if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME) >> + return iv; >> + >> + /* Expand IV's ssa_name till the deriving biv is found. */ >> + exp_data.data = data; >> + exp_data.biv = NULL; >> + tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name), >> + &aff, &data->name_expansion_cache, >> + stop_expand, &exp_data); >> + return exp_data.biv; >> >> that's actually "abusing" tree_to_aff_combination_expand for simply walking >> SSA uses and their defs uses recursively until you hit "stop". ISTR past >> discussion to add a generic walk_ssa_use interface for that. Not sure if it >> materialized with a name I can't remember or whether it didn't. > Thanks for reviewing. I didn't found existing interface to walk up > definition chains of ssa vars. In this updated patch, I implemented a > simple function which meets the minimal requirement of walking up > definition chains of BIV variables. I also counted number of > no_overflow BIVs that are not used in address type use. Since > generally there are only two BIVs in a loop, this can prevent us from > visiting definition chains most of time. Statistics shows that number > of call to find_deriving_biv_for_expr plummet. > >> >> - add_candidate (data, iv->base, iv->step, true, NULL); >> + /* Check if this biv is used in address type use. */ >> + if (iv->no_overflow && iv->have_address_use >> + && INTEGRAL_TYPE_P (TREE_TYPE (iv->base)) >> + && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype)) >> + { >> + tree type = unsigned_type_for (sizetype); >> >> sizetype is unsigned. > Fixed. > > Bootstrap and test on x86_64, is this OK? > > Thanks, > bin
And here is the updated Changelog entry. 2015-09-08 Bin Cheng <bin.ch...@arm.com> PR tree-optimization/66388 * tree-ssa-loop-ivopts.c (struct iv, iv_cand, ivopts_data): New fields. (dump_iv): Dump no_overflow information. (alloc_iv): Initialize new field for struct iv. (mark_bivs): Count number of no_overflow bivs. (find_deriving_biv_for_expr, record_biv_for_address_use): New functions. (idx_find_step): Call new functions above. (add_candidate_1, add_candidate): New paramter. (add_iv_candidate_for_biv): Add sizetype cand for BIV. (get_computation_aff): Simplify convertion of cand for BIV. (get_computation_cost_at): Step cand's base if necessary.