On Wed, Sep 2, 2015 at 10:12 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.ch...@arm.com> wrote: >> Hi, >> This patch is a new approach to fix PR66388. IVO today computes iv_use with >> iv_cand which has at least same type precision as the use. On 64bit >> platforms like AArch64, this results in different iv_cand created for each >> address type iv_use, and register pressure increased. As a matter of fact, >> the BIV should be used for all iv_uses in some of these cases. It is a >> latent bug but recently getting worse because of overflow changes. >> >> The original approach at >> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the issue >> except it conflict with IV elimination. Seems to me it is impossible to >> mitigate the contradiction. >> >> This new approach fixes the issue by adding sizetype iv_cand for BIVs >> directly. In cases if the original BIV is preferred, the sizetype iv_cand >> will be chosen. As for code generation, the sizetype iv_cand has the same >> effect as the original BIV. Actually, it's better because BIV needs to be >> explicitly extended to sizetype to be used in address expression on most >> targets. >> >> One shortage of this approach is it may introduce more iv candidates. To >> minimize the impact, this patch does sophisticated code analysis and adds >> sizetype candidate for BIV only if it is used as index. Moreover, it avoids >> to add candidate of the original type if the BIV is only used as index. >> Statistics for compiling spec2k6 shows increase of candidate number is >> modest and can be ignored. >> >> There are two more patches following to fix corner cases revealed by this >> one. In together they bring obvious perf improvement for spec26k/int on >> aarch64. >> Spec2k6/int >> 400.perlbench 3.44% >> 445.gobmk -0.86% >> 456.hmmer 14.83% >> 458.sjeng 2.49% >> 462.libquantum -0.79% >> GEOMEAN 1.68% >> >> There is also about 0.36% improvement for spec2k6/fp, mostly because of case >> 436.cactusADM. I believe it can be further improved, but that should be >> another patch. >> >> I also collected benchmark data for x86_64. Spec2k6/fp is not affected. As >> for spec2k6/int, though the geomean is improved slightly, 400.perlbench is >> regressed by ~3%. I can see BIVs are chosen for some loops instead of >> address candidates. Generally, the loop header will be simplified because >> iv elimination with BIV is simpler; the number of instructions in loop body >> isn't changed. I suspect the regression comes from different addressing >> modes. With BIV, complex addressing mode like [base + index << scale + >> disp] is used, rather than [base + disp]. I guess the former has more >> micro-ops, thus more expensive. This guess can be confirmed by manually >> suppressing the complex addressing mode with higher address cost. >> Now the problem becomes why overall cost of BIV is computed lower while the >> actual cost is higher. I noticed for most affected loops, loop header is >> bloated because of iv elimination using the old address candidate. The >> bloated loop header results in much higher cost than BIV. As a result, BIV >> is preferred. I also noticed the bloated loop header generally can be >> simplified (I have a following patch for this). After applying the local >> patch, the old address candidate is chosen, and most of regression is >> recovered. >> Conclusion is I think loop header bloated issue should be blamed for the >> regression, and it can be resolved. >> >> Bootstrap and test on x64_64 and aarch64. It fixes failure of >> gcc.target/i386/pr49781-1.c, without new breakage. >> >> So what do you think? > > The data above looks ok to me. > > +static struct iv * > +find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv) > +{ > + aff_tree aff; > + struct expand_data exp_data; > + > + if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME) > + return iv; > + > + /* Expand IV's ssa_name till the deriving biv is found. */ > + exp_data.data = data; > + exp_data.biv = NULL; > + tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name), > + &aff, &data->name_expansion_cache, > + stop_expand, &exp_data); > + return exp_data.biv; > > that's actually "abusing" tree_to_aff_combination_expand for simply walking > SSA uses and their defs uses recursively until you hit "stop". ISTR past > discussion to add a generic walk_ssa_use interface for that. Not sure if it > materialized with a name I can't remember or whether it didn't. Thanks for reviewing. I didn't found existing interface to walk up definition chains of ssa vars. In this updated patch, I implemented a simple function which meets the minimal requirement of walking up definition chains of BIV variables. I also counted number of no_overflow BIVs that are not used in address type use. Since generally there are only two BIVs in a loop, this can prevent us from visiting definition chains most of time. Statistics shows that number of call to find_deriving_biv_for_expr plummet.
> > - add_candidate (data, iv->base, iv->step, true, NULL); > + /* Check if this biv is used in address type use. */ > + if (iv->no_overflow && iv->have_address_use > + && INTEGRAL_TYPE_P (TREE_TYPE (iv->base)) > + && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype)) > + { > + tree type = unsigned_type_for (sizetype); > > sizetype is unsigned. Fixed. Bootstrap and test on x86_64, is this OK? Thanks, bin > > the rest looks ok to me but I really don't like the abuse of > tree_to_aff_combination_expand... > > Thanks, > Richard. > >> Thanks, >> bin >> >> 2015-08-31 Bin Cheng <bin.ch...@arm.com> >> >> * tree-affine.c (aff_combination_expand): New parameters. >> (tree_to_aff_combination_expand): Ditto. >> * tree-affine.h (aff_combination_expand): New declaration. >> (tree_to_aff_combination_expand): Ditto. >> * tree-ssa-loop-ivopts.c (struct iv, iv_cand): New fields. >> (dump_iv): Dump no_overflow information. >> (alloc_iv): Initialize new field for struct iv. >> (struct expand_data): New struct for affine combination expanding. >> (stop_expand): New callback func for affine combination expanding. >> (find_deriving_biv_for_iv, record_biv_for_address_use): New >> functions. >> (idx_find_step): Call new functions above. >> (find_depends, add_candidate): New paramter. >> (add_iv_candidate_for_biv): Add sizetype cand for BIV. >> (get_computation_aff): Simplify convertion of cand for BIV. >> (get_computation_cost_at): Step cand's base if necessary. >>