On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.ch...@arm.com> wrote: > Hi, > This patch is a new approach to fix PR66388. IVO today computes iv_use with > iv_cand which has at least same type precision as the use. On 64bit > platforms like AArch64, this results in different iv_cand created for each > address type iv_use, and register pressure increased. As a matter of fact, > the BIV should be used for all iv_uses in some of these cases. It is a > latent bug but recently getting worse because of overflow changes. > > The original approach at > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the issue > except it conflict with IV elimination. Seems to me it is impossible to > mitigate the contradiction. > > This new approach fixes the issue by adding sizetype iv_cand for BIVs > directly. In cases if the original BIV is preferred, the sizetype iv_cand > will be chosen. As for code generation, the sizetype iv_cand has the same > effect as the original BIV. Actually, it's better because BIV needs to be > explicitly extended to sizetype to be used in address expression on most > targets. > > One shortage of this approach is it may introduce more iv candidates. To > minimize the impact, this patch does sophisticated code analysis and adds > sizetype candidate for BIV only if it is used as index. Moreover, it avoids > to add candidate of the original type if the BIV is only used as index. > Statistics for compiling spec2k6 shows increase of candidate number is > modest and can be ignored. > > There are two more patches following to fix corner cases revealed by this > one. In together they bring obvious perf improvement for spec26k/int on > aarch64. > Spec2k6/int > 400.perlbench 3.44% > 445.gobmk -0.86% > 456.hmmer 14.83% > 458.sjeng 2.49% > 462.libquantum -0.79% > GEOMEAN 1.68% > > There is also about 0.36% improvement for spec2k6/fp, mostly because of case > 436.cactusADM. I believe it can be further improved, but that should be > another patch. > > I also collected benchmark data for x86_64. Spec2k6/fp is not affected. As > for spec2k6/int, though the geomean is improved slightly, 400.perlbench is > regressed by ~3%. I can see BIVs are chosen for some loops instead of > address candidates. Generally, the loop header will be simplified because > iv elimination with BIV is simpler; the number of instructions in loop body > isn't changed. I suspect the regression comes from different addressing > modes. With BIV, complex addressing mode like [base + index << scale + > disp] is used, rather than [base + disp]. I guess the former has more > micro-ops, thus more expensive. This guess can be confirmed by manually > suppressing the complex addressing mode with higher address cost. > Now the problem becomes why overall cost of BIV is computed lower while the > actual cost is higher. I noticed for most affected loops, loop header is > bloated because of iv elimination using the old address candidate. The > bloated loop header results in much higher cost than BIV. As a result, BIV > is preferred. I also noticed the bloated loop header generally can be > simplified (I have a following patch for this). After applying the local > patch, the old address candidate is chosen, and most of regression is > recovered. > Conclusion is I think loop header bloated issue should be blamed for the > regression, and it can be resolved. > > Bootstrap and test on x64_64 and aarch64. It fixes failure of > gcc.target/i386/pr49781-1.c, without new breakage. > > So what do you think?
The data above looks ok to me. +static struct iv * +find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv) +{ + aff_tree aff; + struct expand_data exp_data; + + if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME) + return iv; + + /* Expand IV's ssa_name till the deriving biv is found. */ + exp_data.data = data; + exp_data.biv = NULL; + tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name), + &aff, &data->name_expansion_cache, + stop_expand, &exp_data); + return exp_data.biv; that's actually "abusing" tree_to_aff_combination_expand for simply walking SSA uses and their defs uses recursively until you hit "stop". ISTR past discussion to add a generic walk_ssa_use interface for that. Not sure if it materialized with a name I can't remember or whether it didn't. - add_candidate (data, iv->base, iv->step, true, NULL); + /* Check if this biv is used in address type use. */ + if (iv->no_overflow && iv->have_address_use + && INTEGRAL_TYPE_P (TREE_TYPE (iv->base)) + && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype)) + { + tree type = unsigned_type_for (sizetype); sizetype is unsigned. the rest looks ok to me but I really don't like the abuse of tree_to_aff_combination_expand... Thanks, Richard. > Thanks, > bin > > 2015-08-31 Bin Cheng <bin.ch...@arm.com> > > * tree-affine.c (aff_combination_expand): New parameters. > (tree_to_aff_combination_expand): Ditto. > * tree-affine.h (aff_combination_expand): New declaration. > (tree_to_aff_combination_expand): Ditto. > * tree-ssa-loop-ivopts.c (struct iv, iv_cand): New fields. > (dump_iv): Dump no_overflow information. > (alloc_iv): Initialize new field for struct iv. > (struct expand_data): New struct for affine combination expanding. > (stop_expand): New callback func for affine combination expanding. > (find_deriving_biv_for_iv, record_biv_for_address_use): New > functions. > (idx_find_step): Call new functions above. > (find_depends, add_candidate): New paramter. > (add_iv_candidate_for_biv): Add sizetype cand for BIV. > (get_computation_aff): Simplify convertion of cand for BIV. > (get_computation_cost_at): Step cand's base if necessary. >