On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> This patch is a new approach to fix PR66388.  IVO today computes iv_use with
> iv_cand which has at least same type precision as the use.  On 64bit
> platforms like AArch64, this results in different iv_cand created for each
> address type iv_use, and register pressure increased.  As a matter of fact,
> the BIV should be used for all iv_uses in some of these cases.  It is a
> latent bug but recently getting worse because of overflow changes.
>
> The original approach at
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the issue
> except it conflict with IV elimination.  Seems to me it is impossible to
> mitigate the contradiction.
>
> This new approach fixes the issue by adding sizetype iv_cand for BIVs
> directly.  In cases if the original BIV is preferred, the sizetype iv_cand
> will be chosen.  As for code generation, the sizetype iv_cand has the same
> effect as the original BIV.  Actually, it's better because BIV needs to be
> explicitly extended to sizetype to be used in address expression on most
> targets.
>
> One shortage of this approach is it may introduce more iv candidates.  To
> minimize the impact, this patch does sophisticated code analysis and adds
> sizetype candidate for BIV only if it is used as index.  Moreover, it avoids
> to add candidate of the original type if the BIV is only used as index.
> Statistics for compiling spec2k6 shows increase of candidate number is
> modest and can be ignored.
>
> There are two more patches following to fix corner cases revealed by this
> one.  In together they bring obvious perf improvement for spec26k/int on
> aarch64.
> Spec2k6/int
> 400.perlbench   3.44%
> 445.gobmk       -0.86%
> 456.hmmer       14.83%
> 458.sjeng       2.49%
> 462.libquantum  -0.79%
> GEOMEAN         1.68%
>
> There is also about 0.36% improvement for spec2k6/fp, mostly because of case
> 436.cactusADM.  I believe it can be further improved, but that should be
> another patch.
>
> I also collected benchmark data for x86_64.  Spec2k6/fp is not affected.  As
> for spec2k6/int, though the geomean is improved slightly, 400.perlbench is
> regressed by ~3%.  I can see BIVs are chosen for some loops instead of
> address candidates.  Generally, the loop header will be simplified because
> iv elimination with BIV is simpler; the number of instructions in loop body
> isn't changed.  I suspect the regression comes from different addressing
> modes.  With BIV, complex addressing mode like [base + index << scale +
> disp] is used, rather than [base + disp].  I guess the former has more
> micro-ops, thus more expensive.  This guess can be confirmed by manually
> suppressing the complex addressing mode with higher address cost.
> Now the problem becomes why overall cost of BIV is computed lower while the
> actual cost is higher.  I noticed for most affected loops, loop header is
> bloated because of iv elimination using the old address candidate.  The
> bloated loop header results in much higher cost than BIV.  As a result, BIV
> is preferred.  I also noticed the bloated loop header generally can be
> simplified (I have a following patch for this).  After applying the local
> patch, the old address candidate is chosen, and most of regression is
> recovered.
> Conclusion is I think loop header bloated issue should be blamed for the
> regression, and it can be resolved.
>
> Bootstrap and test on x64_64 and aarch64.  It fixes failure of
> gcc.target/i386/pr49781-1.c, without new breakage.
>
> So what do you think?

The data above looks ok to me.

+static struct iv *
+find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
+{
+  aff_tree aff;
+  struct expand_data exp_data;
+
+  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
+    return iv;
+
+  /* Expand IV's ssa_name till the deriving biv is found.  */
+  exp_data.data = data;
+  exp_data.biv = NULL;
+  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
+                                 &aff, &data->name_expansion_cache,
+                                 stop_expand, &exp_data);
+  return exp_data.biv;

that's actually "abusing" tree_to_aff_combination_expand for simply walking
SSA uses and their defs uses recursively until you hit "stop".  ISTR past
discussion to add a generic walk_ssa_use interface for that.  Not sure if it
materialized with a name I can't remember or whether it didn't.

-  add_candidate (data, iv->base, iv->step, true, NULL);
+  /* Check if this biv is used in address type use.  */
+  if (iv->no_overflow  && iv->have_address_use
+      && INTEGRAL_TYPE_P (TREE_TYPE (iv->base))
+      && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype))
+    {
+      tree type = unsigned_type_for (sizetype);

sizetype is unsigned.

the rest looks ok to me but I really don't like the abuse of
tree_to_aff_combination_expand...

Thanks,
Richard.

> Thanks,
> bin
>
> 2015-08-31  Bin Cheng  <bin.ch...@arm.com>
>
>         * tree-affine.c (aff_combination_expand): New parameters.
>         (tree_to_aff_combination_expand): Ditto.
>         * tree-affine.h (aff_combination_expand): New declaration.
>         (tree_to_aff_combination_expand): Ditto.
>         * tree-ssa-loop-ivopts.c (struct iv, iv_cand): New fields.
>         (dump_iv): Dump no_overflow information.
>         (alloc_iv): Initialize new field for struct iv.
>         (struct expand_data): New struct for affine combination expanding.
>         (stop_expand): New callback func for affine combination expanding.
>         (find_deriving_biv_for_iv, record_biv_for_address_use): New
> functions.
>         (idx_find_step): Call new functions above.
>         (find_depends, add_candidate): New paramter.
>         (add_iv_candidate_for_biv): Add sizetype cand for BIV.
>         (get_computation_aff): Simplify convertion of cand for BIV.
>         (get_computation_cost_at): Step cand's base if necessary.
>

Reply via email to