On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> PR52272 reported a performance regression in spec2006/410.bwaves once GCC is
> prevented from representing address of one memory object using address of
> another memory object.  Also as I commented in that PR, we have two possible
> fixes for this:
> 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj
> using another one, while not breaking PR50955.
> 2) Add iv candidates with base object stripped.  In this way, we use the
> common base-stripped part to represent all address expressions, in the form
> of [base_1 + common], [base_2 + common], ..., [base_n + common].
>
> In terms of code generation, method 2) is at least as good as 1), actually
> better in my opinion.  The problem of 2) is we need to tell when iv
> candidates should be added for the common part and when shouldn't.  This
> issue can be generalized and described as: We know IVO tries to add
> candidates by deriving from iv uses.  One disadvantage is that candidates
> are derived from iv use independently.  It doesn't take common sub
> expression among different iv uses into consideration.  As a result,
> candidate for common sub expression is not added, while many useless
> candidates are added.
>
> As a matter of fact, candidate derived from iv use is useful only if it's
> common enough and could be shared among different uses.  A candidate is most
> likely useless if it's derived from a single use and could not be shared by
> others.  This patch works in this way by firstly recording all kinds
> candidates derived from iv uses, then adding candidates for common ones.
>
> The patch improves 410.bwaves by 3-4% on x86_64.  I also saw regression for
> 400.perlbench and small regression for 401.bzip on x86_64, but I can confirm
> they are false alarms caused by align issues.
> For aarch64, fp cases are obviously improved for both spec2000 and spec2006.
> Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is
> another irrelevant issue caused by heuristic candidate selecting algorithm.
> Unfortunately, I don't have fix to it currently.
>
> This patch may add more candidates in some cases, but generally candidates
> number is smaller because we don't need to add useless candidates now.
> Statistic data shows there are quite fewer loops with more than 30
> candidates when building spec2k6 on x86_64 using this patch.
>
> Bootstrap and test on x86_64.  I will re-test it against latest trunk on
> AArch64.  Is it OK?

+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+                          const iv_common_cand *ccand2)
+{
+  return ccand1->hash == ccand2->hash
+        && operand_equal_p (ccand1->base, ccand2->base, 0)
+        && operand_equal_p (ccand1->step, ccand2->step, 0)
+        && TYPE_PRECISION (TREE_TYPE (ccand1->base))
+             == TYPE_PRECISION (TREE_TYPE (ccand2->base));

I'm wondering on the TYPE_PRECISION check.  a) why is that needed?
and b) what kind of tree is base so that it is safe to inspect TYPE_PRECISION
unconditionally?

+  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
+  if (*slot == NULL)
+    {
+      *slot = XNEW (struct iv_common_cand);

allocate from the IV obstack instead?  I see we do a lot of heap allocations
in IVOPTs, so we can improve that as followup as well.

We probably should empty the obstack after each processed loop.

Thanks,
Richard.


> Thanks,
> bin
>
> 2015-11-03  Bin Cheng  <bin.ch...@arm.com>
>
>         PR tree-optimization/52272
>         * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
>         (struct iv_common_cand_hasher): New struct.
>         (iv_common_cand_hasher::hash): New function.
>         (iv_common_cand_hasher::equal): New function.
>         (struct ivopts_data): New fields, iv_common_cand_tab and
>         iv_common_cands.
>         (tree_ssa_iv_optimize_init): Initialize above fields.
>         (record_common_cand, common_cand_cmp): New functions.
>         (add_iv_candidate_derived_from_uses): New function.
>         (add_iv_candidate_for_use): Record iv_common_cands derived from
>         iv use in hash table, instead of adding candidates directly.
>         (add_iv_candidate_for_uses): Call
> add_iv_candidate_derived_from_uses.
>         (record_important_candidates): Add important candidates to iv uses'
>         related_cands.  Always keep related_cands for future use.
>         (try_add_cand_for): Use iv uses' related_cands.
>         (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields
>         in struct ivopts_data, iv_common_cand_tab and iv_common_cands.

Reply via email to