Re: [PATCH PR65447]Improve IV handling by grouping address type uses with same base and step

Richard Biener Wed, 13 May 2015 04:39:03 -0700

On Fri, May 8, 2015 at 12:47 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> GCC's IVO currently handles every IV use independently, which is not right
> by learning from cases reported in PR65447.
>
> The rationale is:
> 1) Lots of address type IVs refer to the same memory object, share similar
> base and have same step.  We should handle these IVs as a group in order to
> maximize CSE opportunities, prefer reg+offset addressing mode.
> 2) GCC's IVO algorithm is expensive and only is run when candidate set is
> small enough.  By grouping same family uses, we can decrease the number of
> both uses and candidates.  Before this patch, number of candidates for
> PR65447 is too big to run expensive IVO algorithm, resulting in bad assembly
> code on targets like AArch64 and Mips.
> 3) Even for cases the assembly code isn't improved, we can still get
> compilation time benefit with this patch.
> 4) This is a prerequisite for enabling auto-increment support in IVO on
> AArch64.
>
> For now, this is only done to address type IVs, in the future I may extend
> it to general IVs too.
>
> For AArch64:
> Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are improved obviously by
> this patch.  A couple of cases from spec2k/fp appear regressed.  I looked
> into generated assembly code and can confirm the regression is false alarm
> except one case (189.lucas).  For that case, I think it's another issue
> exposed by this patch (GCC failed to CSE candidate setup code, resulting in
> bloated loop header).  Anyway, I also fined tuned the patch to minimize the
> impact.
>
> For AArch32, this patch seems to be able to improve spec2kfp too, but I
> didn't look deep into it.  I guess the reason is it can make life for
> auto-increment support in IVO better.
>
> One of defects of this patch is computation of max offset in
> compute_max_addr_offset is basically borrowed from get_address_cost.  The
> comment says we should find a better way to compute all information.  People
> also complained we need to refactor that part of code.  I don't have good
> solution to that yet, though I did try best to keep compute_max_addr_offset
> simple.
>
> I believe this is a generally wanted change, bootstrap and test on x86_64
> and AArch64, so is it ok?


I'm a little bit worried about the linked list of sub-uses and the "sorting"
(that's quadratic).  A little.  I don't have any good idea but to use a tree...
We don't seem to limit the number of sub-uses (if we'd do that it would
become O(1)).

Similar is searching in the list of uses for a group with same base/step
(but ISTR IVOPTs has multiple similar loops?)

Overall the patch looks like a good improvement to how we do IVO, so I think
it is ok as-is.

Thanks,
Richard.


>
> 2015-05-08  Bin Cheng  <bin.ch...@arm.com>
>
>         PR tree-optimization/65447
>         * tree-ssa-loop-ivopts.c (struct iv_use): New fields.
>         (dump_use, dump_uses): Support to dump sub use.
>         (record_use): New parameters to support sub use.  Remove call to
>         dump_use.
>         (record_sub_use, record_group_use): New functions.
>         (compute_max_addr_offset, split_all_small_groups): New functions.
>         (group_address_uses, rewrite_use_address): New functions.
>         (strip_offset): New declaration.
>         (find_interesting_uses_address): Call record_group_use.
>         (add_candidate): New assertion.
>         (infinite_cost_p): Move definition forward.
>         (add_costs): Check INFTY cost and return immediately.
>         (get_computation_cost_at): Clear setup cost and dependent bitmap
>         for sub uses.
>         (determine_use_iv_cost_address): Compute cost for sub uses.
>         (rewrite_use_address_1): Rename from old rewrite_use_address.
>         (free_loop_data): Free sub uses.
>         (tree_ssa_iv_optimize_loop): Call group_address_uses.
>
> gcc/testsuite/ChangeLog
> 2015-05-08  Bin Cheng  <bin.ch...@arm.com>
>
>         PR tree-optimization/65447
>         * gcc.dg/tree-ssa/pr65447.c: New test.

Re: [PATCH PR65447]Improve IV handling by grouping address type uses with same base and step

Reply via email to