On Fri, May 8, 2015 at 12:47 PM, Bin Cheng <bin.ch...@arm.com> wrote: > Hi, > GCC's IVO currently handles every IV use independently, which is not right > by learning from cases reported in PR65447. > > The rationale is: > 1) Lots of address type IVs refer to the same memory object, share similar > base and have same step. We should handle these IVs as a group in order to > maximize CSE opportunities, prefer reg+offset addressing mode. > 2) GCC's IVO algorithm is expensive and only is run when candidate set is > small enough. By grouping same family uses, we can decrease the number of > both uses and candidates. Before this patch, number of candidates for > PR65447 is too big to run expensive IVO algorithm, resulting in bad assembly > code on targets like AArch64 and Mips. > 3) Even for cases the assembly code isn't improved, we can still get > compilation time benefit with this patch. > 4) This is a prerequisite for enabling auto-increment support in IVO on > AArch64. > > For now, this is only done to address type IVs, in the future I may extend > it to general IVs too. > > For AArch64: > Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are improved obviously by > this patch. A couple of cases from spec2k/fp appear regressed. I looked > into generated assembly code and can confirm the regression is false alarm > except one case (189.lucas). For that case, I think it's another issue > exposed by this patch (GCC failed to CSE candidate setup code, resulting in > bloated loop header). Anyway, I also fined tuned the patch to minimize the > impact. > > For AArch32, this patch seems to be able to improve spec2kfp too, but I > didn't look deep into it. I guess the reason is it can make life for > auto-increment support in IVO better. > > One of defects of this patch is computation of max offset in > compute_max_addr_offset is basically borrowed from get_address_cost. The > comment says we should find a better way to compute all information. People > also complained we need to refactor that part of code. I don't have good > solution to that yet, though I did try best to keep compute_max_addr_offset > simple. > > I believe this is a generally wanted change, bootstrap and test on x86_64 > and AArch64, so is it ok?
I'm a little bit worried about the linked list of sub-uses and the "sorting" (that's quadratic). A little. I don't have any good idea but to use a tree... We don't seem to limit the number of sub-uses (if we'd do that it would become O(1)). Similar is searching in the list of uses for a group with same base/step (but ISTR IVOPTs has multiple similar loops?) Overall the patch looks like a good improvement to how we do IVO, so I think it is ok as-is. Thanks, Richard. > > 2015-05-08 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/65447 > * tree-ssa-loop-ivopts.c (struct iv_use): New fields. > (dump_use, dump_uses): Support to dump sub use. > (record_use): New parameters to support sub use. Remove call to > dump_use. > (record_sub_use, record_group_use): New functions. > (compute_max_addr_offset, split_all_small_groups): New functions. > (group_address_uses, rewrite_use_address): New functions. > (strip_offset): New declaration. > (find_interesting_uses_address): Call record_group_use. > (add_candidate): New assertion. > (infinite_cost_p): Move definition forward. > (add_costs): Check INFTY cost and return immediately. > (get_computation_cost_at): Clear setup cost and dependent bitmap > for sub uses. > (determine_use_iv_cost_address): Compute cost for sub uses. > (rewrite_use_address_1): Rename from old rewrite_use_address. > (free_loop_data): Free sub uses. > (tree_ssa_iv_optimize_loop): Call group_address_uses. > > gcc/testsuite/ChangeLog > 2015-05-08 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/65447 > * gcc.dg/tree-ssa/pr65447.c: New test.