On Wed, May 27, 2015 at 11:44 PM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote: > Hi Bin, > > > On 08/05/15 11:47, Bin Cheng wrote: >> >> Hi, >> GCC's IVO currently handles every IV use independently, which is not right >> by learning from cases reported in PR65447. >> >> The rationale is: >> 1) Lots of address type IVs refer to the same memory object, share similar >> base and have same step. We should handle these IVs as a group in order >> to >> maximize CSE opportunities, prefer reg+offset addressing mode. >> 2) GCC's IVO algorithm is expensive and only is run when candidate set is >> small enough. By grouping same family uses, we can decrease the number of >> both uses and candidates. Before this patch, number of candidates for >> PR65447 is too big to run expensive IVO algorithm, resulting in bad >> assembly >> code on targets like AArch64 and Mips. >> 3) Even for cases the assembly code isn't improved, we can still get >> compilation time benefit with this patch. >> 4) This is a prerequisite for enabling auto-increment support in IVO on >> AArch64. >> >> For now, this is only done to address type IVs, in the future I may extend >> it to general IVs too. >> >> For AArch64: >> Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are improved obviously by >> this patch. A couple of cases from spec2k/fp appear regressed. I looked >> into generated assembly code and can confirm the regression is false alarm >> except one case (189.lucas). For that case, I think it's another issue >> exposed by this patch (GCC failed to CSE candidate setup code, resulting >> in >> bloated loop header). Anyway, I also fined tuned the patch to minimize >> the >> impact. >> >> For AArch32, this patch seems to be able to improve spec2kfp too, but I >> didn't look deep into it. I guess the reason is it can make life for >> auto-increment support in IVO better. >> >> One of defects of this patch is computation of max offset in >> compute_max_addr_offset is basically borrowed from get_address_cost. The >> comment says we should find a better way to compute all information. >> People >> also complained we need to refactor that part of code. I don't have good >> solution to that yet, though I did try best to keep >> compute_max_addr_offset >> simple. >> >> I believe this is a generally wanted change, bootstrap and test on x86_64 >> and AArch64, so is it ok? >> >> >> 2015-05-08 Bin Cheng <bin.ch...@arm.com> >> >> PR tree-optimization/65447 >> * tree-ssa-loop-ivopts.c (struct iv_use): New fields. >> (dump_use, dump_uses): Support to dump sub use. >> (record_use): New parameters to support sub use. Remove call to >> dump_use. >> (record_sub_use, record_group_use): New functions. >> (compute_max_addr_offset, split_all_small_groups): New functions. >> (group_address_uses, rewrite_use_address): New functions. >> (strip_offset): New declaration. >> (find_interesting_uses_address): Call record_group_use. >> (add_candidate): New assertion. >> (infinite_cost_p): Move definition forward. >> (add_costs): Check INFTY cost and return immediately. >> (get_computation_cost_at): Clear setup cost and dependent bitmap >> for sub uses. >> (determine_use_iv_cost_address): Compute cost for sub uses. >> (rewrite_use_address_1): Rename from old rewrite_use_address. >> (free_loop_data): Free sub uses. >> (tree_ssa_iv_optimize_loop): Call group_address_uses. >> >> gcc/testsuite/ChangeLog >> 2015-05-08 Bin Cheng <bin.ch...@arm.com> >> >> PR tree-optimization/65447 >> * gcc.dg/tree-ssa/pr65447.c: New test. > > > I see this test failing on arm-none-eabi with a compiler at r223737. > My configure options are: --enable-checking=yes --with-newlib > --with-fpu=neon-fp-armv8 --with-arch=armv8-a --without-isl
Hi Kyrill, Thank you for reporting this. I will have a look. Thanks, bin > > Kyrill > >