On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng <bin.ch...@arm.com> wrote: > Hi, > PR52272 reported a performance regression in spec2006/410.bwaves once GCC is > prevented from representing address of one memory object using address of > another memory object. Also as I commented in that PR, we have two possible > fixes for this: > 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj > using another one, while not breaking PR50955. > 2) Add iv candidates with base object stripped. In this way, we use the > common base-stripped part to represent all address expressions, in the form > of [base_1 + common], [base_2 + common], ..., [base_n + common]. > > In terms of code generation, method 2) is at least as good as 1), actually > better in my opinion. The problem of 2) is we need to tell when iv > candidates should be added for the common part and when shouldn't. This > issue can be generalized and described as: We know IVO tries to add > candidates by deriving from iv uses. One disadvantage is that candidates > are derived from iv use independently. It doesn't take common sub > expression among different iv uses into consideration. As a result, > candidate for common sub expression is not added, while many useless > candidates are added. > > As a matter of fact, candidate derived from iv use is useful only if it's > common enough and could be shared among different uses. A candidate is most > likely useless if it's derived from a single use and could not be shared by > others. This patch works in this way by firstly recording all kinds > candidates derived from iv uses, then adding candidates for common ones. > > The patch improves 410.bwaves by 3-4% on x86_64. I also saw regression for > 400.perlbench and small regression for 401.bzip on x86_64, but I can confirm > they are false alarms caused by align issues. > For aarch64, fp cases are obviously improved for both spec2000 and spec2006. > Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is > another irrelevant issue caused by heuristic candidate selecting algorithm. > Unfortunately, I don't have fix to it currently. > > This patch may add more candidates in some cases, but generally candidates > number is smaller because we don't need to add useless candidates now. > Statistic data shows there are quite fewer loops with more than 30 > candidates when building spec2k6 on x86_64 using this patch. > > Bootstrap and test on x86_64. I will re-test it against latest trunk on > AArch64. Is it OK?
+inline bool +iv_common_cand_hasher::equal (const iv_common_cand *ccand1, + const iv_common_cand *ccand2) +{ + return ccand1->hash == ccand2->hash + && operand_equal_p (ccand1->base, ccand2->base, 0) + && operand_equal_p (ccand1->step, ccand2->step, 0) + && TYPE_PRECISION (TREE_TYPE (ccand1->base)) + == TYPE_PRECISION (TREE_TYPE (ccand2->base)); I'm wondering on the TYPE_PRECISION check. a) why is that needed? and b) what kind of tree is base so that it is safe to inspect TYPE_PRECISION unconditionally? + slot = data->iv_common_cand_tab->find_slot (&ent, INSERT); + if (*slot == NULL) + { + *slot = XNEW (struct iv_common_cand); allocate from the IV obstack instead? I see we do a lot of heap allocations in IVOPTs, so we can improve that as followup as well. We probably should empty the obstack after each processed loop. Thanks, Richard. > Thanks, > bin > > 2015-11-03 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/52272 > * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct. > (struct iv_common_cand_hasher): New struct. > (iv_common_cand_hasher::hash): New function. > (iv_common_cand_hasher::equal): New function. > (struct ivopts_data): New fields, iv_common_cand_tab and > iv_common_cands. > (tree_ssa_iv_optimize_init): Initialize above fields. > (record_common_cand, common_cand_cmp): New functions. > (add_iv_candidate_derived_from_uses): New function. > (add_iv_candidate_for_use): Record iv_common_cands derived from > iv use in hash table, instead of adding candidates directly. > (add_iv_candidate_for_uses): Call > add_iv_candidate_derived_from_uses. > (record_important_candidates): Add important candidates to iv uses' > related_cands. Always keep related_cands for future use. > (try_add_cand_for): Use iv uses' related_cands. > (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields > in struct ivopts_data, iv_common_cand_tab and iv_common_cands.