On Fri, May 12, 2017 at 8:39 AM, Christophe Lyon <christophe.l...@linaro.org> wrote: > Hi Bin, > > > On 4 May 2017 at 17:25, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Wed, Apr 26, 2017 at 11:18 AM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Wed, Apr 26, 2017 at 12:12 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>>> On Wed, Apr 26, 2017 at 10:50 AM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >>>>> On Tue, Apr 18, 2017 at 12:43 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>>>> Hi, >>>>>> This is the major part of this patch series. It rewrites cost >>>>>> computation of ivopts using tree affine. >>>>>> Apart from description given by cover message: >>>>>> A) New computation cost model. Currently, there are big amount code >>>>>> trying to understand >>>>>> tree expression and estimate its computation cost. The model is >>>>>> designed long ago >>>>>> for generic tree expressions. In order to process generic >>>>>> expression (even address >>>>>> expression of array/memory references), it has code for too many >>>>>> corner cases. The >>>>>> problem is it's somehow impossible to handle all complicated >>>>>> expressions, even with >>>>>> complicated logic in functions like get_computation_cost_at, >>>>>> difference_cost, >>>>>> ptr_difference_cost, get_address_cost and so on... The second >>>>>> problem is it's hard >>>>>> to keep cost model consistent among special cases. As special >>>>>> cases being added >>>>>> from time to time, the model is no long unified any more. There >>>>>> are cases that right >>>>>> cost results in bad code, or vice versa, wrong cost results in good >>>>>> code. Finally, >>>>>> it's difficult to add code for new cases. >>>>>> This patch introduces a new cost computation model by using tree >>>>>> affine. Tree exprs >>>>>> are lowered to aff_tree which is simple arithmetic operation >>>>>> usually. Code handling >>>>>> special cases is no longer necessary, which brings us quite >>>>>> simplicity. It is also >>>>>> easier to compute consistent costs among different expressions >>>>>> using tree affine, >>>>>> which gives us a unified cost model. >>>>>> This patch also fixes issue that cost computation for address type >>>>>> iv_use is inconsistent >>>>>> with how it is re-rewritten in the end. It greatly simplified cost >>>>>> computation. >>>>>> >>>>>> Is it OK? >>>>> >>>>> The patch is quite hard to follow (diff messes up here -- this is a >>>>> case where a context >>>> Hi Richard, >>>> Thanks for reviewing, attachment is the updated context diff. It also >>>> includes a minor fix handling pre-increment addressing mode, >>>> specifically, it adds two lines of code: >>>> >>>> if (stmt_after_increment (data->current_loop, cand, use->stmt)) >>>> ainc_offset += ainc_step; >>>> >>>> >>>>> diff is easier to read). I trust you on the implementation details >>>>> here, the overall structure >>>>> looks ok to me. The only question I have is with regarding to >>>>> get_loop_invariant_expr >>>>> which seems to be much less sophisticated than before (basically it's >>>>> now what was >>>>> record_inv_expr?). I suppose the old functionality is superseeded by >>>>> using affines >>>>> everywhere else. >>>> Yes, previous version tries a lot to cancel common sub expression when >>>> representing use with cand. New version relies on tree affine which >>>> is much better. One problem with invariant expression estimation is >>>> we don't know in IVOPT if the inv_expr would be hoisted or not by >>>> later passes. This problem exists all the time, we can only make >>>> assumptions here, I think new version is bit more aggressive in >>>> recording new inv_expr here. >>> >>> Thanks. >>> >>> LGTM. >> Trivial update replacing calls to aff_combination_simple_p because of >> change in previous patch. >> >> Thanks, >> bin > > This commit (r247885) causes regressions on aarch64: > - PASS now FAIL [PASS => FAIL]: > > Executed from: gcc.target/aarch64/aarch64.exp > gcc.target/aarch64/pr62178.c scan-assembler ld1r\\t{v[0-9]+. Hi, Thanks for reporting. The cause is a bit complicated for this failure. First, IVOPTs chooses [base + 4], rather than [base] address expression because it has no knowledge that ld1r doesn't support pre-autoincrement addressing mode; also RTL passes also contributes to the regression. I will open a PR for tracking.
Thanks, bin > > Christophe > >>> >>> Richard. >>> >>>> Thanks, >>>> bin >>>>> >>>>> Otherwise ok. >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> >>>>>> Thanks, >>>>>> bin >>>>>> 2017-04-11 Bin Cheng <bin.ch...@arm.com> >>>>>> >>>>>> * tree-ssa-loop-ivopts.c (get_loop_invariant_expr): Simplify. >>>>>> (adjust_setup_cost): New parameter supporting round up >>>>>> adjustment. >>>>>> (struct address_cost_data): Delete. >>>>>> (force_expr_to_var_cost): Don't bound cost with spill_cost. >>>>>> (split_address_cost, ptr_difference_cost): Delete. >>>>>> (difference_cost, compare_aff_trees, record_inv_expr): Delete. >>>>>> (struct ainc_cost_data): New struct. >>>>>> (get_address_cost_ainc): New function. >>>>>> (get_address_cost, get_computation_cost): Reimplement. >>>>>> (determine_group_iv_cost_address): Record inv_expr for all uses >>>>>> of >>>>>> a group. >>>>>> (determine_group_iv_cost_cond): Call get_loop_invariant_expr. >>>>>> (iv_ca_has_deps): Reimplemented to ... >>>>>> (iv_ca_more_deps): ... this. Check if NEW_CP introduces more >>>>>> deps >>>>>> than OLD_CP. >>>>>> (iv_ca_extend): Call iv_ca_more_deps.