On Fri, May 12, 2017 at 10:50 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Fri, May 12, 2017 at 8:39 AM, Christophe Lyon > <christophe.l...@linaro.org> wrote: >> Hi Bin, >> >> >> On 4 May 2017 at 17:25, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Wed, Apr 26, 2017 at 11:18 AM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Wed, Apr 26, 2017 at 12:12 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>>>> On Wed, Apr 26, 2017 at 10:50 AM, Richard Biener >>>>> <richard.guent...@gmail.com> wrote: >>>>>> On Tue, Apr 18, 2017 at 12:43 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>>>>> Hi, >>>>>>> This is the major part of this patch series. It rewrites cost >>>>>>> computation of ivopts using tree affine. >>>>>>> Apart from description given by cover message: >>>>>>> A) New computation cost model. Currently, there are big amount code >>>>>>> trying to understand >>>>>>> tree expression and estimate its computation cost. The model is >>>>>>> designed long ago >>>>>>> for generic tree expressions. In order to process generic >>>>>>> expression (even address >>>>>>> expression of array/memory references), it has code for too many >>>>>>> corner cases. The >>>>>>> problem is it's somehow impossible to handle all complicated >>>>>>> expressions, even with >>>>>>> complicated logic in functions like get_computation_cost_at, >>>>>>> difference_cost, >>>>>>> ptr_difference_cost, get_address_cost and so on... The second >>>>>>> problem is it's hard >>>>>>> to keep cost model consistent among special cases. As special >>>>>>> cases being added >>>>>>> from time to time, the model is no long unified any more. There >>>>>>> are cases that right >>>>>>> cost results in bad code, or vice versa, wrong cost results in >>>>>>> good code. Finally, >>>>>>> it's difficult to add code for new cases. >>>>>>> This patch introduces a new cost computation model by using tree >>>>>>> affine. Tree exprs >>>>>>> are lowered to aff_tree which is simple arithmetic operation >>>>>>> usually. Code handling >>>>>>> special cases is no longer necessary, which brings us quite >>>>>>> simplicity. It is also >>>>>>> easier to compute consistent costs among different expressions >>>>>>> using tree affine, >>>>>>> which gives us a unified cost model. >>>>>>> This patch also fixes issue that cost computation for address type >>>>>>> iv_use is inconsistent >>>>>>> with how it is re-rewritten in the end. It greatly simplified cost >>>>>>> computation. >>>>>>> >>>>>>> Is it OK? >>>>>> >>>>>> The patch is quite hard to follow (diff messes up here -- this is a >>>>>> case where a context >>>>> Hi Richard, >>>>> Thanks for reviewing, attachment is the updated context diff. It also >>>>> includes a minor fix handling pre-increment addressing mode, >>>>> specifically, it adds two lines of code: >>>>> >>>>> if (stmt_after_increment (data->current_loop, cand, use->stmt)) >>>>> ainc_offset += ainc_step; >>>>> >>>>> >>>>>> diff is easier to read). I trust you on the implementation details >>>>>> here, the overall structure >>>>>> looks ok to me. The only question I have is with regarding to >>>>>> get_loop_invariant_expr >>>>>> which seems to be much less sophisticated than before (basically it's >>>>>> now what was >>>>>> record_inv_expr?). I suppose the old functionality is superseeded by >>>>>> using affines >>>>>> everywhere else. >>>>> Yes, previous version tries a lot to cancel common sub expression when >>>>> representing use with cand. New version relies on tree affine which >>>>> is much better. One problem with invariant expression estimation is >>>>> we don't know in IVOPT if the inv_expr would be hoisted or not by >>>>> later passes. This problem exists all the time, we can only make >>>>> assumptions here, I think new version is bit more aggressive in >>>>> recording new inv_expr here. >>>> >>>> Thanks. >>>> >>>> LGTM. >>> Trivial update replacing calls to aff_combination_simple_p because of >>> change in previous patch. >>> >>> Thanks, >>> bin >> >> This commit (r247885) causes regressions on aarch64: >> - PASS now FAIL [PASS => FAIL]: >> >> Executed from: gcc.target/aarch64/aarch64.exp >> gcc.target/aarch64/pr62178.c scan-assembler ld1r\\t{v[0-9]+. > Hi, > Thanks for reporting. The cause is a bit complicated for this > failure. First, IVOPTs chooses [base + 4], rather than [base] address > expression because it has no knowledge that ld1r doesn't support > pre-autoincrement addressing mode; also RTL passes also contributes to > the regression. I will open a PR for tracking. PR80724 filed.
Thanks, bin