On Fri, May 12, 2017 at 8:39 AM, Christophe Lyon
<christophe.l...@linaro.org> wrote:
> Hi Bin,
>
>
> On 4 May 2017 at 17:25, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Wed, Apr 26, 2017 at 11:18 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Wed, Apr 26, 2017 at 12:12 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>> On Wed, Apr 26, 2017 at 10:50 AM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Tue, Apr 18, 2017 at 12:43 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>>> Hi,
>>>>>> This is the major part of this patch series.  It rewrites cost 
>>>>>> computation of ivopts using tree affine.
>>>>>> Apart from description given by cover message:
>>>>>>   A) New computation cost model.  Currently, there are big amount code 
>>>>>> trying to understand
>>>>>>      tree expression and estimate its computation cost.  The model is 
>>>>>> designed long ago
>>>>>>      for generic tree expressions.  In order to process generic 
>>>>>> expression (even address
>>>>>>      expression of array/memory references), it has code for too many 
>>>>>> corner cases.  The
>>>>>>      problem is it's somehow impossible to handle all complicated 
>>>>>> expressions, even with
>>>>>>      complicated logic in functions like get_computation_cost_at, 
>>>>>> difference_cost,
>>>>>>      ptr_difference_cost, get_address_cost and so on...  The second 
>>>>>> problem is it's hard
>>>>>>      to keep cost model consistent among special cases.  As special 
>>>>>> cases being added
>>>>>>      from time to time, the model is no long unified any more.  There 
>>>>>> are cases that right
>>>>>>      cost results in bad code, or vice versa, wrong cost results in good 
>>>>>> code.  Finally,
>>>>>>      it's difficult to add code for new cases.
>>>>>>      This patch introduces a new cost computation model by using tree 
>>>>>> affine.  Tree exprs
>>>>>>      are lowered to aff_tree which is simple arithmetic operation 
>>>>>> usually.  Code handling
>>>>>>      special cases is no longer necessary, which brings us quite 
>>>>>> simplicity.  It is also
>>>>>>      easier to compute consistent costs among different expressions 
>>>>>> using tree affine,
>>>>>>      which gives us a unified cost model.
>>>>>> This patch also fixes issue that cost computation for address type 
>>>>>> iv_use is inconsistent
>>>>>> with how it is re-rewritten in the end.  It greatly simplified cost 
>>>>>> computation.
>>>>>>
>>>>>> Is it OK?
>>>>>
>>>>> The patch is quite hard to follow (diff messes up here -- this is a
>>>>> case where a context
>>>> Hi Richard,
>>>> Thanks for reviewing, attachment is the updated context diff.  It also
>>>> includes a minor fix handling pre-increment addressing mode,
>>>> specifically, it adds two lines of code:
>>>>
>>>>  if (stmt_after_increment (data->current_loop, cand, use->stmt))
>>>>             ainc_offset += ainc_step;
>>>>
>>>>
>>>>> diff is easier to read).  I trust you on the implementation details
>>>>> here, the overall structure
>>>>> looks ok to me.  The only question I have is with regarding to
>>>>> get_loop_invariant_expr
>>>>> which seems to be much less sophisticated than before (basically it's
>>>>> now what was
>>>>> record_inv_expr?).  I suppose the old functionality is superseeded by
>>>>> using affines
>>>>> everywhere else.
>>>> Yes, previous version tries a lot to cancel common sub expression when
>>>> representing use with cand.  New version relies on tree affine which
>>>> is much better.  One problem with invariant expression estimation is
>>>> we don't know in IVOPT if the inv_expr would be hoisted or not by
>>>> later passes.  This problem exists all the time, we can only make
>>>> assumptions here, I think new version is bit more aggressive in
>>>> recording new inv_expr here.
>>>
>>> Thanks.
>>>
>>> LGTM.
>> Trivial update replacing calls to aff_combination_simple_p because of
>> change in previous patch.
>>
>> Thanks,
>> bin
>
> This commit (r247885) causes regressions on aarch64:
>   - PASS now FAIL             [PASS => FAIL]:
>
>   Executed from: gcc.target/aarch64/aarch64.exp
>     gcc.target/aarch64/pr62178.c scan-assembler ld1r\\t{v[0-9]+.
Hi,
Thanks for reporting.  The cause is a bit complicated for this
failure.  First, IVOPTs chooses [base + 4], rather than [base] address
expression because it has no knowledge that ld1r doesn't support
pre-autoincrement addressing mode; also RTL passes also contributes to
the regression.  I will open a PR for tracking.

Thanks,
bin
>
> Christophe
>
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> bin
>>>>>
>>>>> Otherwise ok.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> bin
>>>>>> 2017-04-11  Bin Cheng  <bin.ch...@arm.com>
>>>>>>
>>>>>>         * tree-ssa-loop-ivopts.c (get_loop_invariant_expr): Simplify.
>>>>>>         (adjust_setup_cost): New parameter supporting round up 
>>>>>> adjustment.
>>>>>>         (struct address_cost_data): Delete.
>>>>>>         (force_expr_to_var_cost): Don't bound cost with spill_cost.
>>>>>>         (split_address_cost, ptr_difference_cost): Delete.
>>>>>>         (difference_cost, compare_aff_trees, record_inv_expr): Delete.
>>>>>>         (struct ainc_cost_data): New struct.
>>>>>>         (get_address_cost_ainc): New function.
>>>>>>         (get_address_cost, get_computation_cost): Reimplement.
>>>>>>         (determine_group_iv_cost_address): Record inv_expr for all uses 
>>>>>> of
>>>>>>         a group.
>>>>>>         (determine_group_iv_cost_cond): Call get_loop_invariant_expr.
>>>>>>         (iv_ca_has_deps): Reimplemented to ...
>>>>>>         (iv_ca_more_deps): ... this.  Check if NEW_CP introduces more 
>>>>>> deps
>>>>>>         than OLD_CP.
>>>>>>         (iv_ca_extend): Call iv_ca_more_deps.

Reply via email to