On 10/30/13 14:46, Richard Biener wrote:
On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng<bin.ch...@arm.com> wrote:
Hi,
I noticed that IVOPT generates complex address expressions like below for iv
base.
&arr_base[0].y
&arr[0]
&MEM[p+o]
It's even worse for targets support auto-increment addressing mode because
IVOPT adjusts such base expression with +/- step, then creates below:
&arr_base[0].y +/- step
&arr[0] +/- step
&MEM[p+o] +/- step
It has two disadvantages:
1) Cost computation in IVOPT can't handle complex address expression and
general returns spill_cost for it, which is bad since address iv is
important to IVOPT.
2) IVOPT creates duplicate candidates for IVs which have same value in
different forms, for example, two candidates are generated with each for
"&a[0]" and "&a". Again, it's even worse for auto-increment addressing
mode.
This patch fixes the issue by simplifying address expression at the entry of
allocating IV struct. Maybe the simplification can be put in various fold*
functions but I think it might be better in this way, because:
1) fold* functions are used from front-end to various tree optimizations,
the simplified address expressions may not be what each optimizer wanted.
Think about parallelism related passes, they might want the array index
information kept for further analysis.
2) In some way, the simplification is conflict with current implementation
of fold* function. Take fold_binary_loc as an example, it tries to simplify
"&a[i1] +p c* i2" into "&a[i1+i2]". Of course we can simplify in this way
for IVOPT too, but that will cause new problems like: a) we have to add code
in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want
to avoid; b) the simplification can't always be done because of the
sign/unsigned offset problem (especially for auto-increment addressing
mode).
3) There are many entry point for fold* functions, the change will be
non-trivial.
4) The simplification is only done in alloc_iv for true (not duplicate ones)
iv struct, the number of such iv should be moderate.
With these points, I think it might be a win to do the simplification in
IVOPT and create a kind of sand box to let IVOPT play. Any suggestions?
Bootstrap and tested on x86/x86_64/arm.
The patch causes three cases failed on some target, but all of them are
false alarm, which can be resolved by refining test cases to check more
accurate information.
Is it OK?
Hmm. I think you want what get_inner_reference_aff does, using
the return value of get_inner_reference as starting point for
determine_base_object. And you don't want to restrict yourselves
so much on what exprs to process, but only exclude DECL_Ps.
Just amend get_inner_reference_aff to return the tree base object.
Or, update determine_base_object to handle MEM_REF, ARRAY_REF,
COMPONENT_REF, etc. by calling get_inner_reference to get the base and
continuing the recursive determine_base_object on the return value
(TREE_OPERAND (base, 0)).
Calling an amended get_inner_reference_aff can be expensive, as the
function will also spend time in transforming the reference from tree to
aff_tree.
Yufeng