On 27/03/15 17:31, Sandra Loosemore wrote:
On 03/27/2015 03:43 AM, Kyrill Tkachov wrote:
On 27/03/15 03:29, Bin.Cheng wrote:
[much snippage]
As for tree ivopts, address cost is used in both ways. For any
address computation that's invalid, it tries to legitimize it into two
parts, the first part results in alu instructions, the second part
results in address expression of different addressing modes. Right
now the rtx cost (for the first part) and the address cost (for the
second part) are accumulated and compared together. I do like the
idea of splitting costs into different types, because all elements are
entangled in a single cost model and it's hard to tune for a specific
case.
Thanks for explaining.
I think it would be possible to make the comparisons there a bit more sane.
If an address computation is split into alu instructions and a legitimate
address, then carry around the rtx cost of the alu part and the address.
When the time comes to compare two computations, we create a more involved
way of comparing.
For example (would need benchmarking, of course):
* Compare the rtx costs of the alu components and check the address
  preference for the legitimate address components using the new hook.
* If the alu parts are of equal rtx cost, pick the one which has
  the preferable legitimate address.
* If expression 'a' has a more expensive alu component than 'b' but a more
  preferable address component, then use some tie breaker. Perhaps apply
  rtx costs on the address expression and compare those...
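
The comparison scheme in the list above could be sketched roughly as
follows. This is a toy model in plain C, not GCC code: the struct, its
field names and compare_candidates are all hypothetical, addr_rank stands
in for whatever the proposed address-preference hook would return (lower
is assumed to mean more preferable), and the tie breaker shown (comparing
alu cost plus the rtx cost of the address expression) is only one
possible reading of the last bullet.

```c
/* Hypothetical model of an address computation that has been split into
   ALU instructions plus a legitimate address.  Not a GCC data type.  */
struct addr_candidate
{
  int alu_cost;       /* rtx cost of the ALU part */
  int addr_rank;      /* assumed preference-hook result; lower is better */
  int addr_rtx_cost;  /* rtx cost applied to the address expression */
};

/* Return -1, 0 or 1 according to the sign of X.  */
static int
sign (int x)
{
  return (x > 0) - (x < 0);
}

/* Return a negative value if A is preferable, a positive value if B is
   preferable, and 0 if the model cannot tell them apart.  */
int
compare_candidates (const struct addr_candidate *a,
                    const struct addr_candidate *b)
{
  int alu = sign (a->alu_cost - b->alu_cost);
  int addr = sign (a->addr_rank - b->addr_rank);

  /* Equal ALU cost: the address preference decides.  */
  if (alu == 0)
    return addr;

  /* No address preference, or it agrees with the ALU cost: the ALU
     cost decides.  */
  if (addr == 0 || addr == alu)
    return alu;

  /* Conflict: one side has the cheaper ALU part, the other the more
     preferable address.  Assumed tie breaker: compare the combined
     rtx costs, falling back to the ALU cost if those are equal too.  */
  int total = sign ((a->alu_cost + a->addr_rtx_cost)
                    - (b->alu_cost + b->addr_rtx_cost));
  return total != 0 ? total : alu;
}
```

With candidates {alu_cost 6, addr_rank 0, addr_rtx_cost 1} and
{alu_cost 4, addr_rank 1, addr_rtx_cost 2}, the two criteria disagree, so
the sketch falls through to the tie breaker and picks the second one
(combined cost 6 against 7). Whether that is the right tie breaker is
exactly the part that would need benchmarking.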
Just as an aside here, tree-ssa-loop-ivopts.c has a lot of other
problems with how it's computing address costs, or at least it did when
I last looked at it a few years ago:
https://gcc.gnu.org/ml/gcc-patches/2012-06/msg00319.html
Thanks, that's a useful read.
Shortly after I posted that, Qualcomm lost interest in pushing the
Hexagon port upstream or improving performance, so I lost interest in
pursuing that patch when it was evident that it was going to take a lot
of work to resolve the objections. I think fixes for problems (2) and
(3) there have since been pushed by other people, but I'm not sure about
the others.
FWIW, I think a big part of the problem here is that the GCC internals
documentation isn't very clear on where the cost of legitimizing an
address (the "alu components" above) should be computed. IIRC when I
was last looking at current practice, most targets' implementation of
TARGET_RTX_COSTS didn't make any attempt to account for the address cost
in a MEM -- either adding the cost of legitimizing the address or the
cost of the addressing mode itself (TARGET_ADDRESS_COST).
Currently some targets (aarch64 at least) add the TARGET_ADDRESS_COST to the
cost of a MEM, some (most?) don't, though I'm not sure how much effect that
has had on codegen quality. Do you think it would be a good idea to require
TARGET_RTX_COSTS to try to estimate the legitimization cost of an
invalid address in units compatible with rtx cost for the purposes of
ivopts?
If TARGET_RTX_COSTS is supposed to do that, its documentation should say
so. Or maybe we need a separate hook like
TARGET_LEGITIMIZE_ADDRESS_COST to capture the "alu components" cost.
Can't we just capture the sequence that TARGET_LEGITIMIZE_ADDRESS would
emit and call rtx_cost (or seq_cost) on that?
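
To make that suggestion concrete, here is a minimal sketch of the idea in
plain C. It is a toy model only: toy_insn, toy_seq_cost and toy_mem_cost
are hypothetical names, and the per-instruction costs are assumed to be
supplied already rather than computed by a real rtx_cost call. The point
is just that the "alu components" cost falls out of summing over whatever
sequence legitimization emitted, and can then be combined with the
addressing-mode cost of the final legitimate address.

```c
#include <stddef.h>

/* Hypothetical stand-in for one instruction emitted while legitimizing
   an address.  Not GCC's insn representation.  */
struct toy_insn
{
  int cost;                     /* assumed per-insn rtx cost */
  const struct toy_insn *next;  /* next insn in the sequence, or NULL */
};

/* Sum the per-insn costs over a whole sequence.  An empty sequence
   (address already legitimate) costs nothing.  */
int
toy_seq_cost (const struct toy_insn *seq)
{
  int total = 0;
  for (; seq != NULL; seq = seq->next)
    total += seq->cost;
  return total;
}

/* Cost of a memory access in this model: the legitimization sequence
   plus the addressing-mode cost of the resulting legitimate address.  */
int
toy_mem_cost (const struct toy_insn *legitimize_seq, int address_cost)
{
  return toy_seq_cost (legitimize_seq) + address_cost;
}
```

For a two-instruction sequence with costs 4 and 4 and an addressing-mode
cost of 1, toy_mem_cost returns 9. This only works if both numbers are in
compatible units, which is the open question above.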
Thanks,
Kyrill
-Sandra