On 12/20/2018 06:14 PM, Peter Bergner wrote:
On 12/20/18 4:41 PM, Jeff Law wrote:
On 12/20/18 2:30 PM, Peter Bergner wrote:
For stage1, I'd like to fix that conflict wart if I can. I have also
wondered about adding a copy coalesce phase just before we enter RA,
which would ensure the copies are removed, instead of hoping RA assigns
the same reg to the source and destination of the copy making it a nop
that can be removed.
The difficulty with coalescing is that if you get too aggressive then
you end up removing degrees of freedom from the allocator and you can
easily make the final results worse.
I agree, but being too aggressive leading to bad decisions/code is
true for a lot of optimizations. :-) I do plan on first attacking
the conservative conflict info for pseudos first and seeing what
that buys us before attempting any coalescing.
When I started to work on IRA, I've tried several coalescing techniques
(i recall only conservative, iterative and optimistic ones). The
results were not promising. But it was very long time ago, my major
target was i686 that time and there were no accurate conflict
calculations for irregular file registers. So may be it will work in
current environment and in a different implementation.
Currently IRA has coalescing only for spilled pseudos after coloring
(because mem<->mem moves are very expensive). LRA has the same technique.
As for removing degrees of freedom for the allocator, sometimes that can
be a good thing, if it can makes the allocator simpler. For example, I
think we have forced the allocator to do too much by not only being an RA,
but being an instruction selector as well. Doing both RA and instruction
selection at the same time makes everything very complicated and I think
we probably don't compute allocation costs correctly, since we seem to
calculate costs on a per alternative per insn basis and I don't think we
ever see what the ramifications of using an alternative in one insn
has on the costs of another alternative in another insn. Sometimes using
the cheapest alternative in one insn and the cheapest alternative in
another insn can lead us into a situation that requires spilling to
resolve the conflicting choices.
I am completely agree. The big remaining part to modernize GCC is
code selection. I believe LLVM has a big advantage in this area over
GCC. A modern approach could make RA much simpler. But it is a very
big job involving changes in machine descriptions (a lot of them).
I don't mean machine description in IBURG style. That would be a
huge, enormous job requiring a lot of expertise part of which is lost
for some targets (i was thinking about to start this jobs several times
but gave up when I saw how many efforts it would take, it would be even
a bigger job that writing IRA/LRA).
I am just saying that you need at least have cost for each insn
alternative (may be sub-targets). Although some approximation can be
possible (like insns number generated from the alternative or even their
size).
There are although some smaller projects in this direction. For
example, I tried to use code selection in register cost calculation (the
code on ira-select branch). The algorithm is based on choosing
alternative for each insns first and then calculates costs and register
classes for pseudos involved in the insn. The chosen alternatives could
be propagated later to LRA (this work even did not started yet). The
cost of each insn alternative (if we add them in the future in md files)
could be easily integrated in the algorithm.
Unfortunately the algorithm did not improve SPEC2006 for x86-64
(i7-8700k) in overall although one benchmark was improved by about 5% if
I remember this correctly. But modern Intel CPUs are very insensitive
to optimizations (they are complicated black boxes which do own
optimizations and anekdotically i saw code when adding an additional
move sped up the code a lot). May be the algorithm will have better
results on other targets (power or aarch64). I never tried other targets.
I've wondered if running something like lra_constraints() (but using
pseudos for fixups rather than hard regs) early in the rtl passes as
a pseudo instruction selection pass wouldn't make things easier for
the following passes like RA, etc?
I think it might. As wrote we could propagate the above algorithm
decision to LRA.
Peter, also if you are interesting to do RA work, there is another
problem which is to implement sub-register level conflict calculations
in LRA. Currently, IRA has a simple subregister level conflict
calculation (see allocno objects) and in a case of sub-register presence
IRA and LRA decisions are different and this results in worse code
generations (there are some PRs for this). It would be also a big RA
project to do.