Re: [patch] Fix PR rtl-optimization/87727

Vladimir Makarov Fri, 21 Dec 2018 07:25:34 -0800



On 12/20/2018 06:14 PM, Peter Bergner wrote:

On 12/20/18 4:41 PM, Jeff Law wrote:

On 12/20/18 2:30 PM, Peter Bergner wrote:

For stage1, I'd like to fix that conflict wart if I can.  I have also
wondered about adding a copy coalesce phase just before we enter RA,
which would ensure the copies are removed, instead of hoping RA assigns
the same reg to the source and destination of the copy making it a nop
that can be removed.

The difficulty with coalescing is that if you get too aggressive then
you end up removing degrees of freedom from the allocator and you can
easily make the final results worse.

I agree, but being too aggressive leading to bad decisions/code is
true for a lot of optimizations. :-)   I do plan on first attacking
the conservative conflict info for pseudos and seeing what
that buys us before attempting any coalescing.

When I started to work on IRA, I tried several coalescing techniques (I recall only conservative, iterative, and optimistic ones). The results were not promising. But it was a very long time ago, my major target was i686 at that time, and there were no accurate conflict calculations for irregular register files. So maybe it will work in the current environment and in a different implementation.
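
For readers unfamiliar with the conservative variant mentioned above, here is a minimal sketch of the classic Briggs-style test on an interference graph. It is an illustration only and not IRA code; the function name, the graph, and the register count k are all made up for the example.

```python
# Illustrative sketch only; not taken from IRA. All names are hypothetical.
def briggs_can_coalesce(graph, a, b, k):
    """Return True if merging nodes a and b passes the Briggs test.

    graph maps each node to the set of nodes it interferes with.
    The merge is accepted when the combined node would have fewer
    than k neighbors of significant degree (degree >= k).
    """
    if b in graph[a]:
        return False  # interfering nodes can never be coalesced
    merged_neighbors = (graph[a] | graph[b]) - {a, b}
    significant = 0
    for n in merged_neighbors:
        degree = len(graph[n])
        # A neighbor of both a and b loses one edge after the merge.
        if a in graph[n] and b in graph[n]:
            degree -= 1
        if degree >= k:
            significant += 1
    return significant < k


# Hypothetical graph: a copy "a = b" where a and b do not interfere.
graph = {
    "a": {"x"},
    "b": {"y"},
    "x": {"a", "y"},
    "y": {"b", "x"},
}
```

With k = 2 the merged node would have two neighbors of degree 2, so the test rejects the merge; with k = 3 it accepts it. That is the sense in which the conservative test trades missed coalescing opportunities for a guarantee that colorability is not made worse.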

Currently IRA has coalescing only for spilled pseudos after coloring (because mem<->mem moves are very expensive). LRA has the same technique.

As for removing degrees of freedom for the allocator, sometimes that can
be a good thing, if it can make the allocator simpler.  For example, I
think we have forced the allocator to do too much by not only being an RA,
but being an instruction selector as well.  Doing both RA and instruction
selection at the same time makes everything very complicated and I think
we probably don't compute allocation costs correctly, since we seem to
calculate costs on a per alternative per insn basis and I don't think we
ever see what the ramifications of using an alternative in one insn
has on the costs of another alternative in another insn.  Sometimes using
the cheapest alternative in one insn and the cheapest alternative in
another insn can lead us into a situation that requires spilling to
resolve the conflicting choices.
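
The point about per-insn, per-alternative costs can be shown with a toy model. This is only a sketch under made-up assumptions: two insns share one pseudo "p", each alternative has an invented cost and required register class, and a class mismatch is charged a fixed reload cost. None of the names or numbers come from GCC.

```python
from itertools import product

# Hypothetical data: two insns use pseudo "p". Each alternative has its own
# cost and requires "p" to be in a particular register class.
INSNS = {
    "insn1": [{"cost": 1, "cls": "FLOAT"}, {"cost": 2, "cls": "GENERAL"}],
    "insn2": [{"cost": 1, "cls": "GENERAL"}, {"cost": 3, "cls": "FLOAT"}],
}
RELOAD_COST = 4  # assumed cost of moving "p" between classes


def total_cost(choice):
    """Sum alternative costs plus a reload for each class mismatch on "p"."""
    alts = [INSNS[name][idx] for name, idx in zip(INSNS, choice)]
    cost = sum(a["cost"] for a in alts)
    classes = {a["cls"] for a in alts}
    return cost + RELOAD_COST * (len(classes) - 1)


def greedy_choice():
    """Pick the cheapest alternative of each insn independently."""
    return tuple(
        min(range(len(alts)), key=lambda i: alts[i]["cost"])
        for alts in INSNS.values()
    )


def joint_choice():
    """Pick the combination of alternatives with the lowest total cost."""
    all_choices = product(*(range(len(alts)) for alts in INSNS.values()))
    return min(all_choices, key=total_cost)
```

Here the greedy choice takes the cost-1 alternative in both insns, puts "p" in two different classes, and ends up with a total of 6 once the reload is counted. The joint choice accepts a cost-2 alternative in the first insn and totals 3.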

I completely agree. The big remaining part of modernizing GCC is code selection. I believe LLVM has a big advantage over GCC in this area. A modern approach could make RA much simpler. But it is a very big job involving changes in machine descriptions (a lot of them).

I don't mean machine descriptions in IBURG style. That would be a huge, enormous job requiring a lot of expertise, part of which is lost for some targets (I thought about starting this job several times but gave up when I saw how much effort it would take; it would be an even bigger job than writing IRA/LRA).

I am just saying that you need at least to have a cost for each insn alternative (maybe for sub-targets). Although some approximation may be possible (like the number of insns generated from the alternative or even their size).

There are, though, some smaller projects in this direction. For example, I tried to use code selection in register cost calculation (the code is on the ira-select branch). The algorithm is based on choosing an alternative for each insn first and then calculating costs and register classes for the pseudos involved in the insn. The chosen alternatives could be propagated later to LRA (this work has not even started yet). The cost of each insn alternative (if we add them to md files in the future) could be easily integrated into the algorithm.
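
The two-step shape described above (choose an alternative per insn, then derive per-pseudo class costs) could be sketched roughly as follows. This is not the code on the ira-select branch; the data layout, the cost numbers, and the MOVE_COST constant are assumptions made purely for illustration.

```python
# Illustrative sketch only; all data and names here are hypothetical.
CLASSES = ("GENERAL", "FLOAT")
MOVE_COST = 2  # assumed cost of copying a pseudo into the class an insn wants

# Each insn is a list of alternatives; each alternative has an assumed cost
# and the register class it requires for every pseudo it uses.
insns = [
    [{"cost": 1, "ops": {"p1": "GENERAL", "p2": "GENERAL"}},
     {"cost": 3, "ops": {"p1": "FLOAT", "p2": "FLOAT"}}],
    [{"cost": 4, "ops": {"p2": "GENERAL"}},
     {"cost": 1, "ops": {"p2": "FLOAT"}}],
    [{"cost": 1, "ops": {"p2": "FLOAT"}}],
]

# Step 1: choose the cheapest alternative of each insn.
chosen = [min(range(len(alts)), key=lambda i: alts[i]["cost"]) for alts in insns]

# Step 2: for each pseudo, charge a move for every chosen alternative that
# wants the pseudo in a different class.
costs = {}
for alts, idx in zip(insns, chosen):
    for pseudo, wanted in alts[idx]["ops"].items():
        per_class = costs.setdefault(pseudo, {c: 0 for c in CLASSES})
        for c in CLASSES:
            if c != wanted:
                per_class[c] += MOVE_COST

# Step 3: the preferred class of a pseudo is the one with the lowest cost.
best = {p: min(CLASSES, key=lambda c: per_class[c]) for p, per_class in costs.items()}
```

In this toy input p2 is wanted in GENERAL once and in FLOAT twice, so FLOAT wins for p2 while p1 stays in GENERAL. A real per-alternative cost from the md files would simply replace the invented "cost" fields in step 1.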

Unfortunately the algorithm did not improve SPEC2006 for x86-64 (i7-8700k) overall, although one benchmark was improved by about 5% if I remember correctly. But modern Intel CPUs are very insensitive to optimizations (they are complicated black boxes which do their own optimizations, and anecdotally I saw code where adding an additional move sped up the code a lot). Maybe the algorithm will have better results on other targets (power or aarch64). I never tried other targets.

I've wondered if running something like lra_constraints() (but using
pseudos for fixups rather than hard regs) early in the rtl passes as
a pseudo instruction selection pass wouldn't make things easier for
the following passes like RA, etc?

I think it might. As I wrote, we could propagate the decisions of the above algorithm to LRA.

Peter, also if you are interested in doing RA work, there is another problem, which is to implement sub-register level conflict calculations in LRA. Currently, IRA has a simple sub-register level conflict calculation (see allocno objects), and when sub-registers are present IRA and LRA decisions are different, which results in worse code generation (there are some PRs for this). It would also be a big RA project to do.
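
To illustrate why the granularity of conflict calculation matters, here is a small sketch comparing whole-pseudo and per-word conflicts. It is a toy model, not IRA or LRA code: live ranges are assumed to be inclusive (start, end) program points, and the pseudos P and Q and their ranges are invented.

```python
# Illustrative sketch only; live ranges and names are hypothetical.
def overlaps(r1, r2):
    """True if two inclusive (start, end) live ranges share a program point."""
    return r1[0] <= r2[1] and r2[0] <= r1[1]


def whole_conflict(words, other):
    """Conflict when the pseudo is treated as live wherever any word is live."""
    return any(overlaps(w, other) for w in words)


def word_conflicts(words, other):
    """Conflict computed separately for each word of the pseudo."""
    return [overlaps(w, other) for w in words]


# Two-word pseudo P: word 0 lives over points 0..5, word 1 only over 0..2.
P = [(0, 5), (0, 2)]
# Single-word pseudo Q lives over points 3..6.
Q = (3, 6)
```

At whole-pseudo granularity P and Q conflict. At word granularity only word 0 of P conflicts with Q, so Q could share the hard register assigned to word 1. If one pass sees the first picture and the other sees the second, their decisions can diverge.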
