performance regression due to poor register allocation on Cortex-M0

vmakarov at gcc dot gnu.org Wed, 23 Mar 2016 10:17:01 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164


--- Comment #10 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #9)
> I think that's a fair characterization.  The extra copy emitted by the older
> compiler gives the allocator more freedom.   With coalescing getting more
> aggressive, the copy is gone and the allocator's freedom is reduced.
> 
> I'll try to have a look at what the allocator is doing, but I doubt it's
> realistically something that can be addressed in this release cycle.

I agree.  It will probably be hard to fix in IRA at this stage.

Coalescing is a controversial thing, which is why there are so many coalescing
algorithms.  I tried a lot of them when I worked on global RA.  In the end, I
found that implicit coalescing worked best.  The word `implicit` means that
we propagate hard register preferences (through copies, including implicit ones
for two-operand constraints) from already assigned pseudos to unassigned ones.
When it is possible to assign the same hard register, we do so and remove the
copies.  Otherwise, we can still assign a hard register, which might have been
impossible had we explicitly coalesced the two pseudos.
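
The preference propagation described above can be sketched as a toy allocator.
This is only an illustration under simplifying assumptions, not IRA's actual
code: the function name, the data layout, and the fixed allocation order are
all hypothetical, and real cost-based priorities are ignored.

```python
def assign_with_preferences(pseudos, conflicts, copies, hard_regs):
    """Toy model of implicit coalescing (hypothetical, not GCC code).

    pseudos   -- pseudo names, in the order they are allocated
    conflicts -- symmetric dict: pseudo -> set of interfering pseudos
    copies    -- list of (src, dst) pseudo pairs connected by a copy
    hard_regs -- available hard register names
    """
    assignment = {}
    for p in pseudos:
        # Hard registers already taken by interfering, assigned pseudos.
        forbidden = {assignment[q] for q in conflicts.get(p, ())
                     if assignment.get(q) is not None}
        # Preferences propagated through copies from assigned pseudos.
        preferred = []
        for src, dst in copies:
            other = dst if src == p else src if dst == p else None
            reg = assignment.get(other) if other is not None else None
            if reg is not None and reg not in forbidden:
                preferred.append(reg)
        free = [r for r in hard_regs if r not in forbidden]
        if preferred:
            assignment[p] = preferred[0]   # same hard reg as the copy partner
        elif free:
            assignment[p] = free[0]        # preference unusable: any free reg
        else:
            assignment[p] = None           # no hard reg left: would be spilled
    # A copy is removable when both ends ended up in the same hard register.
    removable = [(s, d) for s, d in copies
                 if assignment[s] is not None and assignment[s] == assignment[d]]
    return assignment, removable


# Case 1: the preference can be honoured, so the copy a -> b disappears.
easy = assign_with_preferences(
    ["a", "x", "b"],
    {"a": {"x"}, "x": {"a"}, "b": set()},
    [("a", "b")],
    ["r0", "r1"],
)

# Case 2: b's preferred register is taken by y.  b still gets a hard register
# and the copy stays.  Merging a and b up front would give one pseudo that
# conflicts with both x and y, which also conflict with each other, so two
# hard registers would not be enough.
hard = assign_with_preferences(
    ["a", "x", "y", "b"],
    {"a": {"x"}, "x": {"a", "y"}, "y": {"x", "b"}, "b": {"y"}},
    [("a", "b")],
    ["r0", "r1"],
)
```

In the second case the allocator keeps one register-to-register move instead
of spilling, which is the extra freedom that explicit coalescing gives up.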

Only LRA does explicit coalescing, and only for pseudos assigned to memory, as
we have no constraint on the number of stack slots, and memory-memory moves
are expensive and require an additional hard reg.

I guess this sort of PR could probably be fixed if we had live-range splitting
at any place, not only on loop borders.  But it might create other PRs if it
makes a wrong decision :)  Unfortunately, it is all about heuristics.  They
can work well in one case and do bad things in another.  Performance on
credible benchmarks should be the criterion.

[Bug rtl-optimization/70164] [6 Regression] Code/performance regression due to poor register allocation on Cortex-M0
