On 11/13/2013, 5:48 PM, Steven Bosscher wrote:
On Wed, Nov 13, 2013 at 6:56 PM, Vladimir Makarov wrote:
The following patch improves coloring. The order of pushing allocnos on
the coloring stack from a bunch of colorable allocnos was always important
for generated code performance. LRA has a mechanism of allocating pseudos
by threads. Thread in LRA is a set of non-conflicting pseudos connected by
moves (or by future reload insns). Allocating pseudos by threads in LRA
permits to improve code by increasing chance of removing the move insns.
So the same mechanism can be used for IRA. The patch implements it. The
difference is only that LRA forms thread statically before allocation
sub-pass. That is because the basic allocation are already done in IRA.
The statically thread forming works well for IRA too. But even better
results can be got by dynamically forming threads. It means that we are
forming threads during allocation and includes only colorable allocnos.
The results of using threads in IRA is pretty good. The average code size
(text segment) of SPEC2000 is improved (by >0.1% for x86 SPECFP2000 and >
0.3% for x86-64 SPECFP2000). The biggest code performance improvement (> 1%)
is obtained on x86-64 SPECFP2000. Performance tools report that additional
code takes only about 0.05% additionally executed insns.
Nice!
Can you please also update the leading comment in ira.c? It seems
worth mentioning this approach under the "Coloring" bullet
(ira.c:176).
Thanks. I've added a comment. It is not a big change but pretty
important one (e.g. to get about the same improvement 3 years ago I
needed 1K lines to implement coloring for irregular reg file
architecture). So it is worth to document it.
(Not sure if that whole comment block is otherwise up to date??)
It is pretty upto date. I don't see missed important details. Although
it would be nice to describe new pre-RA optimizations shrink-wrapping,
find-moveable, and decrease-live-ranges.
Index: ira.c
===================================================================
--- ira.c (revision 204753)
+++ ira.c (working copy)
@@ -192,7 +192,14 @@ along with GCC; see the file COPYING3.
this point. There is some freedom in the order of putting
allocnos on the stack which can affect the final result of
the allocation. IRA uses some heuristics to improve the
- order.
+ order. The major one is to form *threads* from colorable
+ allocnos and push them on the stack by threads. Thread is a
+ set of non-conflicting colorable allocnos connected by
+ copies. The thread contains allocnos from the colorable
+ bucket or colorable allocnos already pushed onto the coloring
+ stack. Pushing thread allocnos one after another onto the
+ stack increases chances of removing copies when the allocnos
+ get the same hard reg.
We also use a modification of Chaitin-Briggs algorithm which
works for intersected register classes of allocnos. To