On 10/10/14 09:02, Vladimir Makarov wrote:
The new LRA rematerialization sub-pass runs right before the spilling
sub-pass and tries to rematerialize values of spilled pseudos. To
implement the new sub-pass, some important changes were made to LRA.
First, lra-lives.c now updates live info about all registers (not only
allocatable ones) and, second, lra-constraints.c was modified to
allow checking that an insn satisfies all constraints in the strict
sense even if it still contains pseudos.
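
To illustrate what rematerializing a spilled value means, here is a minimal toy sketch in C. It is not the algorithm in lra-remat.c: the insn representation, the op names and the function remat_reloads are all hypothetical and exist only for this illustration. The idea shown is that a reload of a spilled pseudo can be replaced by a copy of the insn that computed the value, as long as the input register of that insn has not been overwritten in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction set. These names are hypothetical, not GCC's RTL. */
enum op {
    OP_ADD_IMM,   /* dst = src + imm                         */
    OP_LOAD_IMM,  /* dst = imm                               */
    OP_RELOAD     /* dst = stack slot of spilled pseudo src  */
};

struct insn {
    enum op op;
    int dst;  /* destination hard register                         */
    int src;  /* source register, or spilled pseudo id for reloads */
    int imm;  /* immediate operand, unused by OP_RELOAD            */
};

/* Replace reloads of PSEUDO that follow its defining OP_ADD_IMM insn
   at DEF_IDX with a recomputation of the value, but only while the
   defining insn's input register still holds its original value.
   Returns the number of reloads that were replaced.  */
size_t remat_reloads(struct insn *code, size_t n, size_t def_idx, int pseudo)
{
    assert(def_idx < n);
    assert(code[def_idx].op == OP_ADD_IMM);

    /* Copy the definition, because code[] is modified below.  */
    const struct insn def = code[def_idx];
    bool input_intact = true;
    size_t replaced = 0;

    for (size_t i = def_idx + 1; i < n; i++) {
        if (code[i].op == OP_RELOAD && code[i].src == pseudo && input_intact) {
            /* Recompute the value instead of loading it from memory.  */
            code[i].op = OP_ADD_IMM;
            code[i].src = def.src;
            code[i].imm = def.imm;
            replaced++;
        }
        /* Every toy insn writes its dst; once the input register is
           overwritten, later reloads must stay as memory loads.  */
        if (code[i].dst == def.src)
            input_intact = false;
    }
    return replaced;
}
```

For example, given the sequence "r5 = r1 + 8; r2 = reload(100); r1 = 7; r3 = reload(100)", the first reload becomes "r2 = r1 + 8", while the second stays a reload because r1 has been overwritten by then. A real implementation also has to weigh costs and check that the rematerialized insn still satisfies its constraints, which this sketch ignores.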
I've tested and benchmarked the sub-pass on x86-64 and ARM. The
sub-pass generates smaller code on average on both architectures
(although the improvement is not significant) and adds < 0.4%
additional compilation time in -O2 mode of release GCC (according to
user time for compiling a 500K-line Fortran program and the valgrind
lackey insn count for compiling combine.i) and about 0.7% in -O0 mode.
As for performance, the best result I found is a 1% SPECFP2000
improvement on ARM Exynos 5410 (973 vs 963), but for Intel Haswell the
performance results are practically the same (Haswell has a very good,
sophisticated memory sub-system).
There is room for improvement in the pass. I wrote some ideas in the
top-level comment of the file lra-remat.c.
The rematerialization sub-pass will run at -O2 and higher, and a new
option, -flra-remat, is introduced.
The patch was successfully tested on x86-64 and ARM. I am going to
submit it next week. Any comments are appreciated.
I wonder if this could help with some of the rematerialization issues
Intel is running into with their allocatable PIC register changes for i686.
I'll let them pass along test cases you can play with :-)
jeff