On 10/13/2014 12:24 PM, Wilco Dijkstra wrote:
>> Here is a new rematerialization sub-pass of LRA.
>>
>> I've tested and benchmarked the sub-pass on x86-64 and ARM. The
>> sub-pass generates smaller code on average on both architectures
>> (although the improvement is not significant), adds < 0.4% additional
>> compilation time in -O2 mode of release GCC (according to user time of
>> compilation of a 500K-line Fortran program and valgrind lackey insn
>> counts for a combine.i compilation) and about 0.7% in -O0 mode. As for
>> performance, the best result I found is a 1% SPECFP2000 improvement on
>> ARM Exynos 5410 (973 vs 963), but for Intel Haswell the results are
>> practically the same (Haswell has a very sophisticated memory
>> sub-system).
>
> I ran SPEC2k on AArch64, and EON fails to run correctly with
> -fno-caller-saves -mcpu=cortex-a57 -fomit-frame-pointer -Ofast. I'm not
> sure whether this is AArch64 specific, but previously non-optimal
> register allocation choices triggered a latent bug in ree (it's unclear
> why GCC still allocates FP registers in high-pressure integer code, as
> I set the costs for int<->FP moves high).
>
> On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk),
> and SPECFP is ~0.2% faster.

Thanks for reporting this. It is important for me, as I have no AArch64
machine for benchmarking.
The perlbmk performance degradation is too big, and I'll definitely look
at this problem.

> Generally I think it is good to have a specific pass for
> rematerialization. However, should this not also affect the costs of
> instructions that can be cheaply rematerialized? Similarly for the
> choice whether to caller-save or to spill (today the caller-save code
> doesn't care at all about rematerialization, so it aggressively
> caller-saves values which could be rematerialized - see e.g.
> https://gcc.gnu.org/ml/gcc/2014-09/msg00071.html).

I wanted to address the cost issues later, but I guess the perlbmk
performance problem might be solved by this, so I'll start working on
it.

The rematerialization pass can fix caller-saves code if we add
processing of move insns too. That could be another project to improve
the rematerialization. Thanks for pointing this out.

> Also I am confused by the claim "memory reads are not profitable to
> rematerialize". Surely rematerializing a memory read from const-data
> or a literal pool is cheaper than spilling, as you avoid a store to
> the stack?

Most such cases are covered by cfg-insensitive rematerialization, but I
guess there are cfg-sensitive cases. I should try this too.

Wilco, thanks for a very informative email with three ideas to improve
the rematerialization. As I wrote, the patch is an initial
implementation of rematerialization, and the infrastructure, with
modifications, will be able to handle these and other improvements.
Most importantly, we have the infrastructure in the right place now.