On 10/13/2014 12:24 PM, Wilco Dijkstra wrote:
>>   Here is a new rematerialization sub-pass of LRA.
>>
>>   I've tested and benchmarked the sub-pass on x86-64 and ARM.  The
>> sub-pass generates smaller code on average on both architectures
>> (although the improvement is not significant), adds < 0.4%
>> additional compilation time in -O2 mode of a release GCC (according
>> to user time of compilation of a 500K-line Fortran program and
>> valgrind lackey insn counts for the compilation of combine.i) and
>> about 0.7% in -O0 mode.  As for performance, the best result I found
>> is a 1% SPECFP2000 improvement on an ARM Exynos 5410 (973 vs 963),
>> but for Intel Haswell the results are practically the same (Haswell
>> has a very sophisticated memory sub-system).
> I ran SPEC2k on AArch64, and EON fails to run correctly with -fno-caller-saves
> -mcpu=cortex-a57 -fomit-frame-pointer -Ofast.  I'm not sure whether this is
> AArch64 specific, but previously non-optimal register allocation choices
> triggered a latent bug in ree (it's unclear why GCC still allocates FP
> registers in high-pressure integer code, as I set the costs for int<->FP
> moves high).
>
> On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and 
> SPECFP is ~0.2% faster.
Thanks for reporting this.  It is important to me, as I have no aarch64
machine for benchmarking.

The perlbmk performance degradation is too big, and I'll definitely look
into this problem.

> Generally I think it is good to have a specific pass for rematerialization.
> However, should this not also affect the costs of instructions that can be
> cheaply rematerialized?  Similarly for the choice whether to caller-save or
> spill (today the caller-save code doesn't care at all about rematerialization,
> so it aggressively caller-saves values which could be rematerialized - see
> e.g. https://gcc.gnu.org/ml/gcc/2014-09/msg00071.html).
I wanted to address the cost issues later, but I guess the perlbmk
performance problem might be solved by this, so I'll start working on
it now.

The rematerialization pass could also fix the caller-saves code if we
add processing of move insns.  So that could be another project to
improve rematerialization.  Thanks for pointing this out.
 
>
> Also I am confused by the claim "memory reads are not profitable to
> rematerialize".  Surely rematerializing a memory read from const-data or a
> literal pool is cheaper than spilling, as you avoid a store to the stack?
>
Most such cases are covered by cfg-insensitive rematerialization, but I
guess there are cfg-sensitive cases.  I should try this too.

Wilco, thanks for the very informative email with three ideas to improve
rematerialization.  As I wrote, the patch is an initial implementation
of rematerialization, and the infrastructure, with modifications, will
be able to handle these and other improvements.  Most importantly, we
now have the infrastructure in the right place.

