On 2014-10-14 12:01 PM, Wilco Dijkstra wrote:
Vladimir Makarov wrote:
On SPECINT2k, performance is ~0.5% worse (with a 5.5% regression on perlbmk),
and SPECFP is ~0.2% faster.
Thanks for reporting this. It is important for me as I have no aarch64
machine for benchmarking.
The perlbmk degradation is too big, and I'll definitely look into this
problem.
Looking at the diffs in regexec.c, which contains the hot function regmatch(),
I see nothing obvious that could cause a serious regression.
I did notice this around line 2300:
 .L802:
         ldr     x1, [x23, 48]
         adrp    x5, PL_savestack_ix
         ldr     w0, [x23]
         str     x5, [sp, 104]
         str     x1, [x24, #:lo12:PL_regcc]
         ldr     w27, [x1, 4]
         bl      regcppush
-        ldr     x5, [sp, 104]
         str     w0, [sp, 112]
         ldr     x0, [x23, 32]
+        adrp    x5, PL_savestack_ix
         ldr     w28, [x5, #:lo12:PL_savestack_ix]
+        str     x5, [sp, 104]
         bl      regmatch
         ldr     x5, [sp, 104]
         mov     w19, w0
         ldr     w1, [sp, 112]
         ldr     w0, [x5, #:lo12:PL_savestack_ix]
So it rematerializes one instance but fails to rematerialize the second use.
An extra store to [sp, 104] is inserted, and the first adrp and store are not
removed even though they are now dead.
Thanks for the analysis. Dead store elimination would help
rematerialization. LRA cannot update global liveness info because, for
compile-speed reasons, it does not use the DF infrastructure. However, LRA
does compute local liveness info (within BBs or EBBs). So in some simple
cases we could remove dead stores (at this stage the spill location is
still a pseudo rather than memory). I'll think about what I can do here.