On 2014-10-14 12:01 PM, Wilco Dijkstra wrote:
Vladimir Makarov wrote:
On SPECINT2k, performance is ~0.5% worse (5.5% regression on perlbmk), and
SPECFP is ~0.2% faster.
Thanks for reporting this.  It is important for me as I have no aarch64
machine for benchmarking.

Perlbmk performance degradation is too big and I'll definitely look at
this problem.

Looking at the diffs in regexec.c which has the hot function regmatch(),
nothing obvious stands out that could cause a serious regression.
I did notice this around line 2300:

.L802:
         ldr     x1, [x23, 48]
         adrp    x5, PL_savestack_ix
         ldr     w0, [x23]
         str     x5, [sp, 104]
         str     x1, [x24, #:lo12:PL_regcc]
         ldr     w27, [x1, 4]
         bl      regcppush
-       ldr     x5, [sp, 104]
         str     w0, [sp, 112]
         ldr     x0, [x23, 32]
+       adrp    x5, PL_savestack_ix
         ldr     w28, [x5, #:lo12:PL_savestack_ix]
+       str     x5, [sp, 104]
         bl      regmatch
         ldr     x5, [sp, 104]
         mov     w19, w0
         ldr     w1, [sp, 112]
         ldr     w0, [x5, #:lo12:PL_savestack_ix]

So it rematerializes one instance, but fails to rematerialize the second use.
An extra store is inserted, and the first adrp and store are not removed as
dead.


Thanks for the analysis. Dead store elimination would help rematerialization. LRA cannot update global life info, as it does not use the DF infrastructure for compile-speed reasons. However, LRA does do local life info analysis (within a basic block or an EBB). So in some simple cases we can implement removal of dead stores (at this stage the spilled value is still a pseudo rather than memory). I'll think about what I can do here.
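
To illustrate the idea of removing dead stores using only local life info, here is a minimal sketch of a backward scan over one block. It is not LRA code: `Insn`, `remove_dead_defs`, and the name `slot104` (standing in for the pseudo spilled to `[sp, 104]`) are hypothetical, the instruction list is a simplified model of the sequence quoted above, and calls are assumed to clobber `x5` and to have side effects.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    text: str                   # assembly text, for display only
    defs: frozenset             # names written by the instruction
    uses: frozenset             # names read by the instruction
    side_effects: bool = False  # calls etc. are never removed


def remove_dead_defs(block, live_out):
    """Walk the block backwards, tracking which names are live, and drop
    side-effect-free instructions whose results are never read."""
    live = set(live_out)
    kept = []
    for insn in reversed(block):
        if not insn.side_effects and insn.defs and not (insn.defs & live):
            continue  # every value it defines is dead at this point
        live -= insn.defs
        live |= insn.uses
        kept.append(insn)
    kept.reverse()
    return kept


def insn(text, defs=(), uses=(), side_effects=False):
    return Insn(text, frozenset(defs), frozenset(uses), side_effects)


# Simplified model of the new sequence; "slot104" is treated like a pseudo.
block = [
    insn("adrp x5, PL_savestack_ix", defs=["x5"]),
    insn("str x5, [sp, 104]", defs=["slot104"], uses=["x5"]),
    insn("bl regcppush", defs=["x5"], side_effects=True),
    insn("adrp x5, PL_savestack_ix", defs=["x5"]),
    insn("ldr w28, [x5, #:lo12:PL_savestack_ix]", defs=["w28"], uses=["x5"]),
    insn("str x5, [sp, 104]", defs=["slot104"], uses=["x5"]),
    insn("bl regmatch", defs=["x5"], side_effects=True),
    insn("ldr x5, [sp, 104]", defs=["x5"], uses=["slot104"]),
    insn("ldr w0, [x5, #:lo12:PL_savestack_ix]", defs=["w0"], uses=["x5"]),
]

kept = remove_dead_defs(block, live_out={"w0", "w28"})
for i in kept:
    print(i.text)
```

In this model the first store to `slot104` is overwritten before any load reads it, so the scan drops it, and the first `adrp` then becomes dead as well. A real implementation would also have to account for the slot being live across block boundaries, which is exactly the global information that is not available here.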

