Experiments with the netperf benchmark indicated that the size threshold for selecting VMX-based copies in __copy_tofrom_user_power7() was suboptimal on POWER8. Measurements showed that parity between the VMX and non-VMX paths was in the neighbourhood of 3328 bytes, rather than above 4096. Lower the threshold to 3328 bytes and take the VMX path for sizes greater than or equal to it. The change gives a 1.5-2.0% improvement in performance for 4096-byte buffers, reducing the relative time spent in __copy_tofrom_user_power7() from approximately 7% to approximately 5% in the TCP_RR benchmark.
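
For illustration only, the path selection before and after this change can be sketched in C. The enum, the function names and the name given to the fall-through path are hypothetical and do not exist in the kernel; the sketch only mirrors the cmpldi/blt/bgt/bge sequence in the assembly and assumes that a copy taking neither branch falls through to the non-VMX path.

```c
#include <stddef.h>

/* Hypothetical labels for the three paths; not kernel identifiers. */
enum copy_path { COPY_SHORT, COPY_NONVMX, COPY_VMX };

/* Before the patch: cmpldi cr1,r5,4096 followed by bgt cr1,.Lvmx_copy */
enum copy_path select_path_old(size_t len)
{
	if (len < 16)		/* blt .Lshort_copy */
		return COPY_SHORT;
	if (len > 4096)		/* strictly greater: 4096 itself is excluded */
		return COPY_VMX;
	return COPY_NONVMX;	/* assumed fall-through path */
}

/* After the patch: cmpldi cr1,r5,3328 followed by bge cr1,.Lvmx_copy */
enum copy_path select_path_new(size_t len)
{
	if (len < 16)		/* blt .Lshort_copy */
		return COPY_SHORT;
	if (len >= 3328)	/* greater or equal: 3328 and up use VMX */
		return COPY_VMX;
	return COPY_NONVMX;	/* assumed fall-through path */
}
```

Under the old comparison a copy of exactly 4096 bytes did not reach the VMX path, because bgt requires the length to be strictly greater than the threshold. With the new comparison it does, which is consistent with the improvement reported for 4096-byte buffers.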
Signed-off-by: Andrew Jeffery <and...@aj.id.au>
---
 arch/powerpc/lib/copyuser_power7.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/copyuser_power7.S b/arch/powerpc/lib/copyuser_power7.S
index a24b4039352c..706b7cc19846 100644
--- a/arch/powerpc/lib/copyuser_power7.S
+++ b/arch/powerpc/lib/copyuser_power7.S
@@ -82,14 +82,14 @@ _GLOBAL(__copy_tofrom_user_power7)
 #ifdef CONFIG_ALTIVEC
 	cmpldi	r5,16
-	cmpldi	cr1,r5,4096
+	cmpldi	cr1,r5,3328
 	std	r3,-STACKFRAMESIZE+STK_REG(R31)(r1)
 	std	r4,-STACKFRAMESIZE+STK_REG(R30)(r1)
 	std	r5,-STACKFRAMESIZE+STK_REG(R29)(r1)
 	blt	.Lshort_copy
-	bgt	cr1,.Lvmx_copy
+	bge	cr1,.Lvmx_copy
 #else
 	cmpldi	r5,16
-- 
2.9.3