On Thu, 4 Aug 2016 16:53:22 +1000 Anton Blanchard <an...@ozlabs.org> wrote:
> From: Anton Blanchard <an...@samba.org>
>
> Align the hot loops in our assembly implementation of memset()
> and backwards_memcpy().
>
> backwards_memcpy() is called from tcp_v4_rcv(), so we might
> want to optimise this a little more.
>
> Signed-off-by: Anton Blanchard <an...@samba.org>
> ---
>  arch/powerpc/lib/mem_64.S | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S
> index 43435c6..eda7a96 100644
> --- a/arch/powerpc/lib/mem_64.S
> +++ b/arch/powerpc/lib/mem_64.S
> @@ -37,6 +37,7 @@ _GLOBAL(memset)
>  	clrldi	r5,r5,58
>  	mtctr	r0
>  	beq	5f
> +	.balign	16
>  4:	std	r4,0(r6)
>  	std	r4,8(r6)
>  	std	r4,16(r6)

Hmm. If we execute this loop once, we'll only fetch additional nops.
Twice, and we make up for them by not fetching unused instructions.
More than twice and we may start winning.

For large sizes it probably helps, but I'd like to see what sizes
memset sees in practice.
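To make that break-even argument concrete, here is a rough back-of-the-envelope
model (my own sketch, not from the patch or any measurement): assume 4-byte
instructions, an aligned 16-byte fetch block, a 6-instruction loop body, and a
worst-case entry offset of 12 bytes into a block. All of those numbers are
assumptions for illustration only.

```python
# Toy model of the fetch cost of an aligned vs. misaligned hot loop.
# Assumptions (mine, for illustration): 4-byte instructions, aligned
# 16-byte fetch blocks, each loop iteration refetches the body.

FETCH = 16   # assumed fetch-block size in bytes
INSN = 4     # powerpc instruction size in bytes

def fetch_blocks(start, n_insns, iters):
    """16-byte blocks fetched by a loop of n_insns instructions placed
    at byte offset `start`, executed `iters` times."""
    end = start + n_insns * INSN
    per_iter = (end - 1) // FETCH - start // FETCH + 1
    return per_iter * iters

for iters in (1, 2, 4):
    # Worst case without .balign: loop starts 12 bytes into a block.
    misaligned = fetch_blocks(12, 6, iters)
    # With .balign 16: the padding nop costs one extra block fetched
    # once on entry, then every iteration runs from an aligned start.
    aligned = 1 + fetch_blocks(16, 6, iters)
    print(iters, misaligned, aligned)
```

In this toy model one pass is a wash, two passes come out ahead, and the
saving grows with the iteration count, which roughly matches the intuition
above; the real answer still depends on the size distribution memset sees.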