On Thu,  4 Aug 2016 16:53:22 +1000
Anton Blanchard <an...@ozlabs.org> wrote:

> From: Anton Blanchard <an...@samba.org>
> 
> Align the hot loops in our assembly implementation of memset()
> and backwards_memcpy().
> 
> backwards_memcpy() is called from tcp_v4_rcv(), so we might
> want to optimise this a little more.
> 
> Signed-off-by: Anton Blanchard <an...@samba.org>
> ---
>  arch/powerpc/lib/mem_64.S | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S
> index 43435c6..eda7a96 100644
> --- a/arch/powerpc/lib/mem_64.S
> +++ b/arch/powerpc/lib/mem_64.S
> @@ -37,6 +37,7 @@ _GLOBAL(memset)
>       clrldi  r5,r5,58
>       mtctr   r0
>       beq     5f
> +     .balign 16
>  4:   std     r4,0(r6)
>       std     r4,8(r6)
>       std     r4,16(r6)

Hmm. If we only execute this loop once, all the alignment buys us is fetching some
additional nops. Execute it twice and we make up for them by no longer fetching the
unused instructions that sit before the loop head. More than twice and we may start
winning.
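
Purely to illustrate that arithmetic (all assumptions mine: 16 byte fetch
blocks, an arbitrary 8 byte misalignment of the loop head, and counting only
wasted fetch bytes, nothing about the real layout of mem_64.S):

#include <stdio.h>

#define FETCH_BLOCK 16	/* assumed instruction fetch granularity, bytes */

int main(void)
{
	/* hypothetical misalignment of the loop head within a fetch block */
	unsigned long off = 8;
	/* nop padding .balign would insert, fetched once on the way in */
	unsigned long pad = (FETCH_BLOCK - off) % FETCH_BLOCK;
	unsigned long n;

	for (n = 1; n <= 4; n++)
		printf("%lu iteration(s): unaligned wastes %lu fetch bytes, "
		       "aligned wastes %lu\n", n, n * off, pad);

	return 0;
}

With those made-up numbers a single iteration only breaks even, and every
iteration after that saves another 8 bytes of fetch, which is roughly the
trade-off above.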

For large sizes it probably helps, but I'd like to see the distribution of sizes memset
is actually called with.
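
One way to gather that would be something like the untested sketch below: a
kprobe on memset that buckets the length argument (r5 at entry on ppc64) and
dumps a crude histogram on unload. It assumes memset is actually probe-able
here, and the module name and log2-ish bucketing are just for illustration.

#include <linux/module.h>
#include <linux/atomic.h>
#include <linux/kprobes.h>

/* roughly power-of-two buckets of memset lengths */
static atomic_long_t buckets[16];

static int memset_pre(struct kprobe *p, struct pt_regs *regs)
{
	unsigned long len = regs->gpr[5];	/* 3rd arg (length) on ppc64 */
	int i = 0;

	while (len > 16 && i < 15) {
		len >>= 1;
		i++;
	}
	atomic_long_inc(&buckets[i]);
	return 0;
}

static struct kprobe kp = {
	.symbol_name	= "memset",
	.pre_handler	= memset_pre,
};

static int __init memset_sizes_init(void)
{
	return register_kprobe(&kp);
}

static void __exit memset_sizes_exit(void)
{
	int i;

	unregister_kprobe(&kp);
	for (i = 0; i < 16; i++)
		pr_info("bucket %2d: %ld calls\n", i,
			atomic_long_read(&buckets[i]));
}

module_init(memset_sizes_init);
module_exit(memset_sizes_exit);
MODULE_LICENSE("GPL");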
