Re: Optimised memset64/memset32 for powerpc

Matthew Wilcox Tue, 21 Mar 2017 06:29:53 -0700

On Tue, Mar 21, 2017 at 01:23:36PM +0100, Christophe LEROY wrote:
> > It doesn't look free for you as you only store one register each time
> > around the loop in the 32-bit memset implementation:
> > 
> > 1:      stwu    r4,4(r6)
> >         bdnz    1b
> > 
> > (wouldn't you get better performance on 32-bit powerpc by unrolling that
> > loop like you do on 64-bit?)
> 
> In arch/powerpc/lib/copy_32.S, the implementation of memset() is optimised
> when the value to be set is zero. It makes use of the 'dcbz' instruction
> which zeroizes a complete cache line.
> 
> Not much effort has been put on optimising non-zero memset() because there
> are almost none.


Yes, bzero() is much more common than setting an 8-bit pattern.
And setting an 8-bit pattern is almost certainly more common than setting
a 32-bit or 64-bit pattern.

> Unrolling the loop could help a bit on old powerpc32s that don't have branch
> units, but on those processors the main driver is the time spent to do the
> effective write to memory, and the operations necessary to unroll the loop
> are not worth the cycle added by the branch.
> 
> On more modern powerpc32s, the branch unit implies that branches have a zero
> cost.

Fair enough.  I'm just surprised it was worth unrolling the loop on
powerpc64 and not on powerpc32 -- see mem_64.S.
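
As a rough illustration of the trade-off being discussed, here is a C
sketch of a 4x unrolled 32-bit fill. It is not the code in mem_64.S or
copy_32.S; the name memset32_unrolled and the unroll factor of four are
assumptions chosen for the example. Whether it beats the simple loop
depends on the core, as noted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: illustrates 4x unrolling, not the powerpc assembly. */
static void memset32_unrolled(uint32_t *p, uint32_t v, size_t n)
{
	/* Main loop: four stores per iteration, so one branch per four words. */
	while (n >= 4) {
		p[0] = v;
		p[1] = v;
		p[2] = v;
		p[3] = v;
		p += 4;
		n -= 4;
	}

	/* Tail: the remaining 0..3 words, one store each. */
	while (n--)
		*p++ = v;
}
```

The unrolled body trades extra setup and a tail loop for fewer branches,
which only pays off where the branch is not already free.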

> A simple static inline C function would probably do the job, based on what I
> get below:
> 
> void memset32(int *p, int v, unsigned int c)
> {
>       unsigned int i;
> 
>       for (i = 0; i < c; i++)
>               *p++ = v;
> }
> 
> void memset64(long long *p, long long v, unsigned int c)
> {
>       unsigned int i;
> 
>       for (i = 0; i < c; i++)
>               *p++ = v;
> }

Well, those are the generic versions in the first patch:

http://git.infradead.org/users/willy/linux-dax.git/commitdiff/538b9776ac925199969bd5af4e994da776d461e7

so if those are good enough for you guys, there's no need for you to
do anything.
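
For readers without the patch to hand, a minimal sketch of what such
generic fallbacks can look like follows. It is not copied from the
commit linked above; the sketch_ prefix, the fixed-width types, the
size_t count and the returned pointer are all assumptions made for the
example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: plain C fallbacks with no arch-specific tuning. */
static uint32_t *sketch_memset32(uint32_t *s, uint32_t v, size_t count)
{
	uint32_t *p = s;

	/* Store the 32-bit pattern count times. */
	while (count--)
		*p++ = v;
	return s;
}

static uint64_t *sketch_memset64(uint64_t *s, uint64_t v, size_t count)
{
	uint64_t *p = s;

	/* Store the 64-bit pattern count times. */
	while (count--)
		*p++ = v;
	return s;
}
```

Note that count is a number of elements, not bytes, which is the main
difference from the memset() calling convention.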

Thanks for your time!
