Re: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-21 Thread Anton Blanchard
Hi Arnd,

> Would it help to also add a way for an architecture to override
> memcmp_pages() with its own implementation? That way you could
> skip the unaligned part, hardcode the loop counter and avoid the
> preempt_disable() in kmap_atomic().

Good idea. We could also have a generic implementati

Re: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-21 Thread Arnd Bergmann
On Wednesday 21 January 2015 12:27:38 Anton Blanchard wrote:
> I noticed ksm spending quite a lot of time in memcmp on a large
> KVM box. The current memcmp loop is very unoptimised - byte at a
> time compares with no loop unrolling. We can do much much better.
>
> Optimise the loop in a few ways:

[PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-20 Thread Anton Blanchard
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop
- For large (at least 32 byte) comp

RE: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-12 Thread David Laight
From: Joakim Tjernlund
> On Mon, 2015-01-12 at 11:55 +1100, Anton Blanchard wrote:
> > Hi David,
> >
> > > The unrolled loop (deleted) looks excessive.
> > > On a modern cpu with multiple execution units you can usually
> > > manage to get the loop overhead to execute in parallel to the
> > > actu

Re: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-11 Thread Joakim Tjernlund
On Mon, 2015-01-12 at 11:55 +1100, Anton Blanchard wrote:
> Hi David,
>
> > The unrolled loop (deleted) looks excessive.
> > On a modern cpu with multiple execution units you can usually
> > manage to get the loop overhead to execute in parallel to the
> > actual 'work'.
> > So I suspect that a m

Re: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-11 Thread Anton Blanchard
Hi David,

> The unrolled loop (deleted) looks excessive.
> On a modern cpu with multiple execution units you can usually
> manage to get the loop overhead to execute in parallel to the
> actual 'work'.
> So I suspect that a much simpler 'word at a time' loop will be
> almost as fast - especially i

Re: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-09 Thread Adhemerval Zanella
On 08-01-2015 23:56, Anton Blanchard wrote:
> I noticed ksm spending quite a lot of time in memcmp on a large
> KVM box. The current memcmp loop is very unoptimised - byte at a
> time compares with no loop unrolling. We can do much much better.
>
> Optimise the loop in a few ways:
>
> - Unroll the

RE: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-09 Thread David Laight
From: Anton Blanchard
> I noticed ksm spending quite a lot of time in memcmp on a large
> KVM box. The current memcmp loop is very unoptimised - byte at a
> time compares with no loop unrolling. We can do much much better.
>
> Optimise the loop in a few ways:
>
> - Unroll the byte at a time loop

[PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-08 Thread Anton Blanchard
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop
- For large (at least 32 byte) comp
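
For readers skimming this thread, the simpler 'word at a time' loop that David Laight suggests as an alternative to heavy unrolling can be sketched in portable C. This is an illustration only and is not the code from the patch: the name `sketch_memcmp` is hypothetical, and the 8-byte chunk size is an assumption chosen for a 64-bit machine. It skips equal 8-byte chunks and falls back to a byte loop for the tail or for the chunk that contains the first difference, which keeps the result independent of host endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not the patch under discussion. */
int sketch_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;

    /* Word-at-a-time phase: step over 8-byte chunks that are equal.
     * memcpy() is used for the loads so unaligned input is safe. */
    while (n >= 8) {
        uint64_t x, y;

        memcpy(&x, p, sizeof(x));
        memcpy(&y, q, sizeof(y));
        if (x != y)
            break;      /* the first difference lies in this chunk */
        p += 8;
        q += 8;
        n -= 8;
    }

    /* Byte-at-a-time phase: handles the tail, or locates the first
     * differing byte inside the mismatching chunk. */
    while (n--) {
        if (*p != *q)
            return (int)*p - (int)*q;
        p++;
        q++;
    }
    return 0;
}
```

As with the standard `memcmp`, only the sign of the return value is meaningful: it is zero for equal buffers, negative when the first differing byte of `a` is smaller (compared as `unsigned char`), and positive otherwise.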