Hi Arnd,
> Would it help to also add a way for an architecture to override
> memcmp_pages() with its own implementation? That way you could
> skip the unaligned part, hardcode the loop counter and avoid the
> preempt_disable() in kmap_atomic().
Good idea. We could also have a generic implementati
On Wednesday 21 January 2015 12:27:38 Anton Blanchard wrote:
> I noticed ksm spending quite a lot of time in memcmp on a large
> KVM box. The current memcmp loop is very unoptimised - byte at a
> time compares with no loop unrolling. We can do much much better.
>
> Optimise the loop in a few ways:
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop
- For large (at least 32 byte) comp
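
For anyone following the thread, the "word at a time" idea that comes up
in the replies can be sketched in portable C. This is only an
illustration, not the code from the patch: memcmp_wordwise() is a
hypothetical name, the 8-byte word size is an assumption, and the sketch
does no unrolling or alignment handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not a kernel symbol: compare n bytes, testing
 * 8 bytes per iteration while at least one full word remains.
 */
int memcmp_wordwise(const void *a, const void *b, size_t n)
{
	const unsigned char *p = a;
	const unsigned char *q = b;

	while (n >= sizeof(uint64_t)) {
		uint64_t x, y;

		/* memcpy() into locals keeps unaligned loads well defined. */
		memcpy(&x, p, sizeof(x));
		memcpy(&y, q, sizeof(y));
		if (x != y)
			break;	/* the byte loop below finds the differing byte */
		p += sizeof(x);
		q += sizeof(y);
		n -= sizeof(x);
	}

	/* Tail bytes, or the word that differed: byte at a time. */
	for (; n; n--, p++, q++) {
		if (*p != *q)
			return *p < *q ? -1 : 1;
	}
	return 0;
}
```

Resolving a mismatch in the byte loop keeps the result independent of
endianness, at the cost of rescanning up to eight bytes once per call.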
From: Joakim Tjernlund
> On Mon, 2015-01-12 at 11:55 +1100, Anton Blanchard wrote:
> > Hi David,
> >
> > > The unrolled loop (deleted) looks excessive.
> > > On a modern cpu with multiple execution units you can usually
> > > manage to get the loop overhead to execute in parallel to the
> > > actu
On Mon, 2015-01-12 at 11:55 +1100, Anton Blanchard wrote:
> Hi David,
>
> > The unrolled loop (deleted) looks excessive.
> > On a modern cpu with multiple execution units you can usually
> > manage to get the loop overhead to execute in parallel to the
> > actual 'work'.
> > So I suspect that a m
Hi David,
> The unrolled loop (deleted) looks excessive.
> On a modern cpu with multiple execution units you can usually
> manage to get the loop overhead to execute in parallel to the
> actual 'work'.
> So I suspect that a much simpler 'word at a time' loop will be
> almost as fast - especially i
On 08-01-2015 23:56, Anton Blanchard wrote:
> I noticed ksm spending quite a lot of time in memcmp on a large
> KVM box. The current memcmp loop is very unoptimised - byte at a
> time compares with no loop unrolling. We can do much much better.
>
> Optimise the loop in a few ways:
>
> - Unroll the
From: Anton Blanchard
> I noticed ksm spending quite a lot of time in memcmp on a large
> KVM box. The current memcmp loop is very unoptimised - byte at a
> time compares with no loop unrolling. We can do much much better.
>
> Optimise the loop in a few ways:
>
> - Unroll the byte at a time loop