RE: [PATCH 2/3] powerpc: POWER7 optimised memcpy using VMX

2011-06-17 Thread David Laight
> On Fri, Jun 17, 2011 at 02:54:00PM +1000, Anton Blanchard wrote:
> > Implement a POWER7 optimised memcpy using VMX. For large aligned
> > copies this new loop is over 10% faster and for large unaligned
> > copies it is over 200% faster.
...
> BTW: do you have any statistics on the size distrib

Re: [PATCH 2/3] powerpc: POWER7 optimised memcpy using VMX

2011-06-17 Thread Gabriel Paubert
On Fri, Jun 17, 2011 at 02:54:00PM +1000, Anton Blanchard wrote:
> Implement a POWER7 optimised memcpy using VMX. For large aligned
> copies this new loop is over 10% faster and for large unaligned
> copies it is over 200% faster.
>
> On POWER7 unaligned stores rarely slow down - they only flush w

Re: [PATCH 2/3] powerpc: POWER7 optimised memcpy using VMX

2011-06-16 Thread Benjamin Herrenschmidt
O
> +.Lvmx_copy:
> +	mflr	r0
> +	std	r4,56(r1)
> +	std	r5,64(r1)
> +	std	r0,16(r1)
> +	stdu	r1,-STACKFRAMESIZE(r1)
> +	bl	.enable_kernel_altivec
> +	ld	r0,STACKFRAMESIZE+16(r1)
> +	ld	r3,STACKFRAMESIZE+48(r1)
> +	ld	r4,STACKF

[PATCH 2/3] powerpc: POWER7 optimised memcpy using VMX

2011-06-16 Thread Anton Blanchard
Implement a POWER7 optimised memcpy using VMX. For large aligned
copies this new loop is over 10% faster and for large unaligned
copies it is over 200% faster.

On POWER7 unaligned stores rarely slow down - they only flush when
a store crosses a 4KB page boundary. Furthermore this flush is handled
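
To make the 4KB boundary condition above concrete, here is a minimal sketch in C of how one might test whether a given store would cross a 4KB page boundary. It is not taken from the patch: the helper name store_crosses_4k and the PAGE_4K constant are hypothetical, chosen only for illustration, and the only fact borrowed from the patch description is the 4KB figure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: the 4KB figure quoted in the patch description. */
#define PAGE_4K ((uintptr_t)4096)

/*
 * Hypothetical helper (not part of the patch): returns true if a store of
 * len bytes starting at addr touches two different 4KB pages.
 */
static bool store_crosses_4k(uintptr_t addr, size_t len)
{
	if (len == 0)
		return false;

	/* Page base of the first and of the last byte written. */
	uintptr_t first = addr & ~(PAGE_4K - 1);
	uintptr_t last = (addr + len - 1) & ~(PAGE_4K - 1);

	return first != last;
}
```

For a 16 byte vector store, only 15 of the 4096 possible starting offsets within a page give a crossing, which fits the description of the slow case as rare.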