> On Fri, Jun 17, 2011 at 02:54:00PM +1000, Anton Blanchard wrote:
> > Implement a POWER7 optimised memcpy using VMX. For large aligned
> > copies this new loop is over 10% faster and for large unaligned
> > copies it is over 200% faster.
...
> BTW: do you have any statistics on the size distrib

On Fri, Jun 17, 2011 at 02:54:00PM +1000, Anton Blanchard wrote:
> Implement a POWER7 optimised memcpy using VMX. For large aligned
> copies this new loop is over 10% faster and for large unaligned
> copies it is over 200% faster.
>
> On POWER7 unaligned stores rarely slow down - they only flush w
> +.Lvmx_copy:
> + mflr r0
> + std r4,56(r1)
> + std r5,64(r1)
> + std r0,16(r1)
> + stdu r1,-STACKFRAMESIZE(r1)
> + bl .enable_kernel_altivec
> + ld r0,STACKFRAMESIZE+16(r1)
> + ld r3,STACKFRAMESIZE+48(r1)
> + ld r4,STACKF
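The quoted prologue saves r4, r5 and the link register (through r0) into the caller's frame, allocates a new frame with `stdu`, and only then calls `.enable_kernel_altivec`. That is why the reloads add STACKFRAMESIZE to the original offsets. The derivation below assumes the 64-bit PowerPC ELF ABI v1 frame layout (LR save doubleword at 16(r1), parameter save area starting at 48(r1), so r3, r4 and r5 occupy 48, 56 and 64). It also assumes r3 was stored at 48(r1) earlier in the function, since that store is not in the excerpt. Writing r1' for the stack pointer after `stdu`:

```latex
\begin{aligned}
r1' &= r1 - \mathrm{STACKFRAMESIZE} && \text{(effect of \texttt{stdu})}\\
r1 + d &= r1' + \mathrm{STACKFRAMESIZE} + d && \text{(a slot saved at offset } d \text{ from the old } r1\text{)}\\
d = 16 &:\ \text{saved LR} \;\to\; \mathrm{STACKFRAMESIZE}+16(r1')\\
d = 48 &:\ r3 \;\to\; \mathrm{STACKFRAMESIZE}+48(r1')\\
d = 56,\ 64 &:\ r4,\ r5 \;\to\; \mathrm{STACKFRAMESIZE}+56(r1'),\ \mathrm{STACKFRAMESIZE}+64(r1')
\end{aligned}
```

The spills are needed because `bl` overwrites LR and the called function is free to clobber the volatile argument registers r3 to r12.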

Implement a POWER7 optimised memcpy using VMX. For large aligned
copies this new loop is over 10% faster and for large unaligned
copies it is over 200% faster.
On POWER7 unaligned stores rarely slow down - they only flush when
a store crosses a 4KB page boundary. Furthermore this flush is
handled