On Thursday 19 June 2008, Mark Nelson wrote:
>         .align  7
> _GLOBAL(copy_4K_page)
>         dcbt    0,r4            /* Prefetch ONE SRC cacheline */
>
>         addi    r6,r3,-8        /* prepare for stdu */
>         addi    r4,r4,-8        /* prepare for ldu */
>
>         li      r10,32          /* copy 32 cache lines for a 4K page */
>         li      r12,128+8       /* prefetch distance */

Since you have a loop here anyway instead of the fully unrolled code,
why not provide a copy_64K_page function as well, jumping in here?
The inline 64K copy_page function otherwise just adds code size, and
is a tiny bit slower as well.

It may even be good to have an out-of-line copy_64K_page for the
regular code, just calling copy_4K_page repeatedly.

	Arnd <><
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
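
For illustration only, the out-of-line variant suggested above (a
copy_64K_page that just calls copy_4K_page repeatedly) could look roughly
like the C sketch below. This is not code from the patch: the SZ_4K and
SZ_64K macros and the argument order are assumptions, and copy_4K_page is
replaced by a memcpy-based stand-in so the sketch builds outside the
kernel. In the real code it would be the assembly routine quoted above.

```c
#include <stddef.h>
#include <string.h>

#define SZ_4K   4096u           /* assumed size of one 4K hardware page */
#define SZ_64K  (16u * SZ_4K)   /* assumed size of one 64K page */

/* Hypothetical stand-in for the assembly copy_4K_page routine. */
static void copy_4K_page(void *to, const void *from)
{
	memcpy(to, from, SZ_4K);
}

/* Out-of-line 64K copy: walk the page in 4K chunks (16 calls). */
void copy_64K_page(void *to, const void *from)
{
	unsigned char *d = to;
	const unsigned char *s = from;
	size_t off;

	for (off = 0; off < SZ_64K; off += SZ_4K)
		copy_4K_page(d + off, s + off);
}
```

Keeping the loop in one out-of-line function means each caller pays for a
single call instead of an inlined loop, which is the code-size point made
above. Whether it is actually faster would have to be measured.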