On Jun 4, 2012, at 8:12 AM, Olof Johansson wrote:

> Hi,
>
> On Mon, Jun 4, 2012 at 12:58 AM, Anton Blanchard <an...@samba.org> wrote:
>>
>> I blame Mikey for this. He elevated my slightly dubious testcase:
>>
>> # dd if=/dev/zero of=/dev/null bs=1M count=10000
>>
>> to benchmark status. And naturally we need to be number 1 at creating
>> zeros. So lets improve __clear_user some more.
>>
>> As Paul suggests we can use dcbz for large lengths. This patch gets
>> the destination 128 byte aligned then uses dcbz on whole cachelines.
>>
>> Before:
>> 10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s
>>
>> After:
>> 10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s
>>
>> 39 GB/s, a new record.
>>
>> Signed-off-by: Anton Blanchard <an...@samba.org>
>> ---
>>
>> Index: linux-build/arch/powerpc/lib/string_64.S
>> ===================================================================
>> --- linux-build.orig/arch/powerpc/lib/string_64.S 2012-06-04
>> 16:18:56.351604302 +1000
>> +++ linux-build/arch/powerpc/lib/string_64.S 2012-06-04
>> 16:47:10.538500871 +1000
>> @@ -78,7 +78,7 @@ _GLOBAL(__clear_user)
> [..]
>
>> +15:
>> +err2; dcbz r0,r3
>> + addi r3,r3,128
>> + addi r4,r4,-128
>> + bdnz 15b
>
> This breaks architecture spec (and at least one implementation); cache
> lines are not guaranteed to be 128 bytes.

I'm guessing it breaks more than one (FSL 64-bit has 64-byte cache lines).

- k
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
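
To illustrate the objection raised in this thread: dcbz zeroes the whole cache block containing the given address, so a loop that strides by a hard-coded 128 would skip half of the bytes on a part with 64-byte blocks. Below is a minimal user-space sketch in C of the same align-then-clear-whole-blocks idea with the block size passed in as a parameter. It is not the patch's code: `zero_block` and `clear_by_lines` are hypothetical names, `memset` stands in for dcbz so the sketch runs anywhere, and how the real block size is obtained (however the platform reports it) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for dcbz: zero one whole cache block.
 * Real code would issue the dcbz instruction here; memset is an
 * assumption made so the sketch is portable. */
static void zero_block(unsigned char *p, size_t line_size)
{
    memset(p, 0, line_size);
}

/* Zero len bytes at dst, using whole-block clears where possible.
 * line_size is supplied by the caller instead of being fixed at 128.
 * Returns the number of whole blocks cleared via zero_block(). */
static size_t clear_by_lines(unsigned char *dst, size_t len, size_t line_size)
{
    size_t blocks = 0;

    /* The mask arithmetic below requires a non-zero power of two. */
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    /* Head: bytes needed to reach the next block boundary. */
    size_t head = (size_t)(-(uintptr_t)dst & (line_size - 1));
    if (head > len)
        head = len;
    memset(dst, 0, head);
    dst += head;
    len -= head;

    /* Body: whole, aligned blocks (the part dcbz would handle). */
    while (len >= line_size) {
        zero_block(dst, line_size);
        dst += line_size;
        len -= line_size;
        blocks++;
    }

    /* Tail: whatever is left after the last whole block. */
    memset(dst, 0, len);
    return blocks;
}
```

The head and tail are cleared with ordinary stores, so only the aligned middle depends on the block size. Calling the function with 64 or with 128 zeroes the same byte range; only the number of whole-block clears differs.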