On Saturday 21 June 2008, Paul Mackerras wrote:
> Is this application really transferring bulk data and using buffers
> that aren't a multiple of the page size? Do you know whether the
> copies ended up being misaligned?

In the problem case that was reported to me, it was all bulk data, and
all the oprofile samples showed up in the unaligned code path of the
usercopy code, which does the shift operations that are microcoded on
Cell.

> Of course, if we really want the fastest copy possible, the thing to
> do is to use VMX loads and stores on 970, POWER6 and Cell. The
> overhead of setting up to use VMX in the kernel would probably kill
> any advantage, though -- at least, that's what I found when I tried
> using VMX for copy_page in the kernel on 970 a few years ago.

Right, that is understandable; we saw similar results when Sebastian
was working on VMX-optimized AES code.

> Let's see what Mark comes up with. We may be able to find a way to do
> it that works well across all current CPUs and also is OK for small
> copies. If not we might need to do what you suggest.

OK.

	Arnd <><

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
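As a rough illustration of why the unaligned path costs more than the aligned one, here is a small C sketch of the general shift-and-merge technique. It is not the kernel's usercopy code. The function names `relatively_misaligned` and `shift_merge` are hypothetical, and the sketch assumes a big-endian view of each 64-bit word (the first byte in memory is the most significant byte), as on ppc64 at the time. When source and destination sit at different offsets within a doubleword, no amount of leading byte copies can align both. Each destination word must then be assembled from two source words with two shifts and an OR, and those shifts are the operations said above to be microcoded on Cell.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: nonzero when dst and src have different offsets
 * within an 8-byte doubleword, so both cannot be aligned at once and a
 * word-at-a-time copy has to fall back to shifting.
 */
static int relatively_misaligned(uintptr_t dst, uintptr_t src)
{
    return ((dst ^ src) & 7) != 0;
}

/*
 * Hypothetical shift-and-merge loop (big-endian word view assumed).
 * The source byte stream starts `off` bytes into words[0], with
 * 1 <= off <= 7 (off == 0 is the aligned case, and a shift by 64 would
 * be undefined in C). Produces n output words and reads words[0] through
 * words[n], so the caller must supply n + 1 source words.
 */
static void shift_merge(uint64_t *out, const uint64_t *words, size_t n,
                        unsigned off)
{
    unsigned ls = 8 * off;   /* bits to discard from the current word */
    unsigned rs = 64 - ls;   /* bits to take from the next word */

    for (size_t i = 0; i < n; i++)
        out[i] = (words[i] << ls) | (words[i + 1] >> rs);
}
```

For example, with source words 0x0001020304050607, 0x08090A0B0C0D0E0F and 0x1011121314151617 and off = 3, the first output word is 0x030405060708090A, i.e. bytes 3 through 10 of the stream. An aligned copy would need only one load and one store per word.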