Hi Paul,

Of course, I can only speak for the test results that I got on our platforms. We tested on PS3, QS21 single/dual, QS22 single/dual, and JS21.
The performance of the old Linux routine and the new routine is about the same for copies of less than 128 bytes. At 512 bytes the new routine is about 100% faster than the old one (on QS21). At 1500 bytes, which is a typical Ethernet frame size, the new routine is over 3 times faster than the old one (on QS21).

We could NOT see a performance decrease for small copies. For copies of 512 bytes and more, the performance increase is significant.

> However, it's very rare to transfer large amounts of data over
> loopback, unless you're running a benchmark like iperf or netperf.

Please mind that this test was done because it's a simple way to show how much less work the CPU needs to do to handle network traffic. All network traffic goes through copy2user, so all network traffic can now be handled with much less CPU power wasted on copying the data.

Don't you agree that network traffic, or I/O in general, with packets over 500 bytes is not a rare case?

Cheers
Gunnar

Paul Mackerras <[EMAIL PROTECTED]>
20/06/2008 03:13
To: Gunnar von Boehn/Germany/Contr/[EMAIL PROTECTED]
cc: Arnd Bergmann <[EMAIL PROTECTED]>, linuxppc-dev@ozlabs.org, Michael Ellerman <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: [Cbe-oss-dev] [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

Gunnar von Boehn writes:

> The "regular" code was much slower for the normal case and has a special
> version for the 4K optimized case.

That's a slightly inaccurate view...

The reason for having the two cases is that when I profiled the distribution of sizes and alignments of memory copies in the kernel, the result was that almost all copies (something like 99%, IIRC) were either 128 bytes or less, or else a whole page at a page-aligned address. Thus we get the best performance by having a simple copy routine with minimal setup overhead for the small copy case, plus an aggressively optimized page copy routine.
Spending time setting up for a multi-cacheline copy that's not a whole page is just going to hurt the small copy case without providing any real benefit.

Transferring data over loopback is possibly an exception to that. However, it's very rare to transfer large amounts of data over loopback, unless you're running a benchmark like iperf or netperf. :-/

Paul.
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
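
For readers of the archive, the two-case design Paul describes (a cheap path for small copies plus a specialized path for a whole page at a page-aligned address) might be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the kernel's actual assembly routine: every name prefixed with sketch_ is hypothetical, the 4096-byte page size is assumed from the "4K" mentioned in the thread, and memcpy merely stands in for an optimized page copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size (the thread mentions a 4K case); hypothetical name. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical small-copy path: almost no setup, just a byte loop. */
static void sketch_copy_small(unsigned char *dst, const unsigned char *src,
                              size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/* Hypothetical page path: memcpy is a placeholder for a routine that
 * would be tuned for exactly one aligned page. */
static void sketch_copy_page(void *dst, const void *src)
{
    memcpy(dst, src, SKETCH_PAGE_SIZE);
}

/* True only for a whole page with both addresses page-aligned. */
static int sketch_is_page_copy(const void *dst, const void *src, size_t n)
{
    return n == SKETCH_PAGE_SIZE
        && ((uintptr_t)dst % SKETCH_PAGE_SIZE) == 0
        && ((uintptr_t)src % SKETCH_PAGE_SIZE) == 0;
}

/* Dispatch: one cheap test up front, so small copies pay very little. */
static void sketch_copy(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    if (sketch_is_page_copy(dst, src, n))
        sketch_copy_page(dst, src);
    else
        sketch_copy_small(dst, src, n);
}
```

In this sketch a 1500-byte copy, the Ethernet-frame case Gunnar measured, falls through to the simple loop. That middle range (larger than 128 bytes but not a whole aligned page) is the case the two sides of the thread disagree about.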