On 23/01/2013 17:03, Luigi Rizzo wrote:
> On Wed, Jan 23, 2013 at 02:03:17PM +0100, Stefan Hajnoczi wrote:
>> On Wed, Jan 23, 2013 at 12:50:26PM +0100, Luigi Rizzo wrote:
>>> On Wed, Jan 23, 2013 at 12:10:55PM +0100, Stefan Hajnoczi wrote:
>>>> On Tue, Jan 22, 2013 at 08:12:15AM +0100, Luigi Rizzo wrote:
> ...
>>>>> +// a fast copy routine only for multiples of 64 bytes, non overlapped.
>>>>> +static inline void
>>>>> +pkt_copy(const void *_src, void *_dst, int l)
>>> ...
>>>>> + *dst++ = *src++;
>>>>> + }
>>>>> +}
>>>>
>>>> I wonder how different FreeBSD bcopy() is from glibc memcpy() and if the
>>>> optimization is even a win. The glibc code is probably hand-written
>>>> assembly that CPU vendors have contributed for specific CPU model
>>>> families.
>>>>
>>>> Did you compare glibc memcpy() against pkt_copy()?
>>>
>>> I haven't tried in detail on glibc but will run some tests. In any
>>> case not all systems have glibc, and on FreeBSD this pkt_copy was
>>> a significant win for small packets (saving some 20ns each; of
>>> course this counts only when you approach the 10 Mpps range, which
>>> is what you get with netmap, and of course when data is in cache).
>>>
>>> One reason pkt_copy gains something is that if it can assume there
>>> is extra space in the buffer, it can work on large chunks avoiding the extra
>>> jumps and instructions for the remaining 1-2-4 bytes.
>>
>> I'd like to drop this code or at least make it FreeBSD-specific since
>> there's no guarantee that this is a good idea on any other libc.
>>
>> I'm even doubtful that it's always a win on FreeBSD. You have a
>> threshold to fall back to bcopy() and who knows what the "best" value
>> for various CPUs is.
>
> indeed.
> With the attached program (which however might be affected by the
> fact that data is not used after copying) it seems that on a recent
> linux (using gcc 4.6.2) the fastest is __builtin_memcpy()
>
>     ./testlock -m __builtin_memcpy -l 64
>
> (by a factor of 2 or more) whereas all the other methods have
> approximately the same speed.
>
> On FreeBSD (with clang, gcc 4.2.1, gcc 4.6.4) the pkt_copy() above
>
>     ./testlock -m fastcopy -l 64
>
> is largely better than other methods. I am a bit puzzled why
> the builtin method on FreeBSD is not effective, but i will check
> on some other forum...

Perhaps a different default for -march/-mtune?

Paolo
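
For reference, the "work on large chunks" idea from the quoted text can be sketched roughly as below. This is only an illustration, not the pkt_copy() from the patch (whose body is elided above): the name copy_round64 is hypothetical, and the sketch assumes both buffers are padded to a multiple of 64 bytes and do not overlap.

```c
#include <string.h>

/*
 * Hypothetical sketch: copy len bytes, rounded up to the next multiple
 * of 64. Assumes src and dst do not overlap and that both buffers have
 * room for the rounded-up length, so no 1-2-4 byte tail is handled.
 */
static inline void
copy_round64(const void *src_, void *dst_, int len)
{
    const unsigned char *src = src_;
    unsigned char *dst = dst_;

    for (; len > 0; len -= 64) {
        /* fixed-size copy: the compiler can expand this inline */
        memcpy(dst, src, 64);
        src += 64;
        dst += 64;
    }
}
```

Note that a call such as copy_round64(src, dst, 65) writes 128 bytes, which is why the padding assumption matters. Whether this beats the libc or builtin memcpy() depends on the compiler and target flags, as the numbers quoted above suggest.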