On Wed, Jan 23, 2013 at 05:32:38PM +0100, Luigi Rizzo wrote:
> Probably our compiler folks have some ideas on this...
>
> When doing netmap i found that on FreeBSD memcpy/bcopy was expensive,
> __builtin_memcpy() was even worse, and so i ended up writing
> my custom routine, (called pkt_copy() in the program below).
> This happens with gcc 4.2.1, clang, gcc 4.6.4
>
> I was then surprised to notice that on a recent ubuntu using
> gcc 4.6.2 (if that matters) the __builtin_memcpy beats other
> methods by a large factor.

So, it turns out that in my test program I had swapped the source and
destination operands for __builtin_memcpy(), and this substantially
changed the memory access pattern.

With the correct operands, __builtin_memcpy == memcpy == bcopy
on both FreeBSD and Linux.
On FreeBSD pkt_copy is still faster than the other methods for
small packets, whereas on Linux they are equivalent.

If you are curious why swapping source and dst changed things
so dramatically: the test was supposed to read from a large chunk
of memory (over 1GB) to avoid always hitting L1 or L2.
Swapping the operands causes the reads to always hit the same line,
thus saving a lot of misses. The difference between the two machines
is then probably due to how the cache is used on writes.

Sorry for the noise.

luigi
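
For readers who want to see the two access patterns side by side, here is
a minimal sketch in C. It is not the original test program (which is not
included in this message); the function names copy_intended() and
copy_swapped(), the parameter names, and the wrap-around logic are all
hypothetical and chosen only for illustration. It uses plain memcpy()
rather than __builtin_memcpy() or pkt_copy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Intended pattern (hypothetical helper): the SOURCE walks through a
 * large buffer in steps of `len`, so successive reads touch new memory
 * and tend to miss in the cache when the buffer is much larger than
 * L1/L2. The destination is one small fixed buffer.
 */
void
copy_intended(char *fixed, const char *large, size_t large_size,
    size_t len, size_t iters)
{
	size_t off = 0;

	assert(len > 0 && len <= large_size);
	for (size_t i = 0; i < iters; i++) {
		memcpy(fixed, large + off, len);   /* read moves, write stays */
		off += len;
		if (off + len > large_size)        /* wrap before overrunning */
			off = 0;
	}
}

/*
 * Swapped pattern (hypothetical helper): same loop, but the operands
 * are exchanged. Every read now hits the same small buffer, and only
 * the writes walk through the large one.
 */
void
copy_swapped(const char *fixed, char *large, size_t large_size,
    size_t len, size_t iters)
{
	size_t off = 0;

	assert(len > 0 && len <= large_size);
	for (size_t i = 0; i < iters; i++) {
		memcpy(large + off, fixed, len);   /* read stays, write moves */
		off += len;
		if (off + len > large_size)
			off = 0;
	}
}
```

To approximate the situation described above, one would call each helper
in a timed loop with a `large` buffer well beyond the size of the last
cache level. Any timing difference will depend on the machine and on how
its caches handle writes.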
