On Wed, Jan 23, 2013 at 8:03 AM, Luigi Rizzo <ri...@iet.unipi.it> wrote:

> > I'm even doubtful that it's always a win on FreeBSD.  You have a
> > threshold to fall back to bcopy() and who knows what the "best" value
> > for various CPUs is.
> indeed.
> With the attached program (which however might be affected by the
> fact that data is not used after copying) it seems that on a recent
> linux (using gcc 4.6.2) the fastest is __builtin_memcpy()
>         ./testlock -m __builtin_memcpy -l 64
> (by a factor of 2 or more) whereas all the other methods have
> approximately the same speed.

Never mind, pilot error: in my test program I had swapped the
arguments to __builtin_memcpy(). With the correct ones,
__builtin_memcpy() == bcopy == memcpy on both machines,
and none of them is ever faster than pkt_copy().

In fact, on the machine with FreeBSD the unrolled loop
still beats all other methods at small packet sizes.

In my test program I had swapped the
source and destination operands for __builtin_memcpy(), and
this substantially changed the memory access pattern.

With the correct operands, __builtin_memcpy == memcpy == bcopy
on both FreeBSD and Linux.
On FreeBSD pkt_copy is still faster than the other methods for
small packets, whereas on Linux they are equivalent.
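
For readers who have not seen the technique, here is a minimal sketch of an
unrolled copy with a fallback threshold. It is not the actual pkt_copy(): the
name unrolled_copy, the 1024-byte cutoff and the 32-byte unroll step are all
assumptions picked for illustration, and the best cutoff presumably varies
from CPU to CPU.

```c
#include <stdint.h>
#include <string.h>

/* Assumed cutoff; above it we simply defer to the library routine. */
#define SKETCH_THRESHOLD 1024

/*
 * Hypothetical illustration of an unrolled copy loop.
 * Copies len bytes rounded UP to a multiple of 32, so both buffers
 * must be 8-byte aligned and padded to the next 32-byte boundary.
 */
void
unrolled_copy(const uint64_t *src, uint64_t *dst, size_t len)
{
    if (len >= SKETCH_THRESHOLD) {
        memcpy(dst, src, len);      /* large copy: use the library */
        return;
    }
    for (size_t done = 0; done < len; done += 32) {
        /* four 64-bit words per iteration, no per-byte tail handling */
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
    }
}
```

The loop trades exactness for simplicity: it may write up to 31 bytes past
len, which is only acceptable when the buffers are known to have that slack.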

If you are curious why swapping source and dst changed things
so dramatically:

The test was supposed to read from a large chunk of
memory (over 1GB) to avoid always hitting L1 or L2.
Swapping the operands causes the reads to always hit the same line,
thus saving a lot of misses. The difference between the two
machines is then probably due to how the cache is used on writes.
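
The effect can be sketched with a small loop. This is not the code from
testlock: copy_loop and its parameters are hypothetical names, and the
function only counts how often the source address changes instead of
measuring time. With the intended operand order the source walks through the
big buffer; with the operands swapped every read comes from the same small
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration. Runs `iters` copies of `len` bytes between
 * a large buffer (walked sequentially, wrapping at the end) and a small
 * fixed buffer. If `swapped` is nonzero the roles of source and
 * destination are exchanged, mimicking the argument-order mistake.
 * Returns the number of times the source address changed.
 */
size_t
copy_loop(char *big, size_t big_size, char *small_buf, size_t len,
          size_t iters, int swapped)
{
    size_t off = 0;
    size_t src_moves = 0;
    const char *prev_src = NULL;

    assert(len > 0 && len <= big_size);

    for (size_t i = 0; i < iters; i++) {
        char *far = big + off;
        const char *src = swapped ? small_buf : far;
        char *dst = swapped ? far : small_buf;

        memcpy(dst, src, len);
        if (src != prev_src)
            src_moves++;            /* a new region is being read */
        prev_src = src;

        off += len;
        if (off + len > big_size)
            off = 0;                /* wrap around the big buffer */
    }
    return src_moves;
}
```

In the swapped case the return value stays at 1 no matter how many
iterations run, which is the "reads always hit the same line" situation;
in the intended case it grows with the number of iterations.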


 Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2211611               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
