Re: [Qemu-devel] memcpy speed (Re: [PATCH v2] netmap backend (revised))

Luigi Rizzo Thu, 24 Jan 2013 09:36:34 -0800

On Thu, Jan 24, 2013 at 09:54:19AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jan 23, 2013 at 06:55:59PM -0800, Luigi Rizzo wrote:
> > On Wed, Jan 23, 2013 at 8:03 AM, Luigi Rizzo <[email protected]> wrote:
> > 
> > > > I'm even doubtful that it's always a win on FreeBSD.  You have a
> > > > threshold to fall back to bcopy() and who knows what the "best" value
> > > > for various CPUs is.
> > >
> > > indeed.
> > > With the attached program (which however might be affected by the
> > > fact that data is not used after copying) it seems that on a recent
> > > linux (using gcc 4.6.2) the fastest is __builtin_memcpy()
> > >
> > >         ./testlock -m __builtin_memcpy -l 64
> > >
> > > (by a factor of 2 or more) whereas all the other methods have
> > > approximately the same speed.
> > >
> > 
> > never mind, pilot error. in my test program i had swapped the
> > arguments to __builtin_memcpy(). With the correct ones,
> > __builtin_memcpy()  == bcopy == memcpy on both machines,
> > and never faster than the pkt_copy().
> 
> Are the bcopy()/memcpy() calls given a length that is a multiple of 64 bytes?
> 
> IIUC pkt_copy() assumes 64-byte multiple lengths and that optimization
> can be matched with memcpy(dst, src, (len + 63) & ~63).  Maybe it helps and
> at least ensures they are doing equal amounts of byte copying.


The length is a command-line parameter.
For short packets, at least on the i7-2600 running FreeBSD, pkt_copy()
is only slightly faster than memcpy() when the length is a multiple of 64,
and *a lot* faster when it is not.
Again, I am not sure whether this depends on the compiler/glibc or
simply on the CPU; unfortunately I have no way to swap machines.
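
To make the comparison concrete, here is a minimal sketch of the rounding
discussed above. It is not the actual netmap pkt_copy() nor the testlock
program; round_up_64(), copy_blocks_64() and memcpy_padded() are
hypothetical names, and the sketch assumes (following Stefan's reading) that
the copy works in whole 64-byte units, so both buffers must have room for
the padded length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round len up to the next multiple of 64, i.e. (len + 63) & ~63. */
static size_t round_up_64(size_t len)
{
    return (len + 63) & ~(size_t)63;
}

/*
 * Hypothetical stand-in for a pkt_copy()-style routine: it copies whole
 * 64-byte blocks, so it may touch up to 63 bytes past len.
 * Both buffers must be at least round_up_64(len) bytes long.
 */
static void copy_blocks_64(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t n = round_up_64(len);

    for (size_t off = 0; off < n; off += 64)
        memcpy(d + off, s + off, 64);   /* constant size, easy to inline */
}

/* The suggested baseline: plain memcpy() over the same padded length. */
static void memcpy_padded(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, round_up_64(len));
}
```

With this, both variants move the same number of bytes for any len, so a
timing difference would come from the copy routine itself rather than from
the amount of data copied.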

luigi
