* Li, Liang Z (liang.z...@intel.com) wrote:
> > >> >
> > >> > I use your new code:
> > >> > -------------------------------------------------
> > >> >        unsigned long *p = ...
> > >> >        if (p[0] || p[1] || p[2] || p[3]
> > >> >            || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
> > >> >                return BUFFER_NOT_ZERO;
> > >> >        else
> > >> >                return BUFFER_ZERO;
> > >> > ---------------------------------------------------
> > >> > and the result is almost the same.  I also tried checking 8 and 16
> > >> > longs at the beginning, with the same result.
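For anyone reading along, here is why the trick in that snippet works. Once the first four words are known to be zero, memcmp() of the buffer against itself shifted by 4 * sizeof(unsigned long) bytes can only return 0 if every later byte equals the byte one stride before it, which by induction is zero. The self-contained sketch below assumes an aligned buffer whose length is a multiple of sizeof(unsigned long) and at least four words. The helper name buffer_is_zero_memcmp is hypothetical and is not taken from any patch in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper illustrating the self-overlapping memcmp() check.
 * Assumes: buf is aligned for unsigned long, len is a multiple of
 * sizeof(unsigned long), and len covers at least 4 words.
 */
static bool buffer_is_zero_memcmp(const void *buf, size_t len)
{
    const unsigned long *p = buf;
    const size_t head = 4 * sizeof(unsigned long);

    assert(len % sizeof(unsigned long) == 0 && len >= head);

    /* Cheap early exit: many non-zero buffers fail on the first words. */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }
    /*
     * The first `head` bytes are zero.  If each byte from offset `head`
     * onward equals the byte `head` positions earlier, all bytes are zero.
     */
    return memcmp(p + 4, p, len - head) == 0;
}
```

A buffer that is non-zero in its first four words never reaches memcmp() at all, so the comparison cost is only paid on buffers that start with zeros.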
> > >>
> > >> Interesting...  Well, all I can say is that I applaud you for testing
> > >> your hypothesis with the benchmark.
> > >>
> > >> Probably the setup cost of memcmp is too high, because the testing
> > >> loop is already very optimized.
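For comparison, a plain loop of the kind described here might look like the sketch below. It is hypothetical, since the actual test loop is not quoted in this thread, and it assumes an aligned buffer whose length is a multiple of eight words. It ORs eight words together per iteration and tests once, so there is no function-call or setup overhead, and compilers can often vectorize it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical plain-C zero check.  Assumes buf is aligned for
 * unsigned long and len is a multiple of 8 * sizeof(unsigned long).
 */
static bool buffer_is_zero_loop(const void *buf, size_t len)
{
    const unsigned long *p = buf;
    size_t n = len / sizeof(unsigned long);

    assert(len % (8 * sizeof(unsigned long)) == 0);

    for (size_t i = 0; i < n; i += 8) {
        /* Combine eight words so there is one branch per 8 words. */
        unsigned long acc = p[i] | p[i + 1] | p[i + 2] | p[i + 3] |
                            p[i + 4] | p[i + 5] | p[i + 6] | p[i + 7];
        if (acc) {
            return false;
        }
    }
    return true;
}
```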
> > >>
> > >> Please submit the AVX2 version if it helps!
> > 
> > I read the email in the wrong order.  Forget about my other email.
> > 
> > Sorry, Juan.
> > 
> 
> One thing I still can't understand: why does the unit test in the host
> environment show that 'memcmp()' has better performance?

Are you aware of any program other than QEMU that also wants to do something
similar?  Finding whether a block of memory is zero sounds like something
that would be useful in lots of places; I just can't think which ones.

Dave

> 
> Liang
> > 
> > >
> > > Yes, the AVX2 version really helps. I have already submitted it; could
> > > you help review it?
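As a rough illustration only (this is not the submitted patch, which isn't quoted here), an AVX2 zero check has to be guarded by a runtime CPU check, because not every x86 CPU supports AVX2. The sketch below assumes GCC or Clang, for __attribute__((target("avx2"))) and __builtin_cpu_supports(), and a length that is a multiple of 128 bytes. All function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_AVX2_DISPATCH 1
#include <immintrin.h>

/* Compiled for AVX2 only: call it only after a runtime CPU check. */
__attribute__((target("avx2")))
static bool buffer_is_zero_avx2(const void *buf, size_t len)
{
    const __m256i *p = buf;

    for (size_t i = 0; i < len / sizeof(__m256i); i += 4) {
        /* OR four 32-byte vectors: one test per 128 bytes. */
        __m256i acc = _mm256_or_si256(
            _mm256_or_si256(_mm256_loadu_si256(p + i),
                            _mm256_loadu_si256(p + i + 1)),
            _mm256_or_si256(_mm256_loadu_si256(p + i + 2),
                            _mm256_loadu_si256(p + i + 3)));
        /* VPTEST: returns 1 only if acc has no bits set. */
        if (!_mm256_testz_si256(acc, acc)) {
            return false;
        }
    }
    return true;
}
#endif

/* Portable byte-wise fallback. */
static bool buffer_is_zero_generic(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Hypothetical dispatcher; assumes len is a multiple of 128 bytes. */
static bool buffer_is_zero(const void *buf, size_t len)
{
    assert(len % 128 == 0);
#ifdef HAVE_AVX2_DISPATCH
    if (__builtin_cpu_supports("avx2")) {
        return buffer_is_zero_avx2(buf, len);
    }
#endif
    return buffer_is_zero_generic(buf, len);
}
```

Unaligned loads are used so the caller does not need a 32-byte-aligned buffer. On CPUs without AVX2 (or non-x86 builds), the byte-wise fallback runs instead.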
> > >
> > > I am curious about the original motivation for adding the SSE2
> > > intrinsics. Was it the same reason?
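On the SSE2 question, one property worth noting is that SSE2 is part of the x86-64 baseline, so, unlike AVX2, SSE2 intrinsics need no runtime dispatch. SSE2 also lacks PTEST (that instruction arrived with SSE4.1), so a check compares against zero and inspects the byte mask instead. The following minimal sketch is not QEMU's actual code. It uses a hypothetical name and assumes a length that is a multiple of 64 bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical SSE2 zero check; assumes len is a multiple of 64 bytes. */
static bool buffer_is_zero_sse2(const void *buf, size_t len)
{
    assert(len % 64 == 0);
#ifdef __SSE2__
    const __m128i *p = buf;
    const __m128i zero = _mm_setzero_si128();

    for (size_t i = 0; i < len / sizeof(__m128i); i += 4) {
        /* OR four 16-byte vectors: one test per 64 bytes. */
        __m128i acc = _mm_or_si128(
            _mm_or_si128(_mm_loadu_si128(p + i), _mm_loadu_si128(p + i + 1)),
            _mm_or_si128(_mm_loadu_si128(p + i + 2), _mm_loadu_si128(p + i + 3)));
        /* Byte-compare with zero; mask 0xFFFF means all 16 bytes were zero. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(acc, zero)) != 0xFFFF) {
            return false;
        }
    }
    return true;
#else
    /* Portable fallback for targets without SSE2. */
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
#endif
}
```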
> > >
> > > I even suspect the VM may affect 'memcmp()' performance. Is that
> > > possible?
> > >
> > > Liang
> > >
> > >> Paolo
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
