* Li, Liang Z (liang.z...@intel.com) wrote:
> > >> >
> > >> > I use your new code:
> > >> > -------------------------------------------------
> > >> >         unsigned long *p = ...
> > >> >         if (p[0] || p[1] || p[2] || p[3]
> > >> >             || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
> > >> >             return BUFFER_NOT_ZERO;
> > >> >         else
> > >> >             return BUFFER_ZERO;
> > >> > ---------------------------------------------------
> > >> > and the result is almost the same.  I also tried the check 8, 16
> > >> > long data at the beginning, same result.
> > >>
> > >> Interesting...  Well, all I can say is that applaud you for testing
> > >> your hypothesis with the benchmark.
> > >>
> > >> Probably the setup cost of memcmp is too high, because the testing
> > >> loop is already very optimized.
> > >>
> > >> Please submit the AVX2 version if it helps!
> >
> > I read the email in the wrong order. Forget about my other email.
> >
> > Sorry, Juan.
> >
>
> One thing I still can't understand, why the unit test in host environment
> shows 'memcmp()' have better performance?
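
For anyone following along, the check quoted above can be written out as a
self-contained function. This is only a sketch: the name
buffer_is_zero_memcmp() is hypothetical (it is not QEMU's buffer_is_zero()),
and it assumes the buffer is aligned for unsigned long and is at least four
longs in size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper illustrating the memcmp-based check from the quoted
 * snippet. Assumptions: buf is suitably aligned for unsigned long and
 * size >= 4 * sizeof(unsigned long).
 */
static bool buffer_is_zero_memcmp(const void *buf, size_t size)
{
    const unsigned long *p = buf;

    assert(size >= 4 * sizeof(unsigned long));

    /* Test the first four words directly, so a buffer that is non-zero
     * near the start is rejected without calling memcmp() at all. */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /*
     * Overlapping compare: every byte must equal the byte four words
     * before it. The first four words are known to be zero, so a match
     * over the whole range means the whole buffer is zero.
     */
    return memcmp(p + 4, p, size - 4 * sizeof(unsigned long)) == 0;
}
```

memcmp() only reads its arguments, so comparing overlapping ranges is fine
here. Whether this beats a hand-written loop depends on the libc and the CPU,
which seems to be exactly what the benchmark above is showing.
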

Are you aware of any program other than QEMU that also wants to do something
similar?  Finding whether a block of memory is zero sounds like something
that would be useful in lots of places; I just can't think which ones.

Dave

>
> Liang
> >
> > >
> > > Yes, the AVX2 version really helps. I have already submitted it, could
> > > you help to review it?
> > >
> > > I am curious about the original intention to add the SSE2 Intrinsics,
> > > is the same reason?
> > >
> > > I even suspect the VM may impact the 'memcmp()' performance, is it
> > > possible?
> > >
> > > Liang
> > >
> > >> Paolo
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK