[dpdk-dev] [PATCH v3] Implement memcmp using SIMD intrinsics

Bruce Richardson Fri, 12 Jun 2015 10:03:35 +0100

On Fri, Jun 12, 2015 at 10:30:56AM +0200, Ondřej Bílka wrote:
> On Mon, May 18, 2015 at 01:01:42PM -0700, Ravi Kerur wrote:
> > Background:
> > After preliminary discussion with John (Zhihong) and Tim from Intel it was
> > decided that it would be beneficial to use AVX/SSE intrinsics for memcmp
> > similar to memcpy that had been implemented. In addition, we decided to use
> > librte_hash as a test candidate to test both functionality and performance.
> > 
> > Further discussions led to a complete implementation of memory comparison
> > functionality, and the v3 code reflects that.
> > 
> > Test was conducted on Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04,
> > x86_64, 16GB DDR3 system.
> > 
> > Ravi Kerur (1):
> >   Implement memcmp using Intel SIMD intrinsics.
> 
> As my previous mail got lost, I am resending it.
> 
> In short, you shouldn't use sse2/avx2 for memcmp at all. In 95% of calls
> you find the inequality in the first 8 bytes, so sse2 just adds
> unnecessary overhead versus checking these with:
> 
> 190:   48 8b 4e 08             mov    0x8(%rsi),%rcx
> 194:   48 39 4f 08             cmp    %rcx,0x8(%rdi)
> 198:   75 f3                   jne    18d <memeq30+0xd>
> 
> Also, as you have a full memcmp, does your gcc optimize out
> if (memcmp(x,y))
> like mine does?
> 
> So also run the implementation below in your benchmark; my guess is it
> will be faster.
> 
<snip for brevity>
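
For reference, the scalar early check argued for above could look roughly
like the sketch below. This is not the implementation snipped from the
quoted mail; the function name mem_equal_scalar_first and the fallback to
the libc memcmp for the remaining bytes are assumptions made purely for
illustration. It only answers "equal or not", which is all a hash lookup
needs, and it compares the first 8 bytes with plain 64-bit loads before
doing anything more expensive.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch or the quoted mail).
 * Returns 1 if the two buffers are equal over n bytes, 0 otherwise.
 */
static inline int
mem_equal_scalar_first(const void *a, const void *b, size_t n)
{
	if (n >= sizeof(uint64_t)) {
		uint64_t x, y;

		/* memcpy keeps the unaligned loads well defined; compilers
		 * typically turn each one into a single 8-byte mov. */
		memcpy(&x, a, sizeof(x));
		memcpy(&y, b, sizeof(y));

		/* Early exit: most mismatches are expected to show up here. */
		if (x != y)
			return 0;

		/* First 8 bytes match, so compare the remainder. */
		return memcmp((const char *)a + sizeof(x),
			      (const char *)b + sizeof(y),
			      n - sizeof(x)) == 0;
	}

	/* Short buffers: nothing to gain from the early check. */
	return memcmp(a, b, n) == 0;
}
```

Whether this actually beats a SIMD version depends on how often keys
differ within the first 8 bytes, so it would need to be measured in the
same librte_hash benchmark.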


Thanks for the contribution. It's very informative!

/Bruce
