On Fri, Jun 12, 2015 at 10:30:56AM +0200, Ondřej Bílka wrote:
> On Mon, May 18, 2015 at 01:01:42PM -0700, Ravi Kerur wrote:
> > Background:
> > After preliminary discussion with John (Zhihong) and Tim from Intel it was
> > decided that it would be beneficial to use AVX/SSE intrinsics for memcmp
> > similar to memcpy that had been implemented. In addition, we decided to use
> > librte_hash as a test candidate to test both functionality and performance.
> >
> > Further discussions led to complete functionality implementation of memory
> > comparison and v3 code reflects that.
> >
> > Test was conducted on Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04,
> > x86_64, 16GB DDR3 system.
> >
> > Ravi Kerur (1):
> >   Implement memcmp using Intel SIMD instrinsics.
>
> As my previous mail got lost I am resending it.
>
> In short, you shouldn't use sse2/avx2 for memcmp at all. In 95% of calls
> you find the inequality in the first 8 bytes, so sse2 just adds unnecessary
> overhead versus checking these with:
>
>  190:   48 8b 4e 08     mov    0x8(%rsi),%rcx
>  194:   48 39 4f 08     cmp    %rcx,0x8(%rdi)
>  198:   75 f3           jne    18d <memeq30+0xd>
>
> Also, as you have a full memcmp, does your gcc optimize out
>   if (memcmp(x,y))
> like mine does?
>
> So also run the implementation below in your benchmark; my guess is it
> will be faster.
>
<snip for brevity>

Thanks for the contribution. It's very informative!

/Bruce
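
For reference, the early-exit idea from the quoted mail (compare the first 8 bytes with plain scalar loads before doing anything wider) might look roughly like the sketch below. This is not the snipped implementation and not code from the patch; the name memeq_early_exit is hypothetical, and it only answers equal/not-equal rather than giving memcmp's ordering result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch or the quoted mail).
 * Returns 1 if the two buffers are equal, 0 otherwise.
 */
static inline int memeq_early_exit(const void *a, const void *b, size_t n)
{
    if (n >= 8) {
        uint64_t x, y;

        /* memcpy keeps the 8-byte loads alignment-safe; compilers
         * typically lower each one to a single mov on x86_64. */
        memcpy(&x, a, sizeof(x));
        memcpy(&y, b, sizeof(y));
        if (x != y)
            return 0;   /* mismatch found in the first 8 bytes */

        /* First 8 bytes match: compare the remainder. */
        return memcmp((const unsigned char *)a + 8,
                      (const unsigned char *)b + 8, n - 8) == 0;
    }

    /* Short buffers: defer to the library routine. */
    return memcmp(a, b, n) == 0;
}
```

Whether this beats a SIMD version depends on how often the first 8 bytes really differ in the workload, so it would need to go through the same librte_hash benchmark.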