> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur
> Sent: Tuesday, March 8, 2016 7:01 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and
> SSE intrinsics
>
> v1:
> This patch adds memcmp functionality using AVX and SSE
> intrinsics provided by Intel. For other architectures
> supported by DPDK regular memcmp function is used.
>
> Compiled and tested on Ubuntu 14.04(non-NUMA) and 15.10(NUMA)
> systems.
>

[...]

> +	if (unlikely(!_mm_testz_si128(xmm2, xmm2))) {
> +		__m128i idx =
> +			_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,
> 3, 2, 1, 0);

Line over 80 characters ;)

> +
> +		/*
> +		* Reverse byte order
> +		*/
> +		xmm0 = _mm_shuffle_epi8(xmm0, idx);
> +		xmm1 = _mm_shuffle_epi8(xmm1, idx);
> +
> +		/*
> +		* Compare unsigned bytes with instructions for signed bytes
> +		*/
> +		xmm0 = _mm_xor_si128(xmm0, _mm_set1_epi8(0x80));
> +		xmm1 = _mm_xor_si128(xmm1, _mm_set1_epi8(0x80));
> +
> +		return _mm_movemask_epi8(xmm0 > xmm1) -
> _mm_movemask_epi8(xmm1 > xmm0);
> +	}
> +
> +	return 0;
> +}

[...]

> +static inline int
> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
> +{
> +	const uint8_t *src_1 = (const uint8_t *)_src_1;
> +	const uint8_t *src_2 = (const uint8_t *)_src_2;
> +	int ret = 0;
> +
> +	if (n < 16)
> +		return rte_memcmp_regular(src_1, src_2, n);

[...]

> +
> +	while (n > 512) {
> +		ret = rte_cmp256(src_1 + 0 * 256, src_2 + 0 * 256);

Thanks for the great work!

It seems to me there is a big area for improvement to address before going
into detailed instruction layout tuning: there is no handling of unaligned
addresses here for large memcmp sizes. Performance will almost certainly be
low on micro-architectures like Sandy Bridge if the start address is
unaligned, which might be a common case.

> +		if (unlikely(ret != 0))
> +			return ret;
> +
> +		ret = rte_cmp256(src_1 + 1 * 256, src_2 + 1 * 256);
> +		if (unlikely(ret != 0))
> +			return ret;
> +
> +		src_1 = src_1 + 512;
> +		src_2 = src_2 + 512;
> +		n -= 512;
> +	}
> +	goto CMP_BLOCK_LESS_THAN_512;
> +}
> +
> +#else /* RTE_MACHINE_CPUFLAG_AVX2 */
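
To make the alignment comment above more concrete, here is a rough sketch of one possible approach. It is not taken from the patch. The idea is to compare a short head first so that src_1 lands on a 32-byte boundary, and only then enter the block loop. The names memcmp_align_head, cmp_block and BLOCK are hypothetical. Plain memcmp stands in for the vector compare so the sketch stays portable; in real code the block compare would be a vector routine like the rte_cmp256 quoted above, using an aligned load for src_1. Note that only src_1 is aligned this way, so src_2 may still be unaligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 32 /* assumed vector width in bytes; hypothetical constant */

/* Hypothetical stand-in for a 32-byte vector compare. */
static inline int
cmp_block(const uint8_t *a, const uint8_t *b)
{
	return memcmp(a, b, BLOCK);
}

/* Hypothetical helper: memcmp with an alignment prologue on src_1. */
static inline int
memcmp_align_head(const void *_src_1, const void *_src_2, size_t n)
{
	const uint8_t *src_1 = (const uint8_t *)_src_1;
	const uint8_t *src_2 = (const uint8_t *)_src_2;
	size_t head;
	int ret;

	/* Bytes needed to bring src_1 up to the next BLOCK boundary. */
	head = (BLOCK - ((uintptr_t)src_1 & (BLOCK - 1))) & (BLOCK - 1);
	if (head > n)
		head = n;

	/* Compare the unaligned head with the regular routine. */
	ret = memcmp(src_1, src_2, head);
	if (ret != 0)
		return ret;
	src_1 += head;
	src_2 += head;
	n -= head;

	/* src_1 is now BLOCK-aligned (or n is 0); src_2 may not be. */
	while (n >= BLOCK) {
		ret = cmp_block(src_1, src_2);
		if (ret != 0)
			return ret;
		src_1 += BLOCK;
		src_2 += BLOCK;
		n -= BLOCK;
	}

	/* Remaining tail, shorter than one block. */
	return memcmp(src_1, src_2, n);
}
```

Whether this prologue pays off would need to be measured; when both buffers have different misalignments, one of the two loads stays unaligned regardless.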