On 5/26/2016 9:57 AM, zhihong.wang at intel.com (Wang, Zhihong) wrote:
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur
>> Sent: Tuesday, March 8, 2016 7:01 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and
>> SSE intrinsics
>>
>> v1:
>> This patch adds memcmp functionality using AVX and SSE
>> intrinsics provided by Intel. For other architectures
>> supported by DPDK regular memcmp function is used.
>>
>> Compiled and tested on Ubuntu 14.04(non-NUMA) and 15.10(NUMA)
>> systems.
>>
> [...]
>
>> +	if (unlikely(!_mm_testz_si128(xmm2, xmm2))) {
>> +		__m128i idx =
>> +			_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,
>> 3, 2, 1, 0);
>
> line over 80 characters ;)
>
>> +
>> +		/*
>> +		 * Reverse byte order
>> +		 */
>> +		xmm0 = _mm_shuffle_epi8(xmm0, idx);
>> +		xmm1 = _mm_shuffle_epi8(xmm1, idx);
>> +
>> +		/*
>> +		 * Compare unsigned bytes with instructions for signed bytes
>> +		 */
>> +		xmm0 = _mm_xor_si128(xmm0, _mm_set1_epi8(0x80));
>> +		xmm1 = _mm_xor_si128(xmm1, _mm_set1_epi8(0x80));
>> +
>> +		return _mm_movemask_epi8(xmm0 > xmm1) -
>> _mm_movemask_epi8(xmm1 > xmm0);
>> +	}
>> +
>> +	return 0;
>> +}
>
> [...]
>
>> +static inline int
>> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
>> +{
>> +	const uint8_t *src_1 = (const uint8_t *)_src_1;
>> +	const uint8_t *src_2 = (const uint8_t *)_src_2;
>> +	int ret = 0;
>> +
>> +	if (n < 16)
>> +		return rte_memcmp_regular(src_1, src_2, n);
> [...]
>> +
>> +	while (n > 512) {
>> +		ret = rte_cmp256(src_1 + 0 * 256, src_2 + 0 * 256);
>
> Thanks for the great work!
>
> Seems to me there's a big improvement area before going into detailed
> instruction layout tuning that -- No unalignment handling here for large
> size memcmp.
>
> So almost without a doubt the performance will be low in micro-architectures
> like Sandy Bridge if the start address is unaligned, which might be a
> common case.
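For anyone picking this up, one common way to address the unaligned-start
concern quoted above is to peel off leading bytes until one of the two
pointers sits on a 16-byte boundary, and only then enter the block loop.
The sketch below is an illustration only, not code from the patch: the
helper name cmp_peel_head() and the BLOCK constant are made up here, and
plain memcmp() stands in for the SSE/AVX block compares so that the
control flow stays portable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16 /* assumed block size: one SSE register, in bytes */

/*
 * Hypothetical helper (not part of the patch): memcmp-style compare
 * that first consumes bytes until src_1 is 16-byte aligned.
 */
static int
cmp_peel_head(const void *_src_1, const void *_src_2, size_t n)
{
	const uint8_t *src_1 = (const uint8_t *)_src_1;
	const uint8_t *src_2 = (const uint8_t *)_src_2;
	int ret;

	/* Bytes needed to bring src_1 up to the next 16-byte boundary. */
	size_t head = (size_t)((0 - (uintptr_t)src_1) & (BLOCK - 1));

	if (head > n)
		head = n;

	/* Head: compare the unaligned prefix byte-wise. */
	ret = memcmp(src_1, src_2, head);
	if (ret != 0)
		return ret;
	src_1 += head;
	src_2 += head;
	n -= head;

	/*
	 * Body: src_1 is now aligned. A vectorized version could use an
	 * aligned load for src_1 here; src_2 may still need an unaligned
	 * load because the two buffers can have different offsets.
	 */
	while (n >= BLOCK) {
		ret = memcmp(src_1, src_2, BLOCK);
		if (ret != 0)
			return ret;
		src_1 += BLOCK;
		src_2 += BLOCK;
		n -= BLOCK;
	}

	/* Tail: fewer than BLOCK bytes remain. */
	return memcmp(src_1, src_2, n);
}
```

Only one of the two pointers can be aligned in the general case, so this
halves the number of unaligned accesses rather than removing them. Whether
that pays off on a given micro-architecture would have to be measured.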

This patch has been waiting for comments since May 2016, so I am updating
its status to rejected.

Anyone planning to work on a vectorized version of rte_memcmp() can still
benefit from this patch:
https://patches.dpdk.org/patch/11156/
https://patches.dpdk.org/patch/11157/