Hi Ravi, > -----Original Message----- > From: Ananyev, Konstantin > Sent: Wednesday, May 13, 2015 11:02 AM > To: Ananyev, Konstantin > Subject: FW: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE > instructions. > > > > From: Ravi Kerur [mailto:rkerur at gmail.com] > Sent: Monday, May 11, 2015 9:47 PM > To: Ananyev, Konstantin > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE > instructions. > > Hi Konstantin, > > On Mon, May 11, 2015 at 12:35 PM, Ananyev, Konstantin <konstantin.ananyev at > intel.com> wrote: > > Hi Ravi, > > > > > From: Ravi Kerur [mailto:rkerur at gmail.com] > > Sent: Monday, May 11, 2015 6:43 PM > > To: Ananyev, Konstantin > > Cc: Matt Laswell; dev at dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE > > instructions. > > > > Hi Konstantin, > > > > > > On Mon, May 11, 2015 at 2:51 AM, Ananyev, Konstantin <konstantin.ananyev at > > intel.com> wrote: > > Hi Ravi, > > > > > -----Original Message----- > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur > > > Sent: Friday, May 08, 2015 11:55 PM > > > To: Matt Laswell > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE > > > instructions. > > > > > > On Fri, May 8, 2015 at 3:29 PM, Matt Laswell <laswell at infiniteio.com> > > > wrote: > > > > > > > > > > > > > > > On Fri, May 8, 2015 at 4:19 PM, Ravi Kerur <rkerur at gmail.com> wrote: > > > > > > > >> This patch replaces memcmp in librte_hash with rte_memcmp which is > > > >> implemented with AVX/SSE instructions. > > > >> > > > >> +static inline int > > > >> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n) > > > >> +{ > > > >> +? ? ? ?const uint8_t *src_1 = (const uint8_t *)_src_1; > > > >> +? ? ? ?const uint8_t *src_2 = (const uint8_t *)_src_2; > > > >> +? ? ? ?int ret = 0; > > > >> + > > > >> +? ? ? ?if (n & 0x80) > > > >> +? ? ? ? ? ? ? ?return rte_cmp128(src_1, src_2); > > > >> + > > > >> +? ? ? ?if (n & 0x40) > > > >> +? ? ? ? ? ? ? ?return rte_cmp64(src_1, src_2); > > > >> + > > > >> +? ? ? ?if (n & 0x20) { > > > >> +? ? ? ? ? ? ? ?ret = rte_cmp32(src_1, src_2); > > > >> +? ? ? ? ? ? ? ?n -= 0x20; > > > >> +? ? ? ? ? ? ? ?src_1 += 0x20; > > > >> +? ? ? ? ? ? ? ?src_2 += 0x20; > > > >> +? ? ? ?} > > > >> > > > >> > > > > Pardon me for butting in, but this seems incorrect for the first two > > > > cases > > > > listed above, as the function as written will only compare the first > > > > 128 or > > > > 64 bytes of each source and return the result.? The pattern expressed in > > > > the 32 byte case appears more correct, as it compares the first 32 bytes > > > > and then lets later pieces of the function handle the smaller remaining > > > > bits of the sources. Also, if this function is to handle arbitrarily > > > > large > > > > source data, the 128 byte case needs to be in a loop. > > > > > > > > What am I missing? > > > > > > > > > > Current max hash key length supported is 64 bytes, hence no comparison is > > > done after 64 bytes. 128 bytes comparison is added to measure performance > > > only and there is no use-case as of now. With the current use-cases its > > > not > > > required but if there is a need to handle large arbitrary data upto 128 > > > bytes it can be modified. > > So on x86 let say rte_memcmp(k1, k2, 65) might produce invalid results, > > right? > > While on PPC will work as expected (as it calls memcpu underneath)? > > That looks really weird to me. > > If you plan to use rte_memcmp only for hash comparisons, then probably > > you should put it somewhere into librte_hash and name it accordingly: > > rte_hash_key_cmp() or something. > > And put a big comment around it, that it only works with particular lengths. > > If you want it to be a generic function inside EAL, then it probably need > > to handle different lengths properly > > on all supported architectures. > > Konstantin > > > > > > Let me just explain it here and probably add it to document as well. > > > > rte_memcmp is not > > > > 1. a replacement to memcmp > > > > 2. ?restricted to hash key comparison > > > > rte_memcmp is > > > > 1. optimized comparison for 16 to 128 bytes, v1 patch series had this > > support. Changed some of the logic in v2 due to concerns raised > > for unavailable use-cases beyond 64 bytes comparison. > From what I see in v2 it supposed to work correctly for len in [0,64] and? > len=128, right? > Not sure I get it: so for v1 it was able to handle any length correctly, but > then you removed it? > If so, I wonder what was the reason? Make it faster? > > My initial discussion was with Zhilong(John) from Intel and we decided to > implement up to 128 bytes comparison and use rte_hash > and rte_lpm6 as a candidate for testing. When I sent out v1 patch, Bruce > comments were on use-case for 128 bytes comparison and > was it really required? Hence I decided in v2 to support only up to 64 bytes > and added 128 bytes only for performance measurement. > Personally I think support for up to 128 bytes comparison is required, there > might not be use-cases today but it will definitely be > useful.
Ok, we don't have a real usage case for it now, but it still probably good to have it work with arbitrary key-length. Again, as Don suggested in another mail, we can have an optimised implementation for particular sizes and fall back to slow-path (memcp) for all other cases. Even if you'll decide to limit len to particular value (64/128), it is probably not very good to have a gap in between, as it exists now [65-127]. > > Another thing that looks strange to me: > While all rte_cmp*() uses actual data values for comparison results, > rte_memcmp_remainder() return value depends not only on data values but also > on data locations: > > +static inline int > +rte_memcmp_remainder(const uint8_t *src_1u, const uint8_t *src_2u, size_t n) > +{ > ... > exit: > + > +? ? ? ?return src_1u < src_2u ? -1 : 1; > +} > > This is a bug and its not supposed to be there. I will fix it. Thanks for > catching it. > > If you just test for equal/not equal that doesn't really matter. > If this is supposed to be a 'proper' comparison function, then the result is > sort of unpredictable. > > With minor tuning over the weekend I am able to get better performance for > > anything between 16 to 128 bytes comparison. > > > > 2. will be specific to DPDK ?i.e. currently all memcmp usage in DPDK are > > for equality or inequality hence "less than" or "greater than" > > implementation in rte_memcmp doesn't make sense and will be removed in > > subsequent patches, it will return 0 or 1 for > > equal/unequal cases. > > If you don't plan your function to follow memcmp() semantics and syntax, why > to name it rte_memcmp()? > I? think that will make a lot of confusion around. > Why not to name it differently(and put a clear comment in the declaration of > course)? > > Following memcmp semantics is not hard but there are no use-cases for it in > DPDK currently. Keeping it specific to DPDK usage > simplifies code as well. I can change the name to "rte_compare" and add > comments to the function. Will it work? Yep, either rte_compare(), or as Don suggested rte_testequal() - both seems good to me. Konstantin > > > > > > rte_hash will be the first candidate to move to rte_memcmp and subsequently > > rte_lpm6 which uses 16 bytes comparison will be > > moved > > > > Later on RING_SIZE which uses large size for comparison will be moved. I am > > currently studying/understanding that logic and will > make > > changes to rte_memcmp to support that. > > Sorry, didn't get you here. > > Once rte_hash, rte_lpm6 changes and new compare function code are reviewed > and accepted I plan to move to different > components (RING_SIZE is currently defined to be from 256 to 16384 bytes) and > memcmp function being used in test_ring, > test_pmd_ring and other functions. I did not want to add all component > changes into one patch series as it causes high review latency > or patch series just dies down silently. Instead make patches small and > incremental in every series, hope this clarifies. > Thanks, > Ravi > Konstantin > > > > > I don't want to make lot of changes in one shot and see that patch series > > die a slow death with no takers. > > > > Thanks, > > Ravi > > > > > > > > > > > > > -- > > > > Matt Laswell > > > > infinite io, inc. > > > > laswell at infiniteio.com > > > > > > > >