On Tue, Oct 04, 2016 at 08:17:28AM +0100, De Lara Guarch, Pablo wrote:
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of De Lara Guarch, Pablo
> > Sent: Monday, October 03, 2016 11:51 PM
> > To: Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v4 0/4] Cuckoo hash enhancements
> > 
> > Hi Bruce,
> > 
> > > -----Original Message-----
> > > From: Richardson, Bruce
> > > Sent: Monday, October 03, 2016 2:59 AM
> > > To: De Lara Guarch, Pablo
> > > Cc: dev at dpdk.org
> > > Subject: Re: [PATCH v4 0/4] Cuckoo hash enhancements
> > > 
> > > On Fri, Sep 30, 2016 at 08:38:52AM +0100, Pablo de Lara wrote:
> > > > This patchset improves lookup performance of the current hash library
> > > > by replacing the existing 4-stage, 2-entry lookup bulk pipeline with
> > > > an improved pipeline based on a loop-and-jump model. Also, x86
> > > > vectorized intrinsics are used to improve performance when comparing
> > > > signatures.
> > > > 
> > > > First patch reorganizes the order of the hash structure.
> > > > The structure takes more than one 64-byte cache line, but not all
> > > > the fields are used in the lookup operation (the most common
> > > > operation). Therefore, all these fields have been moved to the first
> > > > part of the structure, so they all fit in one cache line, slightly
> > > > improving performance in some scenarios.
> > > > 
> > > > Second patch modifies the order of the bucket structure.
> > > > Currently, the buckets store all the signatures together (current and
> > > > alternative). In order to perform a vectorized signature comparison,
> > > > all current signatures have to be together, so the order of the
> > > > bucket has been changed, separating the current signatures from the
> > > > alternative signatures.
> > > > 
> > > > Third patch introduces x86 vectorized intrinsics.
> > > > When performing a lookup bulk operation, all current signatures in a
> > > > bucket are compared against the signature of the key being looked
> > > > up. Now that they are all together, a vectorized comparison can be
> > > > performed, which takes fewer instructions to carry out.
> > > > On a machine with AVX2, the number of entries per bucket is
> > > > increased from 4 to 8, as AVX2 allows comparing two 256-bit values,
> > > > each holding 8x32-bit integers, which are the 8 signatures in the
> > > > bucket.
> > > > 
> > > > Fourth (and last) patch modifies the current pipeline of the lookup
> > > > bulk function. The new pipeline is based on a loop-and-jump model.
> > > > The two key improvements are:
> > > > 
> > > > - Better prefetching: the first 4 keys to be looked up are
> > > >   prefetched, and after that, the rest of the keys are prefetched
> > > >   while their signatures are being calculated. This gives the CPU
> > > >   more time to prefetch the requested data before it is actually
> > > >   needed, which results in fewer cache misses and therefore higher
> > > >   throughput.
> > > > 
> > > > - Lower performance penalty when using fallback: the lookup bulk
> > > >   algorithm assumes that most of the time there will not be a
> > > >   collision in a bucket, but it might happen that two or more
> > > >   signatures are equal, which means that more than one key
> > > >   comparison might be necessary. In that case, only the key of the
> > > >   first hit is prefetched, as in the current implementation. The
> > > >   difference now is that if this comparison results in a miss, the
> > > >   information about the other keys to be compared has been stored,
> > > >   unlike in the current implementation, which needs to perform an
> > > >   entire simple lookup again.
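The loop-and-jump prefetch pattern described above can be sketched roughly as follows. This is a hedged illustration, not the actual DPDK implementation: `lookup_bulk`, `toy_hash`, and `PREFETCH_OFFSET` are hypothetical names, and the real code computes cuckoo-hash signatures rather than this toy multiplicative hash.

```c
/* Sketch of a prefetch-ahead bulk lookup, in the spirit of the
 * loop-and-jump model described in the fourth patch. All names here
 * are illustrative, not taken from rte_cuckoo_hash. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_OFFSET 4

/* Stand-in for the library's hash function. */
static uint32_t
toy_hash(const uint32_t *key)
{
    return *key * 2654435761u; /* Knuth's multiplicative hash constant */
}

static void
lookup_bulk(const uint32_t **keys, uint32_t *sigs, size_t n)
{
    size_t i;

    /* Stage 1: prefetch the first few keys up front. */
    for (i = 0; i < PREFETCH_OFFSET && i < n; i++)
        __builtin_prefetch(keys[i], 0 /* read */, 3 /* keep in cache */);

    /* Stage 2: prefetch key i + PREFETCH_OFFSET while computing the
     * signature of key i, so memory latency overlaps with work. */
    for (i = 0; i + PREFETCH_OFFSET < n; i++) {
        __builtin_prefetch(keys[i + PREFETCH_OFFSET], 0, 3);
        sigs[i] = toy_hash(keys[i]);
    }

    /* Stage 3: drain the tail; nothing is left to prefetch. */
    for (; i < n; i++)
        sigs[i] = toy_hash(keys[i]);
}
```

The design point is stage 2: the prefetch for a future key is issued while the current key's signature is computed, which is what gives the CPU time to pull the data in before it is needed.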
> > > > 
> > > > Changes in v4:
> > > > - Reordered hash structure, so alt signature is at the start
> > > >   of the next cache line, and explained in the commit message
> > > >   why it has been moved
> > > > - Reordered hash structure, so name field is on top of the
> > > >   structure, leaving all the fields used in lookup in the next
> > > >   cache line (instead of the first cache line)
> > > > 
> > > > Changes in v3:
> > > > - Corrected the cover letter (wrong number of patches)
> > > > 
> > > > Changes in v2:
> > > > - Increased entries per bucket from 4 to 8 for all cases,
> > > >   so it is not architecture-dependent any longer.
> > > > - Replaced compile-time signature comparison function selection
> > > >   with run-time selection, so the best optimization available
> > > >   will be used from a single binary.
> > > > - Reordered the hash structure, so all the fields used by lookup
> > > >   are in the same (first) cache line.
> > > > 
> > > > Byron Marohn (3):
> > > >   hash: reorganize bucket structure
> > > >   hash: add vectorized comparison
> > > >   hash: modify lookup bulk pipeline
> > > > 
> > > 
> > > Hi,
> > > 
> > > Firstly, checkpatch is reporting some style errors in these patches.
> > > 
> > > Secondly, when I run the "hash_multiwriter_autotest" I get what I
> > > assume to be an error after applying this patchset. Before this set
> > > is applied, running that test shows the cycles per insert with/without
> > > lock elision. Now, though, I'm getting an error about a key being
> > > dropped or failing to insert in the lock elision case, e.g.
> > > 
> > >   Core #2 inserting 1572864: 0 - 1,572,864
> > >   key 1497087 is lost
> > >   1 key lost
> > > 
> > > I've run the test a number of times, and there is a single key lost
> > > each time. Please check on this: is it expected or is it a problem?
> > 
> > I am seeing that error even without the patchset.
> > I am still investigating it, but using "git bisect", it looks like the
> > problem is in commit 5fc74c2e146d ("hash: check if slot is empty with
> > key index").
> 
> I found the problem, and I submitted a patch for it
> (http://dpdk.org/dev/patchwork/patch/16361/).
> Could you check if it works for you?
> 
That patch looks like a correct bugfix, so I've acked it for you. However,
I still see the error appearing very occasionally. Since it also appeared
before I applied this set, I am ok to accept this set anyway.

Please do a new version of the set with the checkpatch issues fixed, and
keep my ack.

Series Acked-by: Bruce Richardson <bruce.richardson at intel.com>
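For readers following the thread, the bucket reorganization (second patch) and the signature comparison it enables (third patch) can be sketched as below. This is a hypothetical illustration: the struct and function names are not the actual rte_cuckoo_hash definitions, and the compare loop is a scalar stand-in for the vectorized version (with AVX2, the whole loop collapses to a single `_mm256_cmpeq_epi32` plus a movemask).

```c
/* Hedged sketch of the bucket layout change: moving from interleaved
 * (current, alt) signature pairs to two contiguous arrays, so that all
 * current signatures can be compared against a lookup signature in one
 * vector operation. Names are illustrative only. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_BUCKET 8

/* Old-style layout: signatures interleaved per entry, so the current
 * signatures are not contiguous in memory. */
struct old_bucket {
    struct {
        uint32_t current; /* signature in the primary bucket */
        uint32_t alt;     /* signature in the secondary bucket */
    } sig[ENTRIES_PER_BUCKET];
    uint32_t key_idx[ENTRIES_PER_BUCKET];
};

/* New-style layout: all 8 current signatures contiguous (32 bytes,
 * half a cache line), followed by the alternative signatures. */
struct new_bucket {
    uint32_t sig_current[ENTRIES_PER_BUCKET];
    uint32_t sig_alt[ENTRIES_PER_BUCKET];
    uint32_t key_idx[ENTRIES_PER_BUCKET];
};

/* Scalar stand-in for the vectorized compare: returns a bitmask with
 * bit i set when entry i's current signature matches `sig`. Note the
 * real code must also check that the slot is actually in use, since an
 * empty slot's zero signature would otherwise match sig == 0. */
static unsigned int
compare_signatures(const struct new_bucket *b, uint32_t sig)
{
    unsigned int hits = 0;

    for (int i = 0; i < ENTRIES_PER_BUCKET; i++)
        if (b->sig_current[i] == sig)
            hits |= 1u << i;
    return hits;
}
```

Because `sig_current` is one contiguous 256-bit block, the per-entry loop maps directly onto a single 8x32-bit vector compare, which is why the patchset separates current from alternative signatures before introducing the intrinsics.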