> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of De Lara Guarch, Pablo
> Sent: Monday, October 03, 2016 11:51 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 0/4] Cuckoo hash enhancements
>
> Hi Bruce,
>
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Monday, October 03, 2016 2:59 AM
> > To: De Lara Guarch, Pablo
> > Cc: dev at dpdk.org
> > Subject: Re: [PATCH v4 0/4] Cuckoo hash enhancements
> >
> > On Fri, Sep 30, 2016 at 08:38:52AM +0100, Pablo de Lara wrote:
> > > This patchset improves lookup performance of the current hash library
> > > by replacing the existing 4-stage, 2-entry lookup bulk pipeline with
> > > an improved pipeline based on a loop-and-jump model.
> > > x86 vectorized intrinsics are also used to improve performance when
> > > comparing signatures.
> > >
> > > The first patch reorganizes the hash structure.
> > > The structure takes more than one 64-byte cache line, but not all of
> > > the fields are used in the lookup operation (the most common operation).
> > > Therefore, the fields used in lookup have been moved to the first part
> > > of the structure, so they all fit in one cache line, slightly improving
> > > performance in some scenarios.
> > >
> > > The second patch modifies the layout of the bucket structure.
> > > Currently, the buckets store all the signatures together (current and
> > > alternative). To be able to perform a vectorized signature comparison,
> > > all current signatures have to be contiguous, so the bucket layout has
> > > been changed, separating the current signatures from the alternative
> > > signatures.
> > >
> > > The third patch introduces x86 vectorized intrinsics.
> > > When performing a lookup bulk operation, all current signatures in a
> > > bucket are compared against the signature of the key being looked up.
> > > Now that they are stored contiguously, a vectorized comparison can be
> > > performed, which takes fewer instructions. On machines with AVX2, the
> > > number of entries per bucket is increased from 4 to 8, as AVX2 can
> > > compare two 256-bit values holding eight 32-bit integers each, which
> > > are the 8 signatures in the bucket.
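
As a rough illustration of the vectorized comparison described above, a
minimal sketch of an AVX2 version could look like the following. The helper
name sig_match_mask() and the contiguous sig_current[] array are assumptions
for illustration only, not the actual rte_hash code:

    #include <stdint.h>
    #include <immintrin.h>  /* AVX2 intrinsics; build with -mavx2 */

    #define ENTRIES_PER_BUCKET 8

    /* Compare the 8 current signatures of a bucket against the signature
     * of the key being looked up, in a single vectorized step. */
    static inline uint32_t
    sig_match_mask(const uint32_t sig_current[ENTRIES_PER_BUCKET], uint32_t sig)
    {
            /* Broadcast the lookup signature into all eight 32-bit lanes. */
            __m256i lookup = _mm256_set1_epi32((int)sig);
            /* Load the bucket's 8 current signatures (stored contiguously). */
            __m256i bucket = _mm256_loadu_si256((const __m256i *)sig_current);
            /* Lane-wise equality: each matching lane becomes all-ones. */
            __m256i cmp = _mm256_cmpeq_epi32(bucket, lookup);
            /* Collapse to an 8-bit mask: bit i set means entry i matched. */
            return (uint32_t)_mm256_movemask_ps(_mm256_castsi256_ps(cmp));
    }

The returned mask can then be scanned (for example with __builtin_ctz()) so
that key comparisons are performed only for the entries whose signatures
matched.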
> > > The fourth (and last) patch modifies the current pipeline of the lookup
> > > bulk function. The new pipeline is based on a loop-and-jump model.
> > > The two key improvements are:
> > >
> > > - Better prefetching: the first 4 keys to be looked up are prefetched,
> > >   and after that, the rest of the keys are prefetched while their
> > >   signatures are being calculated. This gives the CPU more time to
> > >   prefetch the requested data before it is actually needed, which
> > >   results in fewer cache misses and, therefore, higher throughput.
> > >
> > > - Lower performance penalty on fallback: the lookup bulk algorithm
> > >   assumes that most of the time there will be no collision in a bucket,
> > >   but two or more signatures may be equal, which means that more than
> > >   one key comparison might be necessary. In that case, only the key of
> > >   the first hit is prefetched, as in the current implementation. The
> > >   difference now is that, if this comparison results in a miss, the
> > >   information about the other keys to be compared has already been
> > >   stored, whereas the current implementation needs to perform an entire
> > >   simple lookup again.
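
As a rough illustration of the fallback handling described in the last point,
a minimal sketch could look like the following. The struct layout, the inline
key storage and all names (bucket, pending_match, try_candidates, hit_mask)
are hypothetical simplifications; the real library keeps keys in a separate
key store:

    #include <stdint.h>
    #include <string.h>

    #define ENTRIES_PER_BUCKET 8
    #define MAX_KEY_LEN 16

    struct bucket {
            uint32_t sig_current[ENTRIES_PER_BUCKET];
            uint8_t keys[ENTRIES_PER_BUCKET][MAX_KEY_LEN]; /* inline for brevity */
    };

    struct pending_match {
            uint32_t remaining_mask;  /* candidate entries still to compare */
            const struct bucket *bkt; /* bucket those candidates live in */
    };

    /* Compare the key of the first signature hit; on a miss, remember the
     * remaining candidates so no full lookup has to be repeated.
     * hit_mask is the non-zero bitmask of entries whose signature matched. */
    static inline int
    try_candidates(const struct bucket *bkt, uint32_t hit_mask,
                   const void *key, size_t key_len,
                   struct pending_match *pending)
    {
            uint32_t first = (uint32_t)__builtin_ctz(hit_mask);

            if (memcmp(key, bkt->keys[first], key_len) == 0)
                    return (int)first;      /* hit on the first candidate */

            /* Miss: store the other candidates; no full re-lookup needed. */
            pending->remaining_mask = hit_mask & (hit_mask - 1);
            pending->bkt = bkt;
            return -1;
    }

A later pass can then walk pending->remaining_mask and compare the remaining
candidate keys directly, instead of restarting the whole lookup for that key.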
> > > Changes in v4:
> > > - Reordered the hash structure, so the alt signature is at the start
> > >   of the next cache line, and explained in the commit message why it
> > >   has been moved
> > > - Reordered the hash structure, so the name field is at the top of the
> > >   structure, leaving all the fields used in lookup in the next cache
> > >   line (instead of the first cache line)
> > >
> > > Changes in v3:
> > > - Corrected the cover letter (wrong number of patches)
> > >
> > > Changes in v2:
> > > - Increased entries per bucket from 4 to 8 for all cases,
> > >   so it is no longer architecture dependent.
> > > - Replaced compile-time selection of the signature comparison function
> > >   with run-time selection, so the best available optimization is used
> > >   from a single binary.
> > > - Reordered the hash structure, so all the fields used by lookup
> > >   are in the same (first) cache line.
> > >
> > > Byron Marohn (3):
> > >   hash: reorganize bucket structure
> > >   hash: add vectorized comparison
> > >   hash: modify lookup bulk pipeline
> > >
> >
> > Hi,
> >
> > Firstly, checkpatches is reporting some style errors in these patches.
> >
> > Secondly, when I run the "hash_multiwriter_autotest" I get what I assume
> > to be an error after applying this patchset. Before this set is applied,
> > running that test shows the cycles per insert with/without lock elision.
> > Now, though, I'm getting an error about a key being dropped or failing
> > to insert in the lock elision case, e.g.
> >
> > Core #2 inserting 1572864: 0 - 1,572,864
> > key 1497087 is lost
> > 1 key lost
> >
> > I've run the test a number of times, and there is a single key lost each
> > time. Please check on this: is it expected, or is it a problem?
>
> I am seeing that error even without the patchset. I am still investigating
> it, but using "git bisect", it looks like the problem is in commit
> 5fc74c2e146d ("hash: check if slot is empty with key index").

I found the problem, and I submitted a patch for it
(http://dpdk.org/dev/patchwork/patch/16361/).
Could you check if it works for you?

> Thanks,
> Pablo
>
> >
> > Thanks,
> > /Bruce