> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Richardson, Bruce
> Sent: Tuesday, September 09, 2014 11:45 AM
> To: Matthew Hall; dev at dpdk.org
> Subject: Re: [dpdk-dev] Defaults for rte_hash
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matthew Hall
> > Sent: Tuesday, September 09, 2014 11:32 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Defaults for rte_hash
> >
> > Hello,
> >
> > I was looking at the code which inits rte_hash objects in examples/l3fwd.
> > It's using approx. 1M to 4M hash 'entries' depending on 32-bit vs 64-bit,
> > but it's setting the 'bucket_entries' to just 4.
> >
> > Normally I'm used to using somewhat deeper hash buckets than that... it
> > seems like having a zillion little tiny hash buckets would cause more TLB
> > pressure and memory overhead... or does 4 get shifted / exponentiated
> > into 2**4 ?
> >
That 4 is not shifted, so it really is 4 entries per bucket. The maximum you
can set is 16, since a bucket is limited to the size of a cache line. Total
memory usage is the same regardless of the entries-per-bucket setting, but
with 4 entries per bucket and a 16-byte key, all the keys stored in a bucket
fit in a single cache line, so performance tends to be better in that
configuration (although with a non-optimal hash function you may fail to
store all your keys, since each bucket fills up sooner). For this example,
4 entries per bucket looks like a good number to me.

> > The documentation in
> > http://dpdk.org/doc/api/structrte__hash__parameters.html
> > and http://dpdk.org/doc/api/rte__hash_8h.html isn't that clear... is
> > there a better place to look for this?
> >
> > In my case I'm looking to create a table of 4M or 8M entries, containing
> > tables of security threat IPs / domains, to be detected in the traffic.
> > So it would be good to have some understanding how not to waste a ton of
> > memory on a table this huge without making it run super slow either.
> >
> > Did anybody have some experience with how to get this right?
>
> It might be worth looking too at the hash table structures in the
> librte_table directory for packet framework. These should give better
> scalability across millions of flows than the existing rte_hash
> implementation. [We're looking here to provide in the future a similar,
> more scalable, hash table implementation with an API like that of
> rte_hash, but that is still under development here at the moment.]
>
> > Another thing... the LPM table uses 16-bit Hop IDs. But I would probably
> > have more than 64K CIDR blocks of badness on the Internet available to
> > me for analysis. How would I cope with this, besides just letting some
> > attackers escape unnoticed? ;)
>
> Actually, I think the next hop field in the lpm implementation is only
> 8-bits, not 16 :-). Each lpm entry is only 16-bits in total.
>
> > Have we got some kind of structure which allows a greater number of
> > CIDRs even if it's not quite as fast?
> >
> > Thanks,
> > Matthew.