On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote:
>
> > -----Original Message-----
> > From: Yipeng Wang <yipeng1.w...@intel.com>
> > Sent: Friday, September 21, 2018 12:18 PM
> > To: bruce.richard...@intel.com
> > Cc: dev@dpdk.org; yipeng1.w...@intel.com; mic...@digirati.com.br;
> > Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Subject: [PATCH v2 5/7] hash: add extendable bucket feature
> >
> > In use cases where the hash table capacity needs to be guaranteed, the
> > extendable bucket feature can be used to store extra keys in linked
> > lists when conflicts happen. This is a similar concept to the extendable
> > bucket hash table in the packet framework.
> >
> > This commit adds the extendable bucket feature. The user can turn it on
> > or off through the extra flag field at table creation time.
> >
> > An extendable bucket table is composed of buckets that can be linked,
> > as lists, to the buckets of the main table. When the extendable bucket
> > feature is enabled, the table utilization can always achieve 100%.
> IMO, referring to this as 'table utilization' suggests a statement about
> memory-utilization efficiency. Please consider changing this to say that
> all of the configured number of entries will be accommodated.
>
> > Although keys ending up in the ext buckets may have a longer lookup
> > time, they should be rare due to the cuckoo algorithm.
> >
> > Signed-off-by: Yipeng Wang <yipeng1.w...@intel.com>
> > ---
> >  lib/librte_hash/rte_cuckoo_hash.c | 326 +++++++++++++++++++++++++++++++++-----
> >  lib/librte_hash/rte_cuckoo_hash.h |   5 +
> >  lib/librte_hash/rte_hash.h        |   3 +
> >  3 files changed, 292 insertions(+), 42 deletions(-)
> >
> > diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
> > index f7b86c8..616900b 100644
> > --- a/lib/librte_hash/rte_cuckoo_hash.c
> > +++ b/lib/librte_hash/rte_cuckoo_hash.c
> > @@ -31,6 +31,10 @@
> >  #include "rte_hash.h"
> >  #include "rte_cuckoo_hash.h"
> >
> > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)        \
> > +	for (CURRENT_BKT = START_BUCKET;                  \
> > +		CURRENT_BKT != NULL;                      \
> > +		CURRENT_BKT = CURRENT_BKT->next)
> >
> >  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
> >
> > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
> >  	return h;
> >  }
> >
> > +static inline struct rte_hash_bucket *
> > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
> > +{
> > +	while (lst_bkt->next != NULL)
> > +		lst_bkt = lst_bkt->next;
> > +	return lst_bkt;
> > +}
> > +
> >  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
> >  {
> >  	h->cmp_jump_table_idx = KEY_CUSTOM;
> > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
> >  	struct rte_tailq_entry *te = NULL;
> >  	struct rte_hash_list *hash_list;
> >  	struct rte_ring *r = NULL;
> > +	struct rte_ring *r_ext = NULL;
> >  	char hash_name[RTE_HASH_NAMESIZE];
> >  	void *k = NULL;
> >  	void *buckets = NULL;
> > +	void *buckets_ext = NULL;
> >  	char ring_name[RTE_RING_NAMESIZE];
> > +	char ext_ring_name[RTE_RING_NAMESIZE];
> >  	unsigned num_key_slots;
> >  	unsigned i;
> >  	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> > +	unsigned int ext_table_support = 0;
> >  	unsigned int readwrite_concur_support = 0;
> >
> >  	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
> >  		multi_writer_support = 1;
> >  	}
> >
> > +	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> > +		ext_table_support = 1;
> > +
> >  	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
> >  	if (multi_writer_support)
> >  		/*
> > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
> >  		goto err;
> >  	}
> >
> > +	const uint32_t num_buckets = rte_align32pow2(params->entries) /
> > +						RTE_HASH_BUCKET_ENTRIES;
> > +
> > +	snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> > +							params->name);
> This can be moved inside the if statement below.
> > +	/* Create ring for extendable buckets. */
> > +	if (ext_table_support) {
> > +		r_ext = rte_ring_create(ext_ring_name,
> > +				rte_align32pow2(num_buckets + 1),
> > +				params->socket_id, 0);
> > +
> > +		if (r_ext == NULL) {
> > +			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
> > +								"failed\n");
> > +			goto err;
> > +		}
> > +	}
> > +
> >  	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
> >
> >  	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
> >  		goto err_unlock;
> >  	}
> >
> > -	const uint32_t num_buckets = rte_align32pow2(params->entries)
> > -					/ RTE_HASH_BUCKET_ENTRIES;
> > -
> >  	buckets = rte_zmalloc_socket(NULL,
> >  				num_buckets * sizeof(struct rte_hash_bucket),
> >  				RTE_CACHE_LINE_SIZE, params->socket_id);
> >
> >  	if (buckets == NULL) {
> > -		RTE_LOG(ERR, HASH, "memory allocation failed\n");
> > +		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
> >  		goto err_unlock;
> >  	}
> >
> > +	/* Allocate same number of extendable buckets */
> IMO, we are allocating too much memory to support this feature, especially
> when we claim that keys ending up in the extendable table are a rare
> occurrence. By doubling the memory, we are effectively saying that the
> main table might have only 50% utilization. It will also significantly
> increase the cycles required to iterate the complete hash table (in the
> rte_hash_iterate API), even though we expect the extendable table to
> contain very few entries.
>
> I am wondering if we can provide options to control the amount of extra
> memory that gets allocated, and make the memory allocation dynamic (or
> on-demand). I think this also goes well with the general direction DPDK
> is taking - allocate resources as needed rather than allocating all the
> resources during initialization.
>
Given that adding new entries should not normally be a fast-path operation,
how about allowing memory allocation in the add function itself? Why not
initialize with a fairly small number of extra bucket entries, and then
double that number each time they are all used? That should give efficient
resource scaling, I think.

/Bruce