> -----Original Message-----
> From: Richardson, Bruce
> Sent: Thursday, September 27, 2018 1:28 PM
> To: Ananyev, Konstantin <konstantin.anan...@intel.com>
> Cc: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Wang, Yipeng1
> <yipeng1.w...@intel.com>; dev@dpdk.org;
> mic...@digirati.com.br
> Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature
>
> On Thu, Sep 27, 2018 at 12:27:21PM +0100, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
> > > Sent: Thursday, September 27, 2018 12:15 PM
> > > To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > Cc: Wang, Yipeng1 <yipeng1.w...@intel.com>; dev@dpdk.org;
> > > mic...@digirati.com.br
> > > Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature
> > >
> > > On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Yipeng Wang <yipeng1.w...@intel.com>
> > > > > Sent: Friday, September 21, 2018 12:18 PM
> > > > > To: bruce.richard...@intel.com
> > > > > Cc: dev@dpdk.org; yipeng1.w...@intel.com; mic...@digirati.com.br;
> > > > > Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > > > Subject: [PATCH v2 5/7] hash: add extendable bucket feature
> > > > >
> > > > > In use cases where hash table capacity needs to be guaranteed,
> > > > > the extendable bucket feature can be used to store extra keys in
> > > > > linked lists when a conflict happens. This is a similar concept
> > > > > to the extendable bucket hash table in the packet framework.
> > > > >
> > > > > This commit adds the extendable bucket feature. Users can turn
> > > > > it on or off through the extra flag field at table creation
> > > > > time.
> > > > >
> > > > > The extendable bucket table is composed of buckets that can be
> > > > > linked, as lists, to the main table. When extendable buckets are
> > > > > enabled, the table utilization can always achieve 100%.
> > > > IMO, referring to this as 'table utilization' suggests an
> > > > efficiency of memory utilization. Please consider changing this to
> > > > indicate that all of the configured number of entries will be
> > > > accommodated.
> > > >
> > > > > Although keys ending up in the ext buckets may have longer
> > > > > lookup times, such keys should be rare thanks to the cuckoo
> > > > > algorithm.
> > > > >
> > > > > Signed-off-by: Yipeng Wang <yipeng1.w...@intel.com>
> > > > > ---
> > > > >  lib/librte_hash/rte_cuckoo_hash.c | 326 +++++++++++++++++++++++++++++++++-----
> > > > >  lib/librte_hash/rte_cuckoo_hash.h |   5 +
> > > > >  lib/librte_hash/rte_hash.h        |   3 +
> > > > >  3 files changed, 292 insertions(+), 42 deletions(-)
> > > > >
> > > > > diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> > > > > b/lib/librte_hash/rte_cuckoo_hash.c
> > > > > index f7b86c8..616900b 100644
> > > > > --- a/lib/librte_hash/rte_cuckoo_hash.c
> > > > > +++ b/lib/librte_hash/rte_cuckoo_hash.c
> > > > > @@ -31,6 +31,10 @@
> > > > > #include "rte_hash.h"
> > > > > #include "rte_cuckoo_hash.h"
> > > > >
> > > > > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)      \
> > > > > +	for (CURRENT_BKT = START_BUCKET;                \
> > > > > +		CURRENT_BKT != NULL;                    \
> > > > > +		CURRENT_BKT = CURRENT_BKT->next)
> > > > >
> > > > > TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
> > > > >
> > > > > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
> > > > > return h;
> > > > > }
> > > > >
> > > > > +static inline struct rte_hash_bucket *
> > > > > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
> > > > > +{
> > > > > +	while (lst_bkt->next != NULL)
> > > > > +		lst_bkt = lst_bkt->next;
> > > > > +	return lst_bkt;
> > > > > +}
> > > > > +
> > > > > void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
> > > > > {
> > > > > 	h->cmp_jump_table_idx = KEY_CUSTOM;
> > > > > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters
> > > > > *params)
> > > > > struct rte_tailq_entry *te = NULL;
> > > > > struct rte_hash_list *hash_list;
> > > > > struct rte_ring *r = NULL;
> > > > > +struct rte_ring *r_ext = NULL;
> > > > > char hash_name[RTE_HASH_NAMESIZE];
> > > > > void *k = NULL;
> > > > > void *buckets = NULL;
> > > > > +void *buckets_ext = NULL;
> > > > > char ring_name[RTE_RING_NAMESIZE];
> > > > > +char ext_ring_name[RTE_RING_NAMESIZE];
> > > > > unsigned num_key_slots;
> > > > > unsigned i;
> > > > > unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> > > > > +unsigned int ext_table_support = 0;
> > > > > unsigned int readwrite_concur_support = 0;
> > > > >
> > > > > rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> > > > > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters
> > > > > *params)
> > > > > multi_writer_support = 1;
> > > > > }
> > > > >
> > > > > +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> > > > > +ext_table_support = 1;
> > > > > +
> > > > > /* Store all keys and leave the first entry as a dummy entry for
> > > > > lookup_bulk */
> > > > > if (multi_writer_support)
> > > > > /*
> > > > > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters
> > > > > *params)
> > > > > goto err;
> > > > > }
> > > > >
> > > > > +	const uint32_t num_buckets = rte_align32pow2(params->entries) /
> > > > > +				RTE_HASH_BUCKET_ENTRIES;
> > > > > +
> > > > > +	snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> > > > > +		params->name);
> > > > Can be inside the if statement below.
> > > >
> > > > > +	/* Create ring for extendable buckets. */
> > > > > +	if (ext_table_support) {
> > > > > +		r_ext = rte_ring_create(ext_ring_name,
> > > > > +				rte_align32pow2(num_buckets + 1),
> > > > > +				params->socket_id, 0);
> > > > > +
> > > > > +		if (r_ext == NULL) {
> > > > > +			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
> > > > > +				"failed\n");
> > > > > +			goto err;
> > > > > +		}
> > > > > +	}
> > > > > +
> > > > > snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
> > > > >
> > > > > rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > > > > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters
> > > > > *params)
> > > > > goto err_unlock;
> > > > > }
> > > > >
> > > > > -	const uint32_t num_buckets = rte_align32pow2(params->entries)
> > > > > -		/ RTE_HASH_BUCKET_ENTRIES;
> > > > > -
> > > > > 	buckets = rte_zmalloc_socket(NULL,
> > > > > 			num_buckets * sizeof(struct rte_hash_bucket),
> > > > > 			RTE_CACHE_LINE_SIZE, params->socket_id);
> > > > >
> > > > > 	if (buckets == NULL) {
> > > > > -		RTE_LOG(ERR, HASH, "memory allocation failed\n");
> > > > > +		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
> > > > > 		goto err_unlock;
> > > > > 	}
> > > > >
> > > > > +/* Allocate same number of extendable buckets */
> > > > IMO, we are allocating too much memory to support this feature,
> > > > especially when we claim that keys ending up in the extendable
> > > > table are a rare occurrence. By doubling the memory we are
> > > > effectively saying that the main table might have 50% utilization.
> > > > It will also significantly increase the cycles required to iterate
> > > > the complete hash table (in the rte_hash_iterate API), even when
> > > > we expect that the extendable table contains very few entries.
> > > >
> > > > I am wondering if we can provide options to control the amount of
> > > > extra memory that gets allocated, and make the memory allocation
> > > > dynamic (on an on-demand basis). I think this also goes well with
> > > > the general direction DPDK is taking - allocate resources as
> > > > needed rather than allocating all the resources during
> > > > initialization.
> > > >
> > >
> > > Given that adding new entries should not normally be a fast-path function,
> >
> > Umm, I don't think I agree with that.
> > There are plenty of cases where add/delete speed is important.
> > Konstantin
>
> True, I suppose.
> Perhaps then the best approach is to give a couple of options to the
> developer. Allow specifying an initial amount of memory, and then an option
> to allow the memory to grow or not. If the cost of memory allocation is a
> problem for a specific app, then they can provide a large default and not
> allow growing, while other apps, for whom speed is not that critical, can
> provide a small default and allow growing.
>
> Does that seem reasonable?
Yes, I think it does.
Konstantin