On Wed, 28 Aug 2019 14:51:48 +0800 Bing Zhao <bi...@mellanox.com> wrote:
> In the current hash library, there are already two hash tables and > several hash calculation functions. > > FBK hash table is very lightweight, but the key length is limited to > 4 bytes and the size of the table is fixed during startup. > > Cuckoo hash table is a quite great idea and nice implementation in > the library, it is efficient and flexible. After some study of the > code and information from internet, I find that the table extension > is some pain point (correct me if anything wrong). It means that we > need to allocate the tables in advance by defining a maximum size. > So there is a chance that we waste some unused memory. Right now > there is an extendable buckets table, but it seems that the number > is also fixed and the same as the primary table. > Take the flows information for example, we may support as many as > several millions of flows. In some case, only several hundreds of > flows will be created but in the other case, millions of flows are > needed. So this means that we need to create the hash table to > support millions of elements in advance during the startup time. If > one key of flow information will consume 64B and 1M flows, so it > will occupy more than one hundred MB memory (the table fields > included). What is worse if there is only 2M + several elements, it > needs to create a table of 4M (or more: depends on the hash > collision rate) and it is some wasting of the memory. > > In order to handle this, an resizable hash list is introduced. > The table could start with a small number of the elements and be > allocated dynamically during the runtime. In the meanwhile, an > on-demand list splitting mechanism is used in order not to make a > significant performance degradation. Then there is no need to > re-hash and relocate all the existing elements in the table when the > table is extended. > > The brief design is as below: > 1. The table is consisted of LIST header array and the LIST entries. > In each entry, a pre-calculated hash signature is stored and is > used to decide which header should it be linked to, by using > "AND" with the mask to select the LSBs of the signature. > 2. The header array size could start with a very small number, and a > specified max number of each list is used to check if a table > extension is needed. > 3. When the max number on any of list is reached, the header array > size will be doubled. Then each entries linked to this list will > be split into two lists with the new mask (one more bit 1 in > the mask, e.g. 0xfff -> 0x1fff). And a global shift number and > local shift number of each list is used for the further checking. > 4. When some other list is being accessed, a comparison for the shift > numbers is used to check if the splitting of this list is needed. > If so, then there will be two conditions: > a. If the local shift number is only 1 less than global or > non-zero, then this list is the one that needs to be split. > b. If more than 1, then it means that the table is extended more > than once. And If the local shift is zero, a mechanism is used > to find the last unsplit list. > And then the list will be split into 2/4/8... lists depends on > the gap. All the entries then will linked to the proper header. > So, each time when the hlist APIs are called, only one list will be > traversed but not all the lists. And since there is parameter to set > a max number of entries in a list. The traversal time is predictable > and these will not cause a significant performance degradation. > > BR. Bing > > > Bing Zhao (1): > rte_hash: introduce hash list into hash lib > > lib/librte_hash/Makefile | 2 + > lib/librte_hash/rte_hash_version.map | 15 + > lib/librte_hash/rte_hlist.c | 687 > +++++++++++++++++++++++++++++++++++ > lib/librte_hash/rte_hlist.h | 563 ++++++++++++++++++++++++++++ > 4 files changed, 1267 insertions(+) > create mode 100644 lib/librte_hash/rte_hlist.c > create mode 100644 lib/librte_hash/rte_hlist.h > A resizeable hash table (with RCU?) would be good to have. See old article about this https://lwn.net/Articles/573431/ But this patch got review feedback and no follow up. Marking it as Changes Requested, maybe someone can resuscitate it later.