> -----Original Message----- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yipeng Wang > Sent: Wednesday, September 6, 2017 1:00 AM > To: dev@dpdk.org > Cc: Tai, Charlie <charlie....@intel.com>; Gobriel, Sameh > <sameh.gobr...@intel.com>; Wang, Ren <ren.w...@intel.com>; Wang, > Yipeng1 <yipeng1.w...@intel.com>; Mcnamara, John > <john.mcnam...@intel.com> > Subject: [dpdk-dev] [PATCH v3 7/7] doc: add membership documentation > > This patch adds the documentation for membership library. > > Signed-off-by: Yipeng Wang <yipeng1.w...@intel.com> > Reviewed-by: John McNamara <john.mcnam...@intel.com>
... > diff --git a/doc/guides/prog_guide/member_lib.rst > b/doc/guides/prog_guide/member_lib.rst > new file mode 100644 > index 0000000..2b32535 > --- /dev/null > +++ b/doc/guides/prog_guide/member_lib.rst > +Membership Library > +================== > + > +Introduction > +------------ > + > +The DPDK Membership Library provides an API for DPDK applications to > insert a > +new member, delete an existing member, or query the existence of a > member in a > +given set, or a group of sets. For the case of a group of sets the library "group of sets, the library". ... > +We are including a configurable Membership Library in DPDK to cover set This looks like more like a cover letter than part of programmer's guide. Reword this loke "This Membership Library covers set membership...", and avoid "include... in DPDK", as it is implicit. > +membership functionality for both a single set and multi-set scenarios. > The > +library is optimized to support the customer network applications which > require > +membership checking functionality. In this guide, we will cover two set- > summary > +schemes including a vector of Bloom Filters and Hash-Table based > +set-summary schemes with and without false negative probability, > followed by > +a brief discussion of the Membership Library API. > + > +Vector of Bloom Filters > +----------------------- > + > +Bloom Filter (BF) [Member-bloom] is a well-known space-efficient > +probabilistic data structure that answers set membership queries (test > whether > +an element is a member of a set) with some probability of false positives > and > +zero false negatives; a query for an element returns either it is "possibly > in > +a set" (with very high probability) or "definitely not in a set". > + > +The BF is a method for representing a set of ``n`` elements (for example > flow keys > +in network applications domain) to support membership queries. The idea > of BF is > +to allocate a bit-vector ``v`` with ``m`` bits, which are initially all set > to 0. > Then > +it chooses ``k`` independent hash functions ``h1``, ``h2``, ... ``hk`` with > hash values range from > +``1`` to ``m`` to perform hashing calculations on each element. From "0" to "m-1"? ... > + > + Vector Bloom Filter (vBF) Overview > + > +To support membership test for both multiple sets and a single set, > +the library implements a Vector Bloom Filter (vBF) scheme. > +vBF basically composes multiple bloom filters into a vector of bloom filers. > +The membership test is conducted on all of the > +bloom filters concurrently to determine which set(s) it belongs to or none > of > +them. The basic idea of vBF is shown in the above figure where an element > is > +used to address multiple bloom filters concurrently and the bloom filter > +index(es) with a hit is returned. > + > +.. _figure_membership5: > +.. figure:: img/member_i5.* > + > + vBF for Flow Scheduling to Worker Thread > + > +As previously mentioned, there are many usages of such structures. vBF is > used > +for applications that needs to check membership against multiple sets > +simultaneously. "needs" -> "need". ... > +The major usage of HTSS with false negative is to use it as a cache for > +distributing elements to different target sets. By allowing HTSS to evict old > +elements, the set-summary can keep track of the most recent elements > +(i.e. active) as a cache typically does. Old inactive elements (infrequently > +used elements) will automatically and eventually get evicted from the > +set-summary. It worth noting that the set-summary still has false positive "It is worth". ... > +We also pass two seeds: ``prim_hash_seed`` and > +``sec_hash_seed`` for the primary and secondary hash functions to > calculate two independent hash values. > +``socket_id`` parameter is the NUMA socket ID for the memory used to > create the > +set-summary. For HTSS, another parameter ``iscache`` is used to indicate In order to keep consistency, I would use "is_cache". > +if this set-summary is a cache (i.e. with false negative probability) or not. > +For vBF, extra parameters are needed. For example, ``num_set`` is the > number of > +sets needed to initialize the vector bloom filters. This number is equal to > the > +number of bloom filters will be created. > +``false_pos_rate`` is the false positive rate. num_keys and false_pos_rate > will be used to determine > +the number of hash functions and the bloom filter size. > + > + > +Set-summary Element Insertion > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The ``rte_member_add()`` function is use to insert an element/key into a > set-summary structure. If it fails an > +error is returned. For success the returned value is dpendednt on the > +set-summary mode to provide extra information for the users. For vBF > +mode, a return value of 0 means a successful insert. For HTSS mode > without false negative, the insert > +could fail with ``-ENOSPC`` if the table is full. With false negative (i.e. > cache > mode), > +for insert that does not cause any eviction (i.e. no overwriting happens to > an > +existing entry) the return value is 0. For insertion that causes eviction, > the > return > +value is 1 to indicate such situation, but it is not an error. > + > +The input arguments for the function should include the ``key`` which is a > pointer to the element/key that needs to > +be added to the set-summary, and ``set_id`` which is the set id associated > +with the key that needs to be added. > + > + > +Set-summary Element Lookup > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Remove unneeded last "~" (I see 4 extra at the end). > + > +The ``rte_member_lookup()`` function looks up a single key/element in > the set-summary structure. It > +returns as soon as the first match is found. The return value is 1 if a > +match is found and 0 otherwise. The arguments for the function include > ``key`` which is a pointer to the > +element/key that needs to be looked up, and ``set_id`` which is used to > return the > +first target set id where the key has matched, if any. > + > +The ``rte_member_lookup_bulk()`` function is used to look up a bulk of > keys/elements in the > +set-summary structure for their first match. Each key lookup returns as > soon as the first match is found. The > +return value is the number of keys that find a match. The arguments of the > +function include ``keys`` which is a pointer to a bulk of keys that are to be > looked up, > +``num_keys`` is the number > +of keys that will be looked up, and ``set_ids`` are the return target set > +ids for the first match found for each of the input keys. ``set_ids`` is an > array > +needs to be sized according to the ``num_keys``. If there is no match, the > set id > +for that key will be set to RTE_MEMBER_NO_MATCH. > + > +The ``rte_member_lookup_multi()`` function looks up a single > key/element in the > +set-summary structure for multiple matches. It > +returns ALL the matches (possibly more than one) found for this key when > it > +is matched against all target sets (it worth noting that for cache mode > HTSS, "it is worth" > +the current implementation matches at most one target set). The return > value is > +the number of matches > +that was found for this key (for cache mode HTSS the return value > +should be at most 1). The arguments for the function include ``key`` which > is a pointer to the > +element/key that needs to be looked up, ``match_per_key`` which is to Better to use "max_match_per_key". Will comment more in other patches. > indicate maximum number of matches > +the user expect to find for each key, and ``set_id`` which is used to return > all "user expects to". > +target set ids where the key has matched, if any. The ``set_id`` array > should be sized > +according to ``match_per_key``. For vBF, maximum number of matches > per key is equal "For vBF, the maximum number". > +to the number of sets. For HTSS, maximum number of matches per key is > equal to two times > +entry count per bucket. ``match_per_key`` should be equal or smaller than "two time the entry count". > maximum number of > +possible matches. > + > +The ``rte_membership_lookup_multi_bulk()`` function looks up a bulk of > keys/elements elements in the > +set-summary structure for multiple matches, each key lookup returns ALL > the matches (possibly more > +than one) found for this key when it is matched against all target sets > (cache mode HTSS > +matches at most one target set). The > +return value is the number of keys that find one or more matches in the > +set-summary structure. The arguments of the > +function include ``keys`` which is > +a pointer to a bulk of keys that are to be looked up, ``num_keys`` is the > number > +of keys that will be looked up, ``match_per_key`` is the possible > +max number of matches for each key, ``match_count`` which is the Use "maximum number", no need to abbreviate here. > returned number > +of matches for each key, and ``set_ids`` are the returned target set > +ids for all matches found for each keys. ``set_ids`` is 2-D array > +that for each key, a 1-D array should be sized according to > ``match_per_key``. I would reword it a bit, to improve readability. For instance: "set_ids" is a 2-D array, containing a 1-D array per key, which should be sized according to "match_per_key". > +``match_per_key`` should be equal or smaller than maximum number of > +possible matches, similar to ``rte_member_lookup_multi``. > + > + > +Set-summary Element Delete > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The ``rte_membership_delete()`` function deletes an element/key from a > set-summary structure, if it fails > +an error is returned. The input arguments should include ``key`` which is a > pointer to the > +element/key that needs to be deleted from the set-summary, and > ``set_id`` > +which is the set id associated with the key to delete. It worth noting that > current "It is worth". > +implementation of vBF does not support deletion [1]_. An error code ``- > EINVAL`` will be returned.