Please see one comment inlined: > -----Original Message----- > From: De Lara Guarch, Pablo > Sent: Monday, October 2, 2017 8:44 AM > To: Wang, Yipeng1 <yipeng1.w...@intel.com>; dev@dpdk.org > Cc: tho...@monjalon.net; Tai, Charlie <charlie....@intel.com>; Gobriel, > Sameh <sameh.gobr...@intel.com>; Mcnamara, John > <john.mcnam...@intel.com> > Subject: RE: [PATCH v4 3/7] member: implement vBF mode > > > > > -----Original Message----- > > From: Wang, Yipeng1 > > Sent: Wednesday, September 27, 2017 6:41 PM > > To: dev@dpdk.org > > Cc: tho...@monjalon.net; Tai, Charlie <charlie....@intel.com>; Gobriel, > > Sameh <sameh.gobr...@intel.com>; De Lara Guarch, Pablo > > <pablo.de.lara.gua...@intel.com>; Mcnamara, John > > <john.mcnam...@intel.com>; Wang, Yipeng1 <yipeng1.w...@intel.com> > > Subject: [PATCH v4 3/7] member: implement vBF mode > > > > Bloom Filter (BF) [1] is a well-known space-efficient probabilistic data > > structure that answers set membership queries. > > Vector of Bloom Filters (vBF) is an extension to traditional BF that > > supports > > multi-set membership testing. Traditional BF will return found or not-found > > for each key. vBF will also return which set the key belongs to if it is > > found. > > > > Since each set requires a BF, vBF should be used when set count is small. > > vBF's false positive rate could be set appropriately so that its memory > > requirement and lookup speed is better in certain cases comparing to HT > > based set-summary. > > > > This patch adds the vBF implementation. > > > > [1]B H Bloom, “Space/Time Trade-offs in Hash Coding with Allowable > > Errors,” Communications of the ACM, 1970. > > > > Signed-off-by: Yipeng Wang <yipeng1.w...@intel.com> > > ... > > > diff --git a/lib/librte_member/rte_member_vbf.c > > b/lib/librte_member/rte_member_vbf.c > > ... > > > +int > > +rte_member_create_vbf(struct rte_member_setsum *ss, > > + const struct rte_member_parameters *params) { > > + > > + if (params->num_set > 32 || !rte_is_power_of_2(params- > > >num_set) || > > Magic number. Define a macro instead. > > > + params->num_keys == 0 || > > + params->false_positive_rate == 0 || > > + params->false_positive_rate > 1) { > > + rte_errno = EINVAL; > > + RTE_MEMBER_LOG(ERR, "vBF create with invalid > > parameters\n"); > > + return -EINVAL; > > ... > > > + > > + /* > > + * reduce hash function count, until we approach the user specified > > + * false-positive rate. otherwise it is too conservative > > Watch out for capital letters at the start of the comment and after a full > stop. > > > + */ > > + int tmp_num_hash = ss->num_hashes; > > + > > + while (tmp_num_hash > 1) { > > + float tmp_fp = new_fp; > > + > > + tmp_num_hash--; > > + new_fp = pow((1 - pow((1 - 1.0 / ss->bits), > > num_keys_per_bf * > > + tmp_num_hash)), tmp_num_hash); > > + new_fp = 1 - pow((1 - new_fp), ss->num_set); > > + > > + if (new_fp > params->false_positive_rate) { > > + new_fp = tmp_fp; > > + tmp_num_hash++; > > + break; > > + } > > + } > > + > > + ss->num_hashes = tmp_num_hash; > > + > > + RTE_MEMBER_LOG(DEBUG, "vector bloom filter created, " > > + "each bloom filter expects %u keys, needs %u bits, %u > > hashes, " > > + "with false positive rate set as %.5f, " > > + "The new calculated vBF false positive rate is %.5f\n", > > + num_keys_per_bf, ss->bits, ss->num_hashes, x, new_fp); > > Use a more descriptive variable name for "x". > > > + > > + ss->table = rte_zmalloc_socket(NULL, ss->num_set * (ss->bits >> 3), > > + RTE_CACHE_LINE_SIZE, ss- > > >socket_id); > > + > > + /* > > + * To avoid multiplication and division: > > + * mul_shift is used for multiplication shift during bit test > > + * div_shift is used for division shift, to be divided by number of bits > > + * represented by a uint32_t variable > > + */ > > + ss->mul_shift = __builtin_ctzl(ss->num_set); > > + ss->div_shift = __builtin_ctzl(32 >> ss->mul_shift); > > + > > + if (ss->table == NULL) > > + return -ENOMEM; > > I would move this check just after the malloc call. > > > + > > + return 0; > > +} > > + > > +static inline uint32_t > > +test_bit(uint32_t bit_loc, const struct rte_member_setsum *ss) { > > + uint32_t *vbf = ss->table; > > + uint32_t n = ss->num_set; > > + uint32_t div_shift = ss->div_shift; > > + uint32_t mul_shift = ss->mul_shift; > > + /* > > + * a is how many bits in one BF are represented by one 32bit > > + * variable. > > + */ > > + uint32_t a = 32 >> mul_shift; > > + /* > > + * x>>b is the divide, x & (a-1) is the mod, & (1<<n-1) to mask out > > bits > > + * we do not need > > + */ > > + return (vbf[bit_loc>>div_shift] >> ((bit_loc & (a - 1)) << mul_shift)) > > Add spaces around ">>". > > > & > > + ((1ULL << n) - 1); > > +} > > + > > +static inline void > > +set_bit(uint32_t bit_loc, const struct rte_member_setsum *ss, int32_t > > +set) { > > + uint32_t *vbf = ss->table; > > + uint32_t div_shift = ss->div_shift; > > + uint32_t mul_shift = ss->mul_shift; > > + uint32_t a = 32 >> mul_shift; > > + > > + vbf[bit_loc>>div_shift] |= 1U << (((bit_loc & (a - 1)) << mul_shift) + > > + set - 1); > > Same as above. > > > +} > > + > > +int > > +rte_member_lookup_vbf(const struct rte_member_setsum *ss, const > > void *key, > > + member_set_t *set_id) > > +{ > > + uint32_t j; > > + uint32_t h1 = MEMBER_HASH_FUNC(key, ss->key_len, ss- > > >prim_hash_seed); > > + uint32_t h2 = MEMBER_HASH_FUNC(&h1, sizeof(uint32_t), > > + ss->sec_hash_seed); > > + uint32_t mask = ~0; > > + uint32_t bit_loc; > > + > > + for (j = 0; j < ss->num_hashes; j++) { > > + bit_loc = (h1 + j * h2) & ss->bit_mask; > > + mask &= test_bit(bit_loc, ss); > > + } > > + > > + if (mask) { > > + *set_id = __builtin_ctzl(mask) + 1; > > + return 1; > > Wouldn't it be better to return 0 when there is a hit and -ENOENT when > there is not? [Wang, Yipeng] I tried to follow the convention of other lookup functions that the return value Is kind of the number of matches that found. So 1 means found 1 and 0 means not Found. I defined the public API rte_member_lookup to also returns 1 for found.
> > + } > > + > > + *set_id = RTE_MEMBER_NO_MATCH; > > + return 0; > > +} > > + > > +uint32_t > > +rte_member_lookup_bulk_vbf(const struct rte_member_setsum *ss, > > + const void **keys, uint32_t num_keys, member_set_t > > *set_ids) { > > + uint32_t i, k; > > + uint32_t ret = 0; > > Change variable name to "nr_matches/hits" or similar. > Same in the next functions.