GitHub user PragmaTwice added a comment to the discussion: Cuckoo Filter Design Proposal
Thank you for your proposal! Here are some comments after a brief review. > n_filters | uint32_t | 4 | The number of filter shards in the chain. > capacity | uint64_t | 8 | The total theoretical capacity of the entire filter > chain. I think we should also store the max number of buckets of each filter. i.e. how many buckets is stored in one filter? Also IMHO `capacity` (if I understand correctly it is estimated number of max items to insert) is not so useful and worth to store, i.e. we can compute it immediately from other info. > The maximum number of kicks allowed during an insertion before giving up. Before "expansion" (BTW I prefer to use "rehashing" rather than "expansion")? Or before "giving up"? > Data Key 1: > > Key: cf:namespace:cuckoo_filter:my_filter:1 > Value: (Binary data for the second shard, capacity 2000) 1. we don't need to create a rocksdb key-value if all buckets in it are empty. 2. we should have the same number of buckets for each "filter" here. e.g. if the first "filter" has 1000 buckets, the second should have same. If you want more 2000 buckets you should create 2 more "filter"s. 3. The bucket index can be computed via `filter_index * n_buckets_in_filter`. > Locate Last Shard: Identify the last filter shard using num_filters - 1. All filters should be useful. If we only care about the last filter, I think we should delete all other filters. Here we should locate TWO filters (maybe one, if the two buckets are in the same filter) via its two buckets (located by its two hash values). GitHub link: https://github.com/apache/kvrocks/discussions/3079#discussioncomment-13906830 ---- This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected]
