Re: [D] Cuckoo Filter Design Proposal [kvrocks]

via GitHub Mon, 28 Jul 2025 00:34:29 -0700


GitHub user PragmaTwice added a comment to the discussion: Cuckoo Filter Design 
Proposal


Thank you for your proposal! Here are some comments after a brief review.

> n_filters | uint32_t | 4 | The number of filter shards in the chain.
> capacity | uint64_t | 8 | The total theoretical capacity of the entire filter 
> chain.

I think we should also store the max number of buckets of each filter. i.e. how 
many buckets is stored in one filter?

Also IMHO `capacity` (if I understand correctly it is estimated number of max 
items to insert) is not so useful and worth to store, i.e. we can compute it 
immediately from other info. 

> The maximum number of kicks allowed during an insertion before giving up.

Before "expansion" (BTW I prefer to use "rehashing" rather than "expansion")? 
Or before "giving up"?

> Data Key 1:
>
>    Key: cf:namespace:cuckoo_filter:my_filter:1
>    Value: (Binary data for the second shard, capacity 2000)

1. we don't need to create a rocksdb key-value if all buckets in it are empty.
2. we should have the same number of buckets for each "filter" here. e.g. if 
the first "filter" has 1000 buckets, the second should have same. If you want 
more 2000 buckets you should create 2 more "filter"s.
3. The bucket index can be computed via `filter_index * n_buckets_in_filter`.

> Locate Last Shard: Identify the last filter shard using num_filters - 1.

All filters should be useful. If we only care about the last filter, I think we 
should delete all other filters.

Here we should locate TWO filters (maybe one, if the two buckets are in the 
same filter) via its two buckets (located by its two hash values).

GitHub link: 
https://github.com/apache/kvrocks/discussions/3079#discussioncomment-13906830

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

Re: [D] Cuckoo Filter Design Proposal [kvrocks]

Reply via email to