GitHub user LiuQhahah added a comment to the discussion: Cuckoo Filter Design 
Proposal

> I think we should also store the max number of buckets of each filter. i.e.
> how many buckets is stored in one filter?
> Also IMHO capacity (if I understand correctly it is estimated number of max
> items to insert) is not so useful and worth to store, i.e. we can compute it
> immediately from other info.


I'm sorry for the confusion; let me clarify. "capacity" refers to the capacity 
of a single filter rather than the entire filter chain. Using a per-filter 
"capacity" means each filter's capacity is specified individually, and no 
capacity needs to be set for the filter chain as a whole.

> Before "expansion" (BTW I prefer to use "rehashing" rather than "expansion")? 
> Or before "giving up"?

Thank you for your suggestion. I have updated "before giving up" to "before 
rehashing".


> we don't need to create a rocksdb key-value if all buckets in it are empty.

Yes, for empty filters, key-value pairs should not be inserted into RocksDB.


> Data Key 1:
> Key: cf:namespace:cuckoo_filter:my_filter:1
> Value: (Binary data for the second shard, capacity 2000)

"Data Key 1" refers to the filter after cf has undergone one rehashing 
(expansion). At this point, the key of shard 1 becomes 
cf:namespace:cuckoo_filter:my_filter:1, and the capacity in the value is 2000. 
Data Key 1 and Data Key 0 refer to two different filters rather than the same 
filter.


>  Locate Last Shard: Identify the last filter shard using num_filters - 1.

Thank you for pointing out the problem. The description of the add operation 
was inaccurate; the correct ADD operation is as follows, and I have updated 
the document:

   1. Read Metadata: Read the metadata for the given user_key.
   2. Calculate the two candidate bucket locations ((filter_idx1, 
bucket_in_filter1) and (filter_idx2, bucket_in_filter2)).
   3. Read the data for the corresponding 1 or 2 shards from RocksDB.
   4. Attempt to insert the fingerprint into an empty slot in either bucket.
   5. If both are full, begin the cuckoo kicking process, which may involve 
reading/writing to different shards as items are moved.
   6. If insertion succeeds, increment size in metadata and atomically write 
the updated metadata and modified shard(s) back to RocksDB.
   7. If kicking exceeds max_iterations, the insertion fails. Trigger the 
rehashing process.



If you have any further questions, please feel free to let me know. Thank you!



GitHub link: 
https://github.com/apache/kvrocks/discussions/3079#discussioncomment-13909639
