GitHub user LiuQhahah added a comment to the discussion: Cuckoo Filter Design
Proposal
This is an excellent and insightful suggestion.
In response to your suggestion, I have carefully considered it.
For those large-capacity cuckoo filters, if we take a one-to-one mapping
between filters and the count of rehashing, the efficiency during subsequent
query insertions and deletions will not be as high as when multiple rocksDB
keys represent sub-filters.
This is a schematic diagram.
```
┌─────────────────────┬────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐
│ RocksDB Key │ Value Description │ Details / Calculation
│
├─────────────────────┼────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ "my_large_cuckoo" │ Serialized CuckooChainMetadata │ Contains:
num_filters=3, num_buckets=2^19, expansion=2, bucket_size=2, etc. │
│ "my_large_cuckoo:0" │ Data block for Sub-filter 0 │ Size: 2^19 buckets * 2
bytes/bucket = 2^20 bytes (1 MiB) │
│ "my_large_cuckoo:1" │ Data block for Sub-filter 1 │ Size: (2^19 2^1)
buckets 2 bytes/bucket = 2^21 bytes (2 MiB) │
│ "my_large_cuckoo:2" │ Data block for Sub-filter 2 │ Size: (2^19 2^2)
buckets 2 bytes/bucket = 2^22 bytes (4 MiB) │
└─────────────────────┴────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘
```
will convert into ->
```
┌───────────────────────┬────────────────────────────────┬──────────────────────────────────────────────────────────────┐
│ RocksDB Key │ Value Description │ Details /
Calculation │
├───────────────────────┼────────────────────────────────┼──────────────────────────────────────────────────────────────┤
│ "my_large_cuckoo" │ Serialized CuckooChainMetadata │ Contains:
num_filters=3, filter_chunks=[1, 2, 4], etc. │
│ │ │
│
│ "my_large_cuckoo:0" │ Data block for Sub-filter 0 │ Size: 1 MiB. (Not
split as it doesn't exceed the threshold). │
│ │ │
│
│ "my_large_cuckoo:1:0" │ First half of Sub-filter 1 │ Size: 1 MiB.
Contains buckets 0 to 2^19 - 1. │
│ "my_large_cuckoo:1:1" │ Second half of Sub-filter 1 │ Size: 1 MiB.
Contains buckets 2^19 to 2^20 - 1. │
│ │ │
│
│ "my_large_cuckoo:2:0" │ First quarter of Sub-filter 2 │ Size: 1 MiB.
Contains buckets 0 to 2^19 - 1. │
│ "my_large_cuckoo:2:1" │ Second quarter of Sub-filter 2 │ Size: 1 MiB.
Contains buckets 2^19 to 2^20 - 1. │
│ "my_large_cuckoo:2:2" │ Third quarter of Sub-filter 2 │ Size: 1 MiB.
Contains buckets 2^20 to (2^20 + 2^19) - 1. │
│ "my_large_cuckoo:2:3" │ Fourth quarter of Sub-filter 2 │ Size: 1 MiB.
Contains buckets (2^20 + 2^19) to 2^21 - 1. │
└───────────────────────┴────────────────────────────────┴──────────────────────────────────────────────────────────────┘
```
GitHub link:
https://github.com/apache/kvrocks/discussions/3079#discussioncomment-13918799
----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]