GitHub user LiuQhahah edited a discussion: Cuckoo Filter Design Proposal
# 1. Introduction

In https://github.com/apache/kvrocks/issues/2534, we plan to support the cuckoo filter. This document outlines the design for implementing a Cuckoo Filter in kvrocks. The Cuckoo Filter is a high-performance, probabilistic data structure used for approximate set membership testing. It allows for fast checks of whether an item is in a set and, unlike Bloom Filters, supports item deletion.

A key requirement for this implementation is to support counting duplicate items (via the CF.COUNT command). To robustly handle scenarios where a single item may be inserted many more times than its two candidate buckets can hold, we have chosen a sharded (or chained) expansion strategy. This approach ensures functional correctness and system stability under all conditions, accepting a trade-off of degraded query performance in favor of feature completeness and robust handling of high-frequency items.

# 2. Core Concepts

* Fingerprint: Instead of storing items themselves, the Cuckoo Filter stores a "fingerprint" of each item, which is a small hash derived from the original item.
* Two Candidate Buckets: Any given item can be stored in one of two possible buckets within a given sub-filter. The locations of these buckets are determined by two hash functions.
* Kicking: When a new item needs to be inserted and both of its candidate buckets are full, the filter evicts an existing fingerprint from one of the buckets to make space. The evicted fingerprint is then re-inserted into its alternate location, which may cause a cascade of evictions. This process continues until every fingerprint finds a home or a maximum number of evictions is reached.
* Expansion (Chaining): If the kicking process fails to find a home for an item after a set number of attempts, the current sub-filter is considered full. An expansion is then triggered, which appends a new, larger sub-filter to the chain to accommodate more data. This design does not use global rehashing.

# 3. Data Structure Design

To efficiently implement the Cuckoo Filter in kvrocks, we will adopt a "Bucket as Key" storage model, combined with a sharded (or chained) expansion strategy. This model maps each bucket of each sub-filter to an individual key-value pair in RocksDB, ensuring high I/O performance and cache efficiency.

## 3.1. Metadata

The metadata stores the overall configuration for the Cuckoo Filter chain. Each Cuckoo Filter instance corresponds to a single metadata key-value pair in RocksDB.

* Key Format: `namespace:cuckoo_filter:{user_key}`
* Value Format: a serialized CuckooFilterMetadata object.

### CuckooFilterMetadata Structure

| Field Name | Data Type | Size (Bytes) | Description |
|---|---|---|---|
| n_filters | uint32_t | 4 | The number of sub-filters in the chain. |
| capacities | uint64_t[] | Variable | An array storing the bucket count of each sub-filter. `capacities[i]` is the number of buckets of the sub-filter at `filter_index = i`. |
| size | uint64_t | 8 | The total number of items stored in the entire filter chain. |
| max_iterations | uint16_t | 2 | The maximum number of kicks (evictions) allowed during an insertion before triggering an expansion. Default: 20 |
| bucket_size | uint8_t | 1 | The number of fingerprints (slots) each bucket can hold. Default: 2 |
| expansion | uint8_t | 1 | The growth factor for the capacity of a new sub-filter relative to the previous one. Default: 1 |

## 3.2. Bucket Data

Each bucket of each sub-filter is stored as a separate key-value pair. This granular approach is the key to performance.

* Key Format: `cf:{namespace_key}:{filter_index}:{bucket_index}`
  * `cf`: a fixed prefix identifying Cuckoo Filter data.
  * `{namespace_key}`: the same user key as used for the metadata.
  * `{filter_index}`: the 0-based index of the sub-filter in the chain.
  * `{bucket_index}`: the 0-based index of the bucket within that sub-filter.
* Value Format: a binary string containing the concatenated fingerprints for that bucket.
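The bucket value format can be made concrete with a small sketch. This is an illustration only (Python for brevity): the 1-byte fingerprint size, the `0x00` empty-slot marker, and the helper names are assumptions for the example, not fixed by the design.

```python
BUCKET_SIZE = 4  # slots per bucket; the `bucket_size` metadata field

def new_bucket() -> bytearray:
    """An empty bucket value: BUCKET_SIZE zero bytes."""
    return bytearray(BUCKET_SIZE)

def bucket_insert(bucket: bytearray, fp: int) -> bool:
    """Place a fingerprint (assumed 1..255) into the first empty slot; False if the bucket is full."""
    for i in range(BUCKET_SIZE):
        if bucket[i] == 0:
            bucket[i] = fp
            return True
    return False

def bucket_count(bucket: bytearray, fp: int) -> int:
    """Count matching fingerprints; duplicates may occupy several slots (per CF.COUNT)."""
    return sum(1 for slot in bucket if slot == fp)

def bucket_delete(bucket: bytearray, fp: int) -> bool:
    """Clear one occurrence of a fingerprint; False if not present (per CF.DELETE)."""
    for i in range(BUCKET_SIZE):
        if bucket[i] == fp:
            bucket[i] = 0
            return True
    return False
```

With `bucket_size = 4` and 1-byte fingerprints, the serialized value is exactly 4 bytes, which is what keeps each bucket entry in RocksDB tiny.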
Each bucket value is therefore very small (e.g., 4 bytes).

## 3.3. Storage Layout in RocksDB

The following diagram illustrates the storage layout for a Cuckoo Filter that has expanded once (`n_filters = 2`). It shows both the metadata entry and the structure of the individual "Bucket as Key" entries.

```
RocksDB Storage (Bucket as Key + Chaining)

Metadata Entry
  Key:   "namespace:cuckoo_filter:my_filter"
  Value: (serialized CuckooFilterMetadata)
    {
      "n_filters": 2,
      "capacities": [1024, 2048],   // Filter 0 has 1024 buckets,
                                    // Filter 1 has 2048 buckets.
      "size": 1500,
      "max_iterations": 20,
      "bucket_size": 4,             // Each bucket holds 4 fingerprints
      "expansion": 2
    }

Example bucket from Filter #0
  Key:   "cf:...:my_filter:0:123"
  Value: (4 bytes of binary data)
    +----------+----------+----------+----------+
    |   fpA    |   fpB    |   fpC    | (empty)  |
    +----------+----------+----------+----------+
      Slot 0     Slot 1     Slot 2     Slot 3
     (1 byte)   (1 byte)   (1 byte)   (1 byte)

Example bucket from Filter #1
  Key:   "cf:...:my_filter:1:456"
  Value: (4 bytes of binary data)
    +----------+----------+----------+----------+
    |   fpX    |   fpY    |   fpX    |   fpZ    |
    +----------+----------+----------+----------+
      Slot 0     Slot 1     Slot 2     Slot 3
```

# 4. Operational Flow

## 4.1. Add Operation (CF.ADD)

1. Read Metadata: Read the metadata for the given user_key to get `n_filters` and the `capacities` array.
2. Target Last Filter: Target the last sub-filter, which has index `n_filters - 1`.
3. Calculate Buckets: Based on the parameters of the last sub-filter (i.e., `capacities[n_filters - 1]`), calculate the two candidate bucket indices, i1 and i2, for the item.
4. Read Buckets: Use MultiGet to read the two corresponding bucket keys: `...:{n_filters-1}:{i1}` and `...:{n_filters-1}:{i2}`.
5. Attempt Insert: Try to place the item's fingerprint into an empty slot in either bucket. If successful, proceed to step 7.
6. Cuckoo Kicking: If both buckets are full, initiate the kicking process within the current sub-filter.
7. On Success: If the insertion is successful, increment `size` in the metadata. Atomically write the updated metadata and all modified bucket data to RocksDB using a single WriteBatch.
8. On Failure: If the kicking process exceeds `max_iterations`, trigger the Expansion flow.

### 4.1.1. Internal Expansion Flow

1. Update Metadata: Calculate the capacity of the new sub-filter based on the `expansion` factor and append it to the `capacities` array.
2. Create New Filter: Write the updated metadata to a WriteBatch and commit it. This logically creates the new, empty sub-filter.
3. Retry Insert: Re-attempt the original ADD operation. It will now naturally target the newly created, empty sub-filter and succeed.

## 4.3. Exists Operation (CF.EXISTS)

1. Read Metadata: Read the metadata to get `n_filters` and `capacities`.
2. Iterate Through Filters: Loop from `filter_index = 0` to `n_filters - 1`.
3. Inside the loop, for each `filter_index`:
   * Calculate the two candidate bucket indices (i1, i2) for the item based on the current sub-filter's capacity (`capacities[filter_index]`).
   * Use MultiGet to read the two bucket keys: `...:{filter_index}:{i1}` and `...:{filter_index}:{i2}`.
   * If the item's fingerprint is found in either bucket, return `True` immediately and terminate the operation.
4. Return Result: If the loop completes without finding the item, return `False`.

## 4.4. Count Operation (CF.COUNT)

This operation is the primary reason for choosing the chained expansion strategy, as it correctly handles high-frequency duplicate items.

1. Read Metadata: Read the metadata to get `n_filters` and `capacities`.
2. Initialize Counter: Set `total_count = 0`.
3. Iterate Through Filters: Loop from `filter_index = 0` to `n_filters - 1`. Inside the loop, for each `filter_index`:
   * Calculate the two candidate bucket indices (i1, i2) for the item based on the current sub-filter's capacity.
   * Use MultiGet to read the two corresponding buckets.
   * In memory, iterate through the contents of both buckets and add the number of matching fingerprints to `total_count`.
4. Return Result: After the loop completes, return the final `total_count`.

## 4.5. Delete Operation (CF.DELETE)

The delete operation is similar to EXISTS but only needs to find and remove the first occurrence of the item.

1. Read the metadata and iterate through the filters as in EXISTS.
2. In the loop, upon finding the first matching fingerprint in a bucket:
   * Remove the fingerprint from the bucket's value data.
   * Decrement `size` in the metadata.
   * Atomically write the modified bucket data and the updated metadata using a WriteBatch.
   * Return `True` immediately and terminate the operation.
3. Return Result: If the loop completes without finding the item, return `False`.

# 5. Disadvantages and Trade-offs

This design is a pragmatic compromise, and it is important to understand its inherent limitations.
* Query Performance Degradation: The most significant trade-off is that the performance of read operations (EXISTS, COUNT, DELETE) degrades linearly as the number of sub-filters (`n_filters`) increases. Each expansion adds another set of lookups to every query.
* Low Utilization of Older Sub-Filters: The "write-to-tail" strategy means that once a sub-filter triggers an expansion, it becomes "frozen": it will never receive new insertions, even if space is freed up by DELETE operations. This can lead to poor overall space utilization, as older sub-filters may become sparse over time while new ones are created.
* Hotspot-Driven Expansion: As identified above, frequent insertion of a single, high-frequency item can rapidly trigger multiple expansions. This creates numerous sub-filters that are largely dedicated to holding that single item, further exacerbating the space-utilization problem. To mitigate this, it is highly recommended to use a larger `bucket_size` (e.g., 4) instead of the default of 2 if this usage pattern is anticipated. A larger bucket size increases the capacity for duplicate items within a single sub-filter, delaying the need for expansion.

# 6. Concurrency Safety

* Atomic Writes: All write operations must be atomic to ensure data consistency. This is achieved by using RocksDB's WriteBatch.
* Locking: To prevent race conditions during a read-modify-write cycle, a lock should be acquired at the application level for the user key. This is especially critical during expansion, to ensure that concurrent operations do not interfere with each other.

# 7. References

1. https://redis.io/docs/latest/develop/data-types/probabilistic/cuckoo-filter/
2. https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf

GitHub link: https://github.com/apache/kvrocks/discussions/3079
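As a closing illustration of the candidate-bucket calculation and kicking loop from §4.1, here is a minimal in-memory sketch (Python for brevity). The hash and fingerprint functions are placeholders, and the power-of-two bucket count is an assumption made so that the XOR-based alternate-bucket trick from the referenced cuckoo filter paper (Fan et al., 2014) is an involution; the real implementation would use kvrocks' own hashing and operate on bucket values read via MultiGet.

```python
import random
from hashlib import sha1

BUCKET_SIZE = 2      # `bucket_size` default from the metadata table
MAX_ITERATIONS = 20  # `max_iterations` default

def _hash(data: bytes) -> int:
    # Placeholder hash function for the sketch.
    return int.from_bytes(sha1(data).digest()[:8], "little")

def fingerprint(item: bytes) -> int:
    # 1-byte fingerprint remapped to 1..255, so 0 can mean "empty slot".
    return _hash(b"fp" + item) % 255 + 1

def alt_bucket(i: int, fp: int, num_buckets: int) -> int:
    # Partial-key cuckoo hashing: the alternate bucket is derived from the
    # current bucket and the fingerprint alone, so an evicted fingerprint can
    # be relocated without re-reading the original item. num_buckets must be
    # a power of two for the XOR to map back and forth between the two.
    return i ^ (_hash(bytes([fp])) % num_buckets)

def candidate_buckets(item: bytes, num_buckets: int) -> tuple[int, int]:
    i1 = _hash(item) % num_buckets
    return i1, alt_bucket(i1, fingerprint(item), num_buckets)

def insert(buckets: list[list[int]], item: bytes) -> bool:
    """Steps 3-8 of CF.ADD, restricted to a single in-memory sub-filter."""
    fp = fingerprint(item)
    i1, i2 = candidate_buckets(item, len(buckets))
    for i in (i1, i2):
        if len(buckets[i]) < BUCKET_SIZE:
            buckets[i].append(fp)  # step 5: empty slot found
            return True
    i = random.choice((i1, i2))    # step 6: both full, start kicking
    for _ in range(MAX_ITERATIONS):
        victim = random.randrange(BUCKET_SIZE)
        fp, buckets[i][victim] = buckets[i][victim], fp  # evict a resident
        i = alt_bucket(i, fp, len(buckets))              # chase it home
        if len(buckets[i]) < BUCKET_SIZE:
            buckets[i].append(fp)
            return True
    return False  # step 8: sub-filter full -> trigger the expansion flow

def exists(buckets: list[list[int]], item: bytes) -> bool:
    i1, i2 = candidate_buckets(item, len(buckets))
    fp = fingerprint(item)
    return fp in buckets[i1] or fp in buckets[i2]
```

Because relocation only ever moves a fingerprint between its two candidate buckets, a lookup never needs more than the two `MultiGet` reads per sub-filter described in §4.3, regardless of how many kicks happened during insertion.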
