mbutrovich commented on code in PR #987:
URL: https://github.com/apache/datafusion-comet/pull/987#discussion_r1806642052


##########
native/core/src/execution/datafusion/util/spark_bloom_filter.rs:
##########
@@ -54,6 +78,25 @@ impl SparkBloomFilter {
             num_hash_functions: num_hash_functions as u32,
         }
     }
+}
+
+impl SparkBloomFilter {
+    /// Serializes a SparkBloomFilter to a byte array conforming to Spark's BloomFilter
+    /// binary format version 1.
+    pub fn spark_serialization(&self) -> Vec<u8> {
+        // There might be a more efficient way to do this, even with all the endianness stuff.
+        let mut spark_bloom_filter: Vec<u8> = 1_u32.to_be_bytes().to_vec();
+        spark_bloom_filter.append(&mut self.num_hash_functions.to_be_bytes().to_vec());
+        spark_bloom_filter.append(&mut (self.bits.word_size() as u32).to_be_bytes().to_vec());
+        let mut filter_state: Vec<u64> = self.bits.data();
+        for i in filter_state.iter_mut() {
+            *i = i.to_be();
+        }
+        // Does it make sense to do a std::mem::take of filter_state here? Unclear to me if a deep
+        // copy of filter_state as a Vec<u64> to a Vec<u8> is happening here.

Review Comment:
   `filter_state` isn't needed after this function call, so ideally I'd be able to move its contents out rather than copy them into the byte slice. But because the underlying type of `filter_state` is a `u64` vector and I'm stuffing it into a `u8` vector, I think the alignment requirements won't be the same and I won't get the clean buffer transfer that I want.
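
   For what it's worth, one way to sidestep the `Vec<u64>`-to-`Vec<u8>` question entirely is to append each word's big-endian bytes straight into the output buffer with `extend_from_slice`, so there's no intermediate byte-swapped `Vec<u64>` to move or reinterpret. A rough standalone sketch, not tied to this PR's types (the `num_hash_functions` and `words` parameters are stand-ins for the corresponding `SparkBloomFilter` fields):

   ```rust
   /// Sketch only: write Spark's BloomFilter binary format version 1 by appending
   /// big-endian bytes directly, avoiding a separate byte-swapped Vec<u64>.
   fn serialize_v1(num_hash_functions: u32, words: &[u64]) -> Vec<u8> {
       // Three u32 header fields (version, num hash functions, word count) plus 8 bytes per word.
       let mut out = Vec::with_capacity(12 + words.len() * 8);
       out.extend_from_slice(&1_u32.to_be_bytes());
       out.extend_from_slice(&num_hash_functions.to_be_bytes());
       out.extend_from_slice(&(words.len() as u32).to_be_bytes());
       for word in words {
           out.extend_from_slice(&word.to_be_bytes());
       }
       out
   }
   ```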



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

