mbutrovich commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983844232

   At a high level, Spark has a BloomFilterAgg aggregate function that 
returns a byte sequence representing the bloom filter. The 
BloomFilterMightContain scalar function takes that byte sequence and an 
input to test, and returns a Boolean.
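
   To make that split concrete, here is a rough sketch of the two halves 
as plain Rust functions. It is only an illustration under my own 
assumptions: the function names, the fixed number of hashes, the use of 
std's DefaultHasher, and the little-endian word layout are all 
hypothetical stand-ins, not Spark's or Comet's actual hash, seed, or byte 
format.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical choice for this sketch; real implementations derive it
// from the expected item count and target false-positive rate.
const NUM_HASHES: u64 = 3;

/// Hash `value` together with a seed.
/// ASSUMPTION: DefaultHasher is a stand-in, not the hash Spark uses.
fn seeded_hash(value: i64, seed: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    value.hash(&mut hasher);
    hasher.finish()
}

/// Aggregate side: fold every input value into a bit set and return it
/// as a byte sequence.
fn bloom_filter_agg(values: &[i64], num_bits: usize) -> Vec<u8> {
    // Round up to whole 64-bit words, with at least one word.
    let mut words = vec![0u64; (num_bits.max(1) + 63) / 64];
    let total_bits = (words.len() * 64) as u64;
    for &value in values {
        for seed in 0..NUM_HASHES {
            let bit = seeded_hash(value, seed) % total_bits;
            words[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }
    // ASSUMPTION: little-endian words. The byte order is exactly the kind
    // of detail that has to match for cross-engine compatibility.
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}

/// Scalar side: test one value against the serialized filter.
/// `false` means definitely absent; `true` means possibly present.
fn bloom_filter_might_contain(bytes: &[u8], value: i64) -> bool {
    if bytes.is_empty() {
        return false;
    }
    let total_bits = (bytes.len() * 8) as u64;
    (0..NUM_HASHES).all(|seed| {
        let bit = seeded_hash(value, seed) % total_bits;
        // With little-endian words, global bit i lives in byte i / 8
        // at position i % 8.
        bytes[(bit / 8) as usize] & (1u8 << (bit % 8)) != 0
    })
}

fn main() {
    let filter = bloom_filter_agg(&[10, 20, 30], 1024);
    println!("filter is {} bytes", filter.len());
    println!("might contain 20: {}", bloom_filter_might_contain(&filter, 20));
    println!("might contain 99: {}", bloom_filter_might_contain(&filter, 99));
}
```

   Both sides have to agree on the hash, the seeds, and the byte layout, 
which is why matching another engine's filter pins down all three.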
   
   I’m not sure whether we’d aim for Spark compatibility: we’d have to use 
the same hash function and seed, and match the odd endianness handling in 
the bloom filter’s byte sequence. I can point to the relevant code for 
interest, but we may want a different solution for core DF.
   
   
https://github.com/apache/datafusion-comet/blob/2c604f9ac68f9723f1ae9107ac2864ac1041f6e9/native/core/src/execution/util/spark_bloom_filter.rs#L31


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org