Re: [I] Add BloomFilter PhysicalExpr [datafusion]

via GitHub Wed, 18 Jun 2025 04:57:13 -0700


mbutrovich commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983844232


   At a high level, Spark has a BloomFilterAgg aggregate function that 
returns a byte sequence representing the bloom filter. The 
BloomFilterMightContain scalar function takes that byte sequence and a value 
to test, and returns a Boolean.
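
   The shape of that pair can be sketched in plain Rust. This is only an 
illustration of the aggregate/probe split: the names `bloom_filter_agg` and 
`bloom_filter_might_contain`, the FNV-1a hash, the seeds, the filter size, and 
the byte layout are all assumptions made for the example, not Spark's or 
DataFusion's actual implementation.

```rust
/// Toy bloom filter: a bit set stored as 64-bit words plus a probe count.
struct BloomFilter {
    words: Vec<u64>,
    num_hashes: u32,
}

/// 64-bit FNV-1a with a seed mixed in. Picked for brevity only (assumption).
fn fnv1a(data: &[u8], seed: u64) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325 ^ seed;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u32) -> Self {
        let num_words = std::cmp::max((num_bits + 63) / 64, 1);
        BloomFilter { words: vec![0; num_words], num_hashes }
    }

    /// Bit positions for an item, derived from two hashes (double hashing).
    fn positions(&self, item: &[u8]) -> Vec<u64> {
        let num_bits = self.words.len() as u64 * 64;
        let h1 = fnv1a(item, 0);
        let h2 = fnv1a(item, 1) | 1;
        (0..self.num_hashes as u64)
            .map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % num_bits)
            .collect()
    }

    fn insert(&mut self, item: &[u8]) {
        for p in self.positions(item) {
            self.words[(p / 64) as usize] |= 1u64 << (p % 64);
        }
    }

    fn might_contain(&self, item: &[u8]) -> bool {
        self.positions(item)
            .iter()
            .all(|&p| self.words[(p / 64) as usize] & (1u64 << (p % 64)) != 0)
    }

    /// Hypothetical layout: num_hashes (u32, LE) followed by the words (u64, LE).
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(4 + self.words.len() * 8);
        out.extend_from_slice(&self.num_hashes.to_le_bytes());
        for w in &self.words {
            out.extend_from_slice(&w.to_le_bytes());
        }
        out
    }

    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 12 || (bytes.len() - 4) % 8 != 0 {
            return None;
        }
        let mut head = [0u8; 4];
        head.copy_from_slice(&bytes[0..4]);
        let words = bytes[4..]
            .chunks_exact(8)
            .map(|c| {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(c);
                u64::from_le_bytes(buf)
            })
            .collect();
        Some(BloomFilter { words, num_hashes: u32::from_le_bytes(head) })
    }
}

/// Aggregate side: fold all input values into one filter, return its bytes.
fn bloom_filter_agg(values: &[&str]) -> Vec<u8> {
    let mut filter = BloomFilter::new(1024, 3);
    for v in values {
        filter.insert(v.as_bytes());
    }
    filter.to_bytes()
}

/// Scalar side: decode the bytes and probe one value. None if bytes are malformed.
fn bloom_filter_might_contain(bytes: &[u8], value: &str) -> Option<bool> {
    BloomFilter::from_bytes(bytes).map(|f| f.might_contain(value.as_bytes()))
}

fn main() {
    let bytes = bloom_filter_agg(&["apple", "banana"]);
    // Inserted values are always reported as possibly present (no false negatives).
    assert_eq!(bloom_filter_might_contain(&bytes, "apple"), Some(true));
    // Other values are usually, but not always, reported as absent.
    println!("{:?}", bloom_filter_might_contain(&bytes, "cherry"));
}
```

   Both sides have to agree on the hash function, the seeds, and the byte 
layout, which is why matching an existing format constrains the 
implementation.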
   
   I’m not sure whether we’d aim for Spark compatibility: we’d have to use 
the same hash function and seed, and Spark does some unusual things with 
endianness in the bloom filter’s byte sequence. The relevant Comet code is 
linked below for interest, but we may want a different solution for core 
DataFusion. 
   
   
https://github.com/apache/datafusion-comet/blob/2c604f9ac68f9723f1ae9107ac2864ac1041f6e9/native/core/src/execution/util/spark_bloom_filter.rs#L31
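
   As a small illustration of why byte order matters for compatibility, the 
sketch below assumes a filter stored as 64-bit words and shows what happens 
when a word written in one byte order is read back in the other. The helper 
name `read_word_le` is hypothetical, and this is not a description of Spark's 
actual serialization format.

```rust
/// Interpret 8 bytes as a little-endian 64-bit word (hypothetical reader).
fn read_word_le(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
}

fn main() {
    // A word with only bit 0 set, i.e. one element hashed to position 0.
    let word: u64 = 1;

    // A writer that emits big-endian bytes.
    let written = word.to_be_bytes();

    // A reader that assumes little-endian sees a different bit set.
    let misread = read_word_le(written);
    assert_eq!(misread, 1u64 << 56);

    // Reading with the matching byte order round-trips correctly.
    assert_eq!(u64::from_be_bytes(written), word);
}
```

   A filter decoded with the wrong byte order would have its bits in the 
wrong positions and could report false negatives, so a compatible 
implementation has to reproduce the layout exactly.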


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org
