mbutrovich commented on issue #16435: URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983844232
At a high level, Spark has a BloomFilterAgg aggregate function that returns a byte sequence representing the bloom filter. The BloomFilterMightContain scalar function takes that byte sequence and an input value to test, and returns a Boolean. I'm not sure whether we'd aim for Spark compatibility: we'd have to use the same hash function and seed, and they do some weirdness with endianness in the bloom filter's byte sequence. I can point to the relevant code for interest, but we may want a different solution for core DF. https://github.com/apache/datafusion-comet/blob/2c604f9ac68f9723f1ae9107ac2864ac1041f6e9/native/core/src/execution/util/spark_bloom_filter.rs#L31
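To make the aggregate/scalar split concrete, here is a rough sketch of the two halves using only the Rust standard library. It is not Spark-compatible and is not taken from Comet or DataFusion: the function names, the fixed filter size, the number of hashes, and the use of `DefaultHasher` are all assumptions for illustration. Spark compatibility would need Spark's own hash function, seed, and byte layout.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical parameters; a real implementation would size the filter
// from the expected item count and target false-positive rate.
const NUM_BITS: usize = 1024;
const NUM_HASHES: u64 = 3;

/// Map a value to one bit position, using `seed` to derive independent hashes.
/// DefaultHasher is a stand-in here, not the hash function Spark uses.
fn bit_index(value: i64, seed: u64) -> usize {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    value.hash(&mut hasher);
    (hasher.finish() % NUM_BITS as u64) as usize
}

/// Aggregate side: fold all input values into a bloom filter and
/// return it as a plain byte sequence.
fn bloom_filter_agg(values: &[i64]) -> Vec<u8> {
    let mut bytes = vec![0u8; NUM_BITS / 8];
    for &v in values {
        for seed in 0..NUM_HASHES {
            let bit = bit_index(v, seed);
            bytes[bit / 8] |= 1u8 << (bit % 8);
        }
    }
    bytes
}

/// Scalar side: given the byte sequence and a value, report whether the
/// value might be present. False positives are possible; false negatives are not.
fn bloom_filter_might_contain(filter: &[u8], value: i64) -> bool {
    (0..NUM_HASHES).all(|seed| {
        let bit = bit_index(value, seed);
        (filter[bit / 8] & (1u8 << (bit % 8))) != 0
    })
}
```

Calling `bloom_filter_agg(&[1, 42, 1000])` yields a 128-byte filter, and `bloom_filter_might_contain(&filter, 42)` then returns true. Because the byte sequence is the only thing passed between the two functions, both sides have to agree on the hash, seed, and bit/byte ordering, which is where the Spark compatibility concerns come from.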