adriangb commented on issue #16435: URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983932272
@Dandandan the two ways I thought a bloom filter would be advantageous:

1. It may be more performant than probing the full hash table for each row, although I admit I haven't poked around in `HashJoinExec`. Even if it's not faster, it could be a useful tool to have around, e.g. to implement larger-than-memory joins where you only hit disk if the bloom filter passes.
2. It's easier to serialize across the wire (e.g. for remote scans like [Liquidcache](https://github.com/XiangpengHao/liquid-cache)).
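To make the two points concrete, here is a rough sketch of what I have in mind. It is not DataFusion's or Liquidcache's API: `BloomFilter`, `prefilter`, `to_bytes`, and `from_bytes` are hypothetical names, the sizing parameters are arbitrary, and the byte layout is invented for illustration. It only uses the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical minimal Bloom filter (illustration only, not a DataFusion type).
pub struct BloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl BloomFilter {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        let num_bits = num_bits.max(64);
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes: num_hashes.max(1) }
    }

    // Two base hashes combined by double hashing: bit_i = a + i * b.
    // Assumption: DefaultHasher is good enough for a sketch. Its algorithm is
    // not guaranteed stable across Rust releases, so a real wire format would
    // need a fixed, documented hash function on both sides.
    fn hash_pair<T: Hash>(key: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        key.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        a.hash(&mut h2);
        (a, h2.finish() | 1) // odd stride, so it is never zero
    }

    /// Record a build-side join key.
    pub fn insert<T: Hash>(&mut self, key: &T) {
        let (a, b) = Self::hash_pair(key);
        for i in 0..self.num_hashes as u64 {
            let bit = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// `false` means the key is definitely absent; `true` means "maybe present".
    pub fn might_contain<T: Hash>(&self, key: &T) -> bool {
        let (a, b) = Self::hash_pair(key);
        (0..self.num_hashes as u64).all(|i| {
            let bit = a.wrapping_add(i.wrapping_mul(b)) % self.num_bits;
            self.bits[(bit / 64) as usize] & (1u64 << (bit % 64)) != 0
        })
    }

    /// Invented layout: num_bits (u64 LE), num_hashes (u32 LE), then the words (u64 LE).
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(12 + self.bits.len() * 8);
        out.extend_from_slice(&self.num_bits.to_le_bytes());
        out.extend_from_slice(&self.num_hashes.to_le_bytes());
        for w in &self.bits {
            out.extend_from_slice(&w.to_le_bytes());
        }
        out
    }

    /// Inverse of `to_bytes`; returns `None` for truncated or inconsistent input.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 12 {
            return None;
        }
        let mut nb = [0u8; 8];
        nb.copy_from_slice(&bytes[0..8]);
        let mut nh = [0u8; 4];
        nh.copy_from_slice(&bytes[8..12]);
        let (num_bits, num_hashes) = (u64::from_le_bytes(nb), u32::from_le_bytes(nh));
        let body = &bytes[12..];
        if num_bits == 0
            || num_hashes == 0
            || body.len() % 8 != 0
            || num_bits.checked_add(63)? / 64 != (body.len() / 8) as u64
        {
            return None;
        }
        let bits = body
            .chunks_exact(8)
            .map(|c| {
                let mut w = [0u8; 8];
                w.copy_from_slice(c);
                u64::from_le_bytes(w)
            })
            .collect();
        Some(Self { bits, num_bits, num_hashes })
    }
}

/// Keep only the probe keys that might match, so the expensive lookup
/// (full hash table, spilled partition on disk, remote scan) runs on fewer rows.
pub fn prefilter<'a, T: Hash>(filter: &BloomFilter, probe_keys: &'a [T]) -> Vec<&'a T> {
    probe_keys.iter().filter(|k| filter.might_contain(*k)).collect()
}
```

The build side would call `insert` for each join key, the probe side would run `prefilter` before touching the hash table or disk, and a remote scan could receive the output of `to_bytes` and rebuild the filter with `from_bytes`. False positives are possible, so anything that passes still needs the real join check; whether this actually beats probing the hash table directly would have to be measured.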