adriangb commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983932272

   @Dandandan the two ways I thought a bloom filter would be advantageous:
   1. Faster to check for each row than probing the full hash table, although I 
admit I haven't poked around in HashJoinExec. Even if it isn't faster, it could 
be a useful tool to have around, e.g. to implement larger-than-memory joins 
that only hit disk when the bloom filter passes.
   2. Easier to serialize across the wire (e.g. for remote scans like 
[Liquidcache](https://github.com/XiangpengHao/liquid-cache)).
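
   To make both points concrete, here is a minimal, std-only Rust sketch. It is 
not a DataFusion API: `BloomFilter`, `might_contain`, `to_bytes` and 
`from_bytes` are hypothetical names, join keys are assumed to be `u64`, and 
SplitMix64 with double hashing stands in for whatever hash a real 
implementation would use. Build-side keys are inserted once; each probe-side 
row is checked with `might_contain` before touching the hash table (or disk). 
Because the filter is just a bit vector, it serializes to a flat byte buffer.

```rust
use std::convert::TryInto;

// Hypothetical sketch, not a DataFusion API.

/// SplitMix64 finalizer: a stable 64-bit hash, chosen so that both ends of
/// the wire compute the same bit positions for a key.
fn splitmix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Bit positions for `key` via double hashing: (h1 + i * h2) mod num_bits.
fn positions(key: u64, num_bits: u64, num_hashes: u32) -> impl Iterator<Item = u64> {
    let h1 = splitmix64(key);
    let h2 = splitmix64(h1) | 1; // force a non-zero step
    (0..u64::from(num_hashes)).map(move |i| h1.wrapping_add(i.wrapping_mul(h2)) % num_bits)
}

pub struct BloomFilter {
    bits: Vec<u64>, // bit array packed into 64-bit words
    num_bits: u64,  // always a multiple of 64
    num_hashes: u32,
}

impl BloomFilter {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        let words = ((num_bits + 63) / 64) as usize; // round up to whole words
        BloomFilter { bits: vec![0; words], num_bits: words as u64 * 64, num_hashes }
    }

    /// Build side: set the bits for a join key.
    pub fn insert(&mut self, key: u64) {
        for p in positions(key, self.num_bits, self.num_hashes) {
            self.bits[(p / 64) as usize] |= 1u64 << (p % 64);
        }
    }

    /// Probe side: `false` means the key is definitely absent, so the hash
    /// table (or spilled partition on disk) can be skipped for this row.
    pub fn might_contain(&self, key: u64) -> bool {
        positions(key, self.num_bits, self.num_hashes)
            .all(|p| (self.bits[(p / 64) as usize] & (1u64 << (p % 64))) != 0)
    }

    /// Layout: num_bits (u64 LE), num_hashes (u32 LE), then the words (u64 LE).
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(12 + self.bits.len() * 8);
        out.extend_from_slice(&self.num_bits.to_le_bytes());
        out.extend_from_slice(&self.num_hashes.to_le_bytes());
        for w in &self.bits {
            out.extend_from_slice(&w.to_le_bytes());
        }
        out
    }

    /// Inverse of `to_bytes`; returns `None` for a malformed buffer.
    pub fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() < 12 {
            return None;
        }
        let num_bits = u64::from_le_bytes(buf[0..8].try_into().unwrap());
        let num_hashes = u32::from_le_bytes(buf[8..12].try_into().unwrap());
        let body = &buf[12..];
        if num_bits == 0 || num_bits % 64 != 0 || num_hashes == 0 || body.len() as u64 != num_bits / 8 {
            return None;
        }
        let bits = body
            .chunks_exact(8)
            .map(|c| u64::from_le_bytes(c.try_into().unwrap()))
            .collect();
        Some(BloomFilter { bits, num_bits, num_hashes })
    }
}
```

   A Bloom filter can return false positives but never false negatives, so a 
row that passes only costs an extra hash-table lookup or disk read; the join 
result itself is unchanged.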

