Re: [I] Add BloomFilter PhysicalExpr [datafusion]

via GitHub Wed, 18 Jun 2025 05:08:12 -0700


adriangb commented on issue #16435:
URL: https://github.com/apache/datafusion/issues/16435#issuecomment-2983932272


   @Dandandan I see two ways a bloom filter could be advantageous:
   1. Checking it per row may be cheaper than probing the full hash table, 
although I admit I haven't poked around in HashJoinExec. Even if it isn't 
faster, it could be a useful tool to have around, e.g. to implement 
larger-than-memory joins where you only hit disk if the bloom filter passes.
   2. It is easier to serialize across the wire (e.g. for remote scans like 
[Liquidcache](https://github.com/XiangpengHao/liquid-cache)).
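
To make the two points concrete, here is a minimal sketch of a bloom filter in plain Rust. It is not DataFusion's API and not a proposal for the actual `PhysicalExpr`: the `BloomFilter` type, its methods, and the byte layout are all hypothetical names chosen for illustration. It shows the cheap per-row membership check (point 1) and a flat byte encoding that could be shipped to a remote scan (point 2).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, illustrative bloom filter (not a DataFusion type).
pub struct BloomFilter {
    bits: Vec<u64>,  // bit set packed into 64-bit words
    num_bits: u64,   // number of addressable bits
    num_hashes: u32, // probes per item
}

// Number of 64-bit words needed to hold `num_bits` bits.
fn words_for(num_bits: u64) -> u64 {
    num_bits / 64 + u64::from(num_bits % 64 != 0)
}

fn read_u64(b: &[u8]) -> u64 {
    let mut w = [0u8; 8];
    w.copy_from_slice(b);
    u64::from_le_bytes(w)
}

impl BloomFilter {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        let num_bits = num_bits.max(64);
        let bits = vec![0u64; words_for(num_bits) as usize];
        Self { bits, num_bits, num_hashes: num_hashes.max(1) }
    }

    // Two base hashes, combined below by double hashing. Assumption:
    // DefaultHasher's algorithm is unspecified across Rust versions, so a
    // filter sent over the wire would need a pinned hash function instead.
    fn base_hashes<T: Hash>(item: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        a.hash(&mut h2);
        (a, h2.finish() | 1)
    }

    fn bit_index(&self, a: u64, b: u64, i: u32) -> u64 {
        a.wrapping_add(u64::from(i).wrapping_mul(b)) % self.num_bits
    }

    /// Build side: record a join key.
    pub fn insert<T: Hash>(&mut self, item: &T) {
        let (a, b) = Self::base_hashes(item);
        for i in 0..self.num_hashes {
            let idx = self.bit_index(a, b, i);
            self.bits[(idx / 64) as usize] |= 1u64 << (idx % 64);
        }
    }

    /// Probe side: `false` means the key is definitely absent, so the
    /// expensive lookup (hash table, spilled partition) can be skipped.
    /// `true` means "maybe present" and still needs the real lookup.
    pub fn might_contain<T: Hash>(&self, item: &T) -> bool {
        let (a, b) = Self::base_hashes(item);
        (0..self.num_hashes).all(|i| {
            let idx = self.bit_index(a, b, i);
            self.bits[(idx / 64) as usize] & (1u64 << (idx % 64)) != 0
        })
    }

    /// Flat encoding: num_bits (u64 LE), num_hashes (u32 LE), then the words.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(12 + self.bits.len() * 8);
        out.extend_from_slice(&self.num_bits.to_le_bytes());
        out.extend_from_slice(&self.num_hashes.to_le_bytes());
        for w in &self.bits {
            out.extend_from_slice(&w.to_le_bytes());
        }
        out
    }

    /// Decode the layout written by `to_bytes`; `None` if it is malformed.
    pub fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() < 12 || (buf.len() - 12) % 8 != 0 {
            return None;
        }
        let num_bits = read_u64(&buf[0..8]);
        let mut h = [0u8; 4];
        h.copy_from_slice(&buf[8..12]);
        let num_hashes = u32::from_le_bytes(h);
        let bits: Vec<u64> = buf[12..].chunks_exact(8).map(read_u64).collect();
        if num_bits == 0 || num_hashes == 0 || bits.len() as u64 != words_for(num_bits) {
            return None;
        }
        Some(Self { bits, num_bits, num_hashes })
    }
}
```

The serialized form is a fixed-size bit array plus two integers, independent of how many keys were inserted, which is what makes it easier to send than a hash table. The trade-off is false positives: a `true` from `might_contain` still requires the real lookup.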


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


