adriangb commented on PR #16445:
URL: https://github.com/apache/datafusion/pull/16445#issuecomment-2985988638

   > I think it makes sense to filter only on the shared hashmap and not 
bother with the min/max values - creating hashes and doing a single table 
lookup is quite fast, so I think we want to avoid also evaluating the min/max 
expression (at least for all rows)
   
   I'm surprised that the hash table lookup, even if O(1), has such a small 
constant factor that it costs about as much as a couple of binary 
comparisons. That said, a reason to still do both is stats and filter 
caching: simple filters like `col >= 123 and col <= 456` can be used for 
stats pruning and can easily be cached (for example for [filter caching based 
indexing](https://github.com/apache/datafusion/issues/15585)). So even if 
performance is not strictly better, there is still something to be said for 
including a simple filter in addition to the hash table lookup.
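
   To make the trade-off concrete, here is a rough, standalone sketch of a 
filter that carries both pieces. It is not DataFusion's API: `DynamicFilter`, 
`from_build_side`, `may_match_stats`, and `matches_row` are hypothetical 
names, and the keys are assumed to be plain `i64` values.

```rust
use std::collections::HashSet;

/// Hypothetical filter built from the join's build side (not a DataFusion type).
struct DynamicFilter {
    min: i64,
    max: i64,
    keys: HashSet<i64>,
}

impl DynamicFilter {
    /// Collect the bounds and the exact key set; `None` if there are no keys.
    fn from_build_side(keys: &[i64]) -> Option<Self> {
        let min = *keys.iter().min()?;
        let max = *keys.iter().max()?;
        Some(Self {
            min,
            max,
            keys: keys.iter().copied().collect(),
        })
    }

    /// Stats pruning: only the bounds can be compared against a container's
    /// min/max statistics, so this is where the simple filter pays off.
    fn may_match_stats(&self, stats_min: i64, stats_max: i64) -> bool {
        stats_max >= self.min && stats_min <= self.max
    }

    /// Row-level check: cheap range test first, then the exact hash lookup.
    fn matches_row(&self, value: i64) -> bool {
        value >= self.min && value <= self.max && self.keys.contains(&value)
    }
}
```

   The range check alone can skip whole containers from their statistics 
without hashing anything, while the hash set rejects values that fall inside 
the range but were never on the build side.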


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

