adriangb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-2985988638
> I think it makes sense to only filter on the shared hashmap and not bothering with the min/max values - creating hashes and doing a single table lookup is quite fast, so I think we want to avoid to also evaluate the min/max expression (at least for all rows)

I'm surprised that the hash table lookup, even if O(1), has such a small constant factor that it costs about the same as a couple of binary comparisons.

That said, one reason to still do both is stats and filter caching: simple filters like `col >= 123 and col <= 456` can be used for stats pruning and can easily be cached (for example for [filter caching based indexing](https://github.com/apache/datafusion/issues/15585)). So even if performance is not strictly better, there is still something to be said for including a simple filter in addition to the hash table lookup.
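
To make the trade-off concrete, here is a minimal standalone sketch of combining the two checks. It uses only the Rust standard library and is not DataFusion's implementation: `DynamicFilter`, `from_build_keys`, `chunk_may_match`, and `row_matches` are hypothetical names invented for illustration, and the keys are assumed to be plain `i64` values.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a dynamic join filter: min/max bounds
/// plus a hash set of the build-side keys. Not a DataFusion type.
struct DynamicFilter {
    min: i64,
    max: i64,
    keys: HashSet<i64>,
}

impl DynamicFilter {
    /// Builds the filter from build-side keys; returns `None` if there are none.
    fn from_build_keys(keys: &[i64]) -> Option<Self> {
        let min = *keys.iter().min()?;
        let max = *keys.iter().max()?;
        Some(Self {
            min,
            max,
            keys: keys.iter().copied().collect(),
        })
    }

    /// Stats pruning: only needs a chunk's min/max statistics, no row data.
    /// Returns false when the chunk's value range cannot overlap [min, max].
    fn chunk_may_match(&self, chunk_min: i64, chunk_max: i64) -> bool {
        chunk_max >= self.min && chunk_min <= self.max
    }

    /// Per-row check: the range comparison runs first, then the hash lookup.
    fn row_matches(&self, value: i64) -> bool {
        value >= self.min && value <= self.max && self.keys.contains(&value)
    }
}
```

The point of the sketch is that `chunk_may_match` can skip a whole chunk from statistics alone, which a hash set cannot do, while `row_matches` still relies on the hash lookup to reject values that fall inside the range but are not actual keys.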