Dandandan commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-2985881381
> > I think doing only the lookup is preferable above also computing / checking the bounds, I think the latter might create more overhead
>
> My thought was that for some cases the bounds checks are going to be quite effective at pruning and they should always be cheap to compute and cheap to apply. I'm surprised you say that they might create a lot of overhead?

Maybe I should articulate it a bit more.

* If we are only filtering based on statistics, min/max might make sense as a quick way to prune.
* If we are filtering on values (e.g. filter pushdown), I think it makes sense to *only* filter on the shared hashmap and not bother with the min/max values: creating hashes and doing a single table lookup is relatively fast, so I think we want to avoid *also* evaluating the min/max expression.

I think it also makes sense to think about a heuristic for using this pushdown only when we think it might be useful, e.g. the left side is much smaller than the right side, or we know (based on column statistics) that it will filter out rows.
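
To make the distinction concrete, here is a rough sketch of the two filtering levels and the kind of size heuristic described in the comment. It is not DataFusion code: `BuildSide`, `may_overlap`, `filter_rows`, `should_push_down` and the `min_ratio` parameter are all hypothetical names invented for illustration, and it uses a plain `HashSet<i64>` in place of the shared join hash table.

```rust
use std::collections::HashSet;

/// Hypothetical summary of the build (left) side of a hash join.
/// These names are illustrative only and do not come from DataFusion.
pub struct BuildSide {
    pub keys: HashSet<i64>,
    pub min: i64,
    pub max: i64,
}

impl BuildSide {
    /// Collects the join keys and their min/max; returns None for an empty input.
    pub fn new(values: &[i64]) -> Option<Self> {
        let min = *values.iter().min()?;
        let max = *values.iter().max()?;
        Some(Self {
            keys: values.iter().copied().collect(),
            min,
            max,
        })
    }

    /// Statistics-level pruning: can a chunk whose values lie in
    /// [chunk_min, chunk_max] contain any matching key at all?
    pub fn may_overlap(&self, chunk_min: i64, chunk_max: i64) -> bool {
        chunk_max >= self.min && chunk_min <= self.max
    }

    /// Row-level filtering: one hash and one lookup per row,
    /// with no additional min/max comparison.
    pub fn filter_rows(&self, probe: &[i64]) -> Vec<i64> {
        probe
            .iter()
            .copied()
            .filter(|k| self.keys.contains(k))
            .collect()
    }
}

/// Hypothetical heuristic: push the filter down only when the probe side
/// is at least `min_ratio` times larger than the build side.
pub fn should_push_down(build_rows: usize, probe_rows: usize, min_ratio: usize) -> bool {
    build_rows.saturating_mul(min_ratio) <= probe_rows
}
```

With build keys `[10, 20, 30]`, `may_overlap(31, 100)` is false, so a whole chunk can be skipped from its statistics alone, while `filter_rows(&[5, 10, 15, 30, 40])` keeps only `10` and `30` through the set lookup. A real heuristic could also take column statistics into account, as suggested above; the ratio check here is only the simplest possible stand-in.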