Dandandan commented on PR #16445:
URL: https://github.com/apache/datafusion/pull/16445#issuecomment-2985881381

   > > I think doing only the lookup is preferable to also computing / checking the bounds; I think the latter might create more overhead
   > 
   > My thought was that for some cases the bounds checks are going to be quite effective at pruning and they should always be cheap to compute and cheap to apply. I'm surprised you say that they might create a lot of overhead?
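
   To make the "cheap to compute and cheap to apply" point concrete, here is a minimal Rust sketch, assuming integer join keys and using hypothetical helper names (`compute_bounds`, `within_bounds`) that are not DataFusion APIs. The bounds take one pass over the build-side keys, and applying them costs two comparisons per probe row.

```rust
/// Hypothetical helper (not a DataFusion API): min/max of the build-side
/// join keys, computed in a single pass. Returns None for an empty build side.
fn compute_bounds(build_keys: &[i64]) -> Option<(i64, i64)> {
    let mut iter = build_keys.iter().copied();
    let first = iter.next()?;
    // Fold the remaining keys into a running (min, max) pair.
    Some(iter.fold((first, first), |(lo, hi), k| (lo.min(k), hi.max(k))))
}

/// Hypothetical helper: applying the bounds to one probe-side value
/// costs two comparisons.
fn within_bounds(bounds: (i64, i64), value: i64) -> bool {
    let (lo, hi) = bounds;
    lo <= value && value <= hi
}

fn main() {
    let build_keys = [42, 7, 19, 100];
    let bounds = compute_bounds(&build_keys).expect("build side is non-empty");
    let probe = [1, 7, 50, 101, 100];
    // Keep only probe values that fall inside the build-side bounds.
    let kept: Vec<i64> = probe
        .iter()
        .copied()
        .filter(|&v| within_bounds(bounds, v))
        .collect();
    println!("bounds = {:?}, kept = {:?}", bounds, kept);
}
```

   Note that bounds admit false positives: here 50 passes the check even though it is not a build-side key, so the bounds alone cannot replace the hash table lookup.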
   
   Maybe I should articulate it a bit more.
   
   * If we are only filtering based on statistics, min/max bounds might make sense for quickly filtering out data (e.g. pruning whole files or row groups).
   * If we are filtering on values (e.g. filter pushdown), I think it makes sense to *only* filter using the shared hash map and not bother with the min/max values. Creating hashes and doing a single table lookup per row is already relatively slow, so I think we want to avoid *also* evaluating the min/max expression on top of it.
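
   The two cases above can be sketched as follows, assuming integer keys. `ContainerStats`, `can_skip_container` and `filter_rows` are hypothetical names, not DataFusion APIs, and a `HashSet` stands in for the join hash table. Statistics-level pruning compares one [min, max] range per container, while row-level filtering relies only on the hash lookup.

```rust
use std::collections::HashSet;

/// Hypothetical min/max statistics for one container of probe-side rows
/// (for example a Parquet row group). Names are illustrative only.
struct ContainerStats {
    min: i64,
    max: i64,
}

/// Statistics-level pruning: the whole container can be skipped when its
/// [min, max] range does not overlap the build-side bounds.
/// This is one check per container, not per row.
fn can_skip_container(stats: &ContainerStats, build_bounds: (i64, i64)) -> bool {
    let (lo, hi) = build_bounds;
    stats.max < lo || stats.min > hi
}

/// Row-level filtering: only probe the build-side key set. No min/max check
/// is evaluated here, because any value outside the bounds is also absent
/// from the set.
fn filter_rows(build_keys: &HashSet<i64>, probe: &[i64]) -> Vec<i64> {
    probe.iter().copied().filter(|v| build_keys.contains(v)).collect()
}

fn main() {
    let build: HashSet<i64> = HashSet::from([42, 7, 19, 100]);
    let bounds = (7, 100);
    let stats = ContainerStats { min: 200, max: 300 };
    println!("skip container: {}", can_skip_container(&stats, bounds));
    println!("kept rows: {:?}", filter_rows(&build, &[1, 7, 50, 100]));
}
```

   For row-level filtering the min/max check adds no correctness benefit, since the lookup already rejects every out-of-range value; evaluating it anyway is extra per-row work.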
   
   I think it also makes sense to think about a heuristic for using this pushdown only when we expect it to be useful, e.g. when the left (build) side is much smaller than the right side, or when we know (based on column statistics) that it will filter out rows.
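
   One possible shape for such a heuristic, as a minimal Rust sketch: `JoinStats`, `should_push_down_join_filter` and both thresholds are hypothetical and untuned. Real statistics may be missing or inexact, in which case this sketch conservatively skips the pushdown.

```rust
/// Hypothetical inputs to the decision; in practice these would come from
/// plan statistics, which may be missing or inexact.
struct JoinStats {
    build_rows: Option<usize>,
    probe_rows: Option<usize>,
    /// Estimated fraction of probe rows that match the build side, in [0, 1].
    estimated_match_fraction: Option<f64>,
}

/// Illustrative thresholds only; not tuned values.
const MAX_BUILD_TO_PROBE_RATIO: f64 = 0.1;
const MAX_MATCH_FRACTION: f64 = 0.5;

fn should_push_down_join_filter(stats: &JoinStats) -> bool {
    // Case 1: the build (left) side is much smaller than the probe side.
    if let (Some(build), Some(probe)) = (stats.build_rows, stats.probe_rows) {
        if probe > 0 && (build as f64) / (probe as f64) <= MAX_BUILD_TO_PROBE_RATIO {
            return true;
        }
    }
    // Case 2: column statistics suggest the filter will remove many rows.
    if let Some(fraction) = stats.estimated_match_fraction {
        if fraction <= MAX_MATCH_FRACTION {
            return true;
        }
    }
    // Without evidence that the filter pays off, skip it.
    false
}

fn main() {
    let small_build = JoinStats {
        build_rows: Some(1_000),
        probe_rows: Some(1_000_000),
        estimated_match_fraction: None,
    };
    let unknown = JoinStats {
        build_rows: None,
        probe_rows: None,
        estimated_match_fraction: None,
    };
    println!("small build side: {}", should_push_down_join_filter(&small_build));
    println!("no statistics: {}", should_push_down_join_filter(&unknown));
}
```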


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

