Dandandan commented on PR #19639:
URL: https://github.com/apache/datafusion/pull/19639#issuecomment-3725242067

   > I wonder if it's the expression being expensive to evaluate or if 
evaluating it where it is currently causes the issue. That is, if this was 
evaluated in a FilterExec right before the HashJoin -> RepartitionExec (and 
thus lifted work out of the hash join) would it perform better? We should also 
try with SET datafusion.optimizer.hash_join_inlist_pushdown_max_size = 0.
   
   I am not sure that would help. I guess join pushdown mostly helps when a lot 
of IO can be avoided and the expression is not very expensive to evaluate, 
and I don't think moving the expression into a `FilterExec` would make it any 
cheaper. I guess we need some more heuristics / adaptiveness here as well, so 
that it is only applied in beneficial cases (and perhaps to further reduce the 
cost of evaluating the expressions).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

