Dandandan commented on PR #19639: URL: https://github.com/apache/datafusion/pull/19639#issuecomment-3725242067
> I wonder if it's the expression being expensive to evaluate or if evaluating it where it is currently causes the issue. That is, if this was evaluated in a `FilterExec` right before the `HashJoin` -> `RepartitionExec` (and thus lifted work out of the hash join), would it perform better? We should also try with `SET datafusion.optimizer.hash_join_inlist_pushdown_max_size = 0`.

I am not sure that would help. I guess join pushdown mostly helps when a lot of IO can be avoided and the expression is not super expensive to evaluate, and I don't think moving it into a `FilterExec` would make it any cheaper. I guess we need some more heuristics / adaptiveness here as well, so that it is only applied in cases where it is beneficial (and perhaps to further reduce the cost of evaluating the expressions).
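To make the "adaptiveness" idea a bit more concrete, here is a minimal sketch of one possible heuristic, assuming the executor can measure how many rows a pushed-down join filter prunes per batch and how much it costs to evaluate. This is not DataFusion's API: `AdaptiveFilterGate`, `record_batch`, and `saved_cost_per_pruned_row` are hypothetical names, and the cost units (for example nanoseconds) are an assumption. The gate keeps the filter on only while the downstream work it saves outweighs its own evaluation cost.

```rust
/// Hypothetical sketch (not a DataFusion type): decides whether a pushed-down
/// join filter is still worth evaluating, based on observed pruning and cost.
#[derive(Debug)]
pub struct AdaptiveFilterGate {
    /// Rows to observe before making any decision (warm-up sample).
    min_rows: u64,
    /// Assumed cost (e.g. ns of IO/decoding/join work) avoided per pruned row.
    saved_cost_per_pruned_row: f64,
    rows_seen: u64,
    rows_pruned: u64,
    /// Total cost spent evaluating the filter, in the same unit as above.
    eval_cost: f64,
    enabled: bool,
}

impl AdaptiveFilterGate {
    pub fn new(min_rows: u64, saved_cost_per_pruned_row: f64) -> Self {
        Self {
            min_rows,
            saved_cost_per_pruned_row,
            rows_seen: 0,
            rows_pruned: 0,
            eval_cost: 0.0,
            enabled: true,
        }
    }

    pub fn is_enabled(&self) -> bool {
        self.enabled
    }

    /// Record one evaluated batch: rows in, rows that survived, and the cost
    /// of evaluating the filter on that batch.
    pub fn record_batch(&mut self, rows_in: u64, rows_out: u64, eval_cost: f64) {
        // In this simple sketch, a disabled filter stays disabled.
        if !self.enabled {
            return;
        }
        self.rows_seen += rows_in;
        self.rows_pruned += rows_in.saturating_sub(rows_out);
        self.eval_cost += eval_cost;

        // Only decide once the sample is large enough to be meaningful.
        if self.rows_seen >= self.min_rows {
            let benefit = self.rows_pruned as f64 * self.saved_cost_per_pruned_row;
            // Keep the filter only while the work it saves exceeds its own cost.
            self.enabled = benefit > self.eval_cost;
        }
    }
}
```

With this rule, a selective and cheap filter (for example, 900 of 1000 rows pruned) stays on, while a filter that prunes almost nothing but is expensive to evaluate is switched off after the warm-up sample. A real implementation would likely also want to re-enable the filter periodically, since selectivity can change as the build side or data distribution changes.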
