Dandandan commented on PR #19639:
URL: https://github.com/apache/datafusion/pull/19639#issuecomment-3725167940

   > > So I guess the main factor is expressions like this being super 
expensive to evaluate (query 9):
   > 
   > I wonder if it's the expression being expensive to evaluate or if 
evaluating it where it is currently causes the issue. That is, if this was 
evaluated in a `FilterExec` right before the `HashJoin -> RepartitionExec` (and 
thus lifted work out of the hash join) would it perform better? We should also 
try with `SET datafusion.optimizer.hash_join_inlist_pushdown_max_size = 0`.
   > 
   > > A run with join filter pushdown disabled and 
DATAFUSION_OPTIMIZER_REPARTITION_FILE_MIN_SIZE = 128 * 1024 shows almost no 
regression for tpch
   > 
   > I guess we need to test both of those to understand how each one impacts 
results...
   
   I tested both. The join filter pushdown has the dramatic (bad) impact on 
tpch performance, making it overall slightly better than without predicate 
pushdown; DATAFUSION_OPTIMIZER_REPARTITION_FILE_MIN_SIZE has only minimal 
impact.
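
For anyone reproducing these runs, here is a rough sketch of how the two settings mentioned in this thread could be expressed either as environment variables or as `SET` statements. It assumes DataFusion's usual convention of deriving the environment variable name by upper-casing the config key and replacing dots with underscores. The key `datafusion.optimizer.repartition_file_min_size` is inferred from the environment variable name above, and the helper names are purely illustrative.

```python
# Sketch only: helper names are hypothetical, not part of DataFusion.
# Assumption: env var name = config key upper-cased, with "." replaced by "_".

def env_var_for(key: str) -> str:
    """Derive the environment variable name for a DataFusion config key."""
    return key.upper().replace(".", "_")

# Settings discussed in this thread.
# The first key is inferred from DATAFUSION_OPTIMIZER_REPARTITION_FILE_MIN_SIZE.
SETTINGS = {
    "datafusion.optimizer.repartition_file_min_size": 128 * 1024,
    "datafusion.optimizer.hash_join_inlist_pushdown_max_size": 0,
}

def as_env(settings: dict) -> dict:
    """Render settings as an environment-variable mapping (string values)."""
    return {env_var_for(k): str(v) for k, v in settings.items()}

def as_sql(settings: dict) -> list:
    """Render settings as SQL SET statements."""
    return [f"SET {k} = {v};" for k, v in settings.items()]

if __name__ == "__main__":
    for name, value in as_env(SETTINGS).items():
        print(f"export {name}={value}")
    for stmt in as_sql(SETTINGS):
        print(stmt)
```

Changing one setting at a time between runs keeps the impact of each attributable.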


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

