2010YOUY01 commented on PR #16660:
URL: https://github.com/apache/datafusion/pull/16660#issuecomment-3178537332

   > @2010YOUY01 Wouldn't you need to use a cost model to estimate which one to 
use though when both are viable? For example, the Hash Join (do the 
equi-condition then the residual filter) vs. PWMJ (do the filter, then the equi 
condition residual). You could estimate the selectivity for the equi predicate 
vs. residual predicate, factor in whether the key is sorted, etc. for making 
the decision. Sorry if I misinterpreted this, thanks for bearing with me.
   
   Ah, I get it now. How about using the following simple heuristic:
   
   If the predicate contains an equi-join condition: e.g. `(t1.c1 = t2.c1) AND 
(t1.c2 > t2.c2)` --> Hash Join
   Else if the predicate contains an inequality check: e.g. `(t1.c1 > t2.c1) AND 
((t1.c2 + t2.c2)%10 = 1)` --> PWMJ
   Otherwise --> NLJ
   
   My original thinking was that PWMJ should take over more of the cases 
currently handled by NLJ, since it should be faster than NLJ in the general 
case. I'd rather not implement a cost model beyond the rule above for now.
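
A minimal sketch of the heuristic above, just to make the rule order concrete. This is not DataFusion's planner code: `Conjunct`, `JoinAlgorithm`, and `choose_join` are hypothetical names, and the sketch assumes the join predicate has already been split into AND-ed conjuncts and each one classified.

```rust
/// Classification of one AND-ed conjunct of a join predicate.
/// Hypothetical and simplified; a real planner would derive this from the
/// expression tree.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Conjunct {
    /// Equality between the two inputs, e.g. `t1.c1 = t2.c1`.
    Equi,
    /// Range comparison between the two inputs, e.g. `t1.c1 > t2.c1`.
    Inequality,
    /// Anything else, e.g. `(t1.c2 + t2.c2) % 10 = 1`.
    Other,
}

/// Join operators named in the comment (PWMJ and NLJ kept as abbreviated there).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinAlgorithm {
    HashJoin,
    Pwmj,
    Nlj,
}

/// Applies the three rules in order; no cost model or selectivity estimate.
fn choose_join(conjuncts: &[Conjunct]) -> JoinAlgorithm {
    if conjuncts.contains(&Conjunct::Equi) {
        // Rule 1: any equi-condition wins; the rest becomes a residual filter.
        JoinAlgorithm::HashJoin
    } else if conjuncts.contains(&Conjunct::Inequality) {
        // Rule 2: no equi-condition, but at least one range comparison.
        JoinAlgorithm::Pwmj
    } else {
        // Rule 3: fallback for every other predicate.
        JoinAlgorithm::Nlj
    }
}
```

With this sketch, `choose_join(&[Conjunct::Equi, Conjunct::Inequality])` picks the hash join and `choose_join(&[Conjunct::Inequality, Conjunct::Other])` picks PWMJ, matching the two examples in the rule.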


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

