2010YOUY01 commented on PR #16660: URL: https://github.com/apache/datafusion/pull/16660#issuecomment-3178537332
> @2010YOUY01 Wouldn't you need to use a cost model to estimate which one to use though when both are viable? For example, the Hash Join (do the equi-condition then the residual filter) vs. PWMJ (do the filter, then the equi condition residual). You could estimate the selectivity for the equi predicate vs. residual predicate, factor in whether the key is sorted, etc. for making the decision. Sorry if I misinterpreted this, thanks for bearing with me.

Ah, I get it now. How about using the following simple heuristic:

- If the predicate contains an equality check, e.g. `(t1.c1 = t2.c1) AND (t1.c2 > t2.c2)` --> Hash Join
- Else if the predicate contains an inequality check, e.g. `(t1.c1 > t2.c1) AND ((t1.c2 + t2.c2) % 10 = 1)` --> PWMJ
- Otherwise --> NLJ

My thinking was that PWMJ should take over more of the cases originally handled by NLJ, since in the general case it should be faster than NLJ. I don't want to implement a cost model beyond the rule above at the moment.
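
To make the rule concrete, here is a minimal sketch of the three-way decision in Rust. It is not DataFusion code: the `Conjunct` and `JoinAlgorithm` types and the `choose_join` function are hypothetical names invented for illustration. It also assumes the predicate has already been split into AND-ed conjuncts and that each conjunct has been classified, with "equality check" taken to mean a column-to-column equality between the two inputs (so `(t1.c2 + t2.c2) % 10 = 1` counts as `Other`, not as a join key).

```rust
/// Hypothetical classification of one AND-ed conjunct of a join predicate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Conjunct {
    /// Column-to-column equality across inputs, e.g. `t1.c1 = t2.c1`.
    EquiKey,
    /// Column-to-column inequality across inputs, e.g. `t1.c1 > t2.c1`.
    Range,
    /// Any other expression, e.g. `(t1.c2 + t2.c2) % 10 = 1`.
    Other,
}

/// The three join strategies named in the heuristic.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinAlgorithm {
    HashJoin,
    Pwmj,
    Nlj,
}

/// Rule-based choice with no cost model:
/// any equi key wins, then any range conjunct, otherwise fall back to NLJ.
pub fn choose_join(conjuncts: &[Conjunct]) -> JoinAlgorithm {
    if conjuncts.contains(&Conjunct::EquiKey) {
        // Remaining conjuncts would be applied as a residual filter.
        JoinAlgorithm::HashJoin
    } else if conjuncts.contains(&Conjunct::Range) {
        JoinAlgorithm::Pwmj
    } else {
        JoinAlgorithm::Nlj
    }
}
```

Under these assumptions, the first example predicate classifies as `[EquiKey, Range]` and maps to `HashJoin`, and the second classifies as `[Range, Other]` and maps to `Pwmj`. A predicate with no conjuncts, or with only `Other` conjuncts, falls through to `Nlj`.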