2010YOUY01 commented on PR #16660: URL: https://github.com/apache/datafusion/pull/16660#issuecomment-3219835880
I tend to think it's better to include the planner part in the initial PR. If we do it in two steps, the executor can end up incompatible with other operators, so the follow-on PR would also have a large diff. For example, I think projections are required (say the left input has 2 columns `a`, `b` and the right input has 2 columns `c`, `d`; the output might only contain `a`, `c`, since `b` and `d` are only used to evaluate the join condition and are not required in the output), but this is not implemented now. Also, once we have a SQL interface to run, there are many existing test cases to cover it, which makes it easier to get merged.

To make this task easier we want to shrink this PR. Here are some ideas:
- Could some of the preparation work for setting up the planner be split into individual PRs?
- I think the execution logic for existence joins (semi/anti/mark) is fundamentally different from traditional joins. It might be cleaner to split them into separate stream implementations, since a unified execution path can make the state management complex. For the initial PR, we could focus on including only one of them.

```
PiecewiseMergeJoinExec --existence-join?-->     ExistencePWMJStream
                       --not-existence-join?--> TraditionalPWMJStream
```
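
To make the two points above concrete, here is a rough, self-contained sketch of the stream split and of the output projection. All names in it (`JoinKind`, `PwmjStream`, `select_stream`, `project`) are hypothetical stand-ins made up for illustration; they are not the types or functions in this PR or in DataFusion, and the real streams would operate on record batches rather than plain slices.

```rust
/// Simplified stand-in for a join-type enum (NOT DataFusion's actual `JoinType`).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinKind {
    Inner,
    Left,
    Right,
    Full,
    Semi,
    Anti,
    Mark,
}

/// Hypothetical tag for the stream implementation the exec node would pick.
#[derive(Debug, PartialEq)]
enum PwmjStream {
    Existence,
    Traditional,
}

/// Existence joins only report whether a match exists (semi/anti/mark).
fn is_existence_join(kind: JoinKind) -> bool {
    matches!(kind, JoinKind::Semi | JoinKind::Anti | JoinKind::Mark)
}

/// Dispatch once, up front, so each stream keeps its own simpler state.
fn select_stream(kind: JoinKind) -> PwmjStream {
    if is_existence_join(kind) {
        PwmjStream::Existence
    } else {
        PwmjStream::Traditional
    }
}

/// Keep only the requested output columns of a joined row, by index.
fn project<T: Clone>(row: &[T], projection: &[usize]) -> Vec<T> {
    projection.iter().map(|&i| row[i].clone()).collect()
}

fn main() {
    assert_eq!(select_stream(JoinKind::Semi), PwmjStream::Existence);
    assert_eq!(select_stream(JoinKind::Inner), PwmjStream::Traditional);

    // Joined schema is [a, b, c, d]; b and d are only needed for the join
    // condition, so the output keeps columns 0 and 2.
    let row = ["a", "b", "c", "d"];
    assert_eq!(project(&row, &[0, 2]), vec!["a", "c"]);
}
```

The point of dispatching in one place is that neither stream has to branch on the join type inside its state machine.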