skyzh commented on PR #14595:
URL: https://github.com/apache/datafusion/pull/14595#issuecomment-2661674964

   And ready for review again :)
   
   After trying to understand what's happening in `decorrelate.rs`, I think we
need a new code path to support the variety of logical plans produced by
lateral joins. The key is to make the decorrelation code aware of joins while
decorrelating, instead of first gathering the information and then generating
a join at the top.
   
   If we are going towards making the optimizer able to unnest arbitrary
subqueries and lateral joins, then we will likely need a meta rule that
recursively applies the following rules top-down:
   
   * Convert a join operator into a LogicalApply if the right side of the join
contains an outer column reference. This can also be done during SQL-to-logical
planning when we encounter a lateral join. In the future, we can also convert
correlated filter predicates (EXISTS/IN) and scalar subqueries into the Apply
operator.
   * Add a set of push-down rules, e.g. push Apply down through Join, through
Aggregate, through Filter, etc.
   * Apply these rules top-down until no outer column references remain in the
plan tree.
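
   To make the three steps above concrete, here is a minimal Rust sketch over a
hypothetical toy plan IR. The `Expr`/`Plan` enums and the `join_to_apply`,
`push_down_once`, and `unnest` functions are all made up for illustration; this
is not DataFusion's `LogicalPlan` API. It assumes inner joins only and a single
level of correlation, and implements just two push-down rules (through Filter,
and through a join whose right input is uncorrelated); aggregation and joins
with both inputs correlated are left out.

```rust
// Hypothetical toy IR for illustration only; this is not DataFusion's LogicalPlan.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Col(String),      // column produced by the current subtree
    OuterCol(String), // outer reference to the left input of an enclosing Apply
    Eq(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn has_outer(&self) -> bool {
        match self {
            Expr::Col(_) => false,
            Expr::OuterCol(_) => true,
            Expr::Eq(a, b) => a.has_outer() || b.has_outer(),
        }
    }

    // Above the Apply, outer columns come from its left input, so they become plain columns.
    fn resolve_outer(self) -> Expr {
        match self {
            Expr::OuterCol(c) => Expr::Col(c),
            Expr::Eq(a, b) => Expr::Eq(Box::new(a.resolve_outer()), Box::new(b.resolve_outer())),
            other => other,
        }
    }
}

#[derive(Clone, Debug, PartialEq)]
enum Plan {
    Scan(String),
    Filter(Expr, Box<Plan>),
    Join(Option<Expr>, Box<Plan>, Box<Plan>), // inner join; None means cross join
    Apply(Box<Plan>, Box<Plan>),              // dependent join: right side may use OuterCol
}

// Assumes one level of correlation: OuterCols under an Apply's right side are bound by it.
fn has_outer_ref(plan: &Plan) -> bool {
    match plan {
        Plan::Scan(_) => false,
        Plan::Filter(p, input) => p.has_outer() || has_outer_ref(input),
        Plan::Join(on, l, r) => on.as_ref().map_or(false, Expr::has_outer) || has_outer_ref(l) || has_outer_ref(r),
        Plan::Apply(l, _) => has_outer_ref(l),
    }
}

// Step 1: a join whose right side has outer references becomes an Apply
// (an inner join's condition is kept as a filter on top).
fn join_to_apply(plan: Plan) -> Plan {
    match plan {
        Plan::Join(on, l, r) if has_outer_ref(&r) => {
            let apply = Plan::Apply(Box::new(join_to_apply(*l)), r);
            match on {
                Some(p) => Plan::Filter(p, Box::new(apply)),
                None => apply,
            }
        }
        Plan::Join(on, l, r) => Plan::Join(on, Box::new(join_to_apply(*l)), Box::new(join_to_apply(*r))),
        Plan::Filter(p, input) => Plan::Filter(p, Box::new(join_to_apply(*input))),
        other => other,
    }
}

// Step 2: one top-down pass of push-down rules.
fn push_down_once(plan: Plan) -> Plan {
    match plan {
        Plan::Apply(left, right) => {
            if !has_outer_ref(&right) {
                // No correlation left: the Apply degenerates into a cross join.
                return Plan::Join(None, left, right);
            }
            match *right {
                // Apply(L, Filter(p, R)) => Filter(p, Apply(L, R))
                Plan::Filter(pred, input) => Plan::Filter(
                    pred.resolve_outer(),
                    Box::new(push_down_once(Plan::Apply(left, input))),
                ),
                // Apply(L, Join(R1, R2)) => Join(Apply(L, R1), R2) when R2 is uncorrelated
                Plan::Join(on, r1, r2) if !has_outer_ref(&r2) => Plan::Join(
                    on.map(Expr::resolve_outer),
                    Box::new(push_down_once(Plan::Apply(left, r1))),
                    r2,
                ),
                // Unhandled shapes (e.g. both join inputs correlated) keep their Apply.
                other => Plan::Apply(Box::new(push_down_once(*left)), Box::new(push_down_once(other))),
            }
        }
        Plan::Filter(p, input) => Plan::Filter(p, Box::new(push_down_once(*input))),
        Plan::Join(on, l, r) => Plan::Join(on, Box::new(push_down_once(*l)), Box::new(push_down_once(*r))),
        Plan::Scan(t) => Plan::Scan(t),
    }
}

// Step 3: repeat the pass until a fixpoint, i.e. no rule changes the plan any more.
fn unnest(plan: Plan) -> Plan {
    let next = push_down_once(plan.clone());
    if next == plan { next } else { unnest(next) }
}
```

   For `t1 JOIN LATERAL (SELECT * FROM t2 WHERE t2.a = t1.a)`, `join_to_apply`
produces `Apply(t1, Filter(t2.a = t1.a, t2))`, and `unnest` turns that into a
filter over a plain cross join of `t1` and `t2`. Shapes that no rule handles
simply keep their `Apply`, so a real implementation would need to detect that
case and bail out.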
   
   We can use either the HyPer unnesting rules (which we implemented in CMU-DB's
[optd](https://github.com/cmu-db/optd-original/blob/main/optd-datafusion-repr/src/rules/subquery/depjoin_pushdown.rs)
optimizer) or the SQL Server unnesting rules (which we implemented in
[risinglight](https://github.com/risinglightdb/risinglight/blob/f12ea232a502b1dbda37ddaa3e98c3b8d1e6439b/src/planner/rules/plan.rs#L204-L280)).
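
   For reference, a few of the HyPer-style dependent-join push-down
equivalences (from Neumann and Kemper's "Unnesting Arbitrary Queries") are
sketched below as we understand them, not copied from optd. Here $\mathcal{D}$
is the outer input, $\bowtie^{\mathrm{dep}}$ is the dependent join (Apply),
$\mathcal{F}(T)$ is the set of outer columns referenced by $T$, and
$\mathcal{A}(\mathcal{D})$ is the set of attributes of $\mathcal{D}$. Only
inner dependent joins are covered; the aggregation rules, which must deal with
the COUNT bug, are left out.

```math
\begin{aligned}
\mathcal{D} \bowtie^{\mathrm{dep}} T &\equiv \mathcal{D} \times T
  && \text{if } \mathcal{F}(T) \cap \mathcal{A}(\mathcal{D}) = \emptyset \\
\mathcal{D} \bowtie^{\mathrm{dep}} \sigma_p(T) &\equiv \sigma_p(\mathcal{D} \bowtie^{\mathrm{dep}} T) \\
\mathcal{D} \bowtie^{\mathrm{dep}} \pi_A(T) &\equiv \pi_{A \cup \mathcal{A}(\mathcal{D})}(\mathcal{D} \bowtie^{\mathrm{dep}} T) \\
\mathcal{D} \bowtie^{\mathrm{dep}} (T_1 \bowtie_p T_2) &\equiv (\mathcal{D} \bowtie^{\mathrm{dep}} T_1) \bowtie_p T_2
  && \text{if } \mathcal{F}(T_2) \cap \mathcal{A}(\mathcal{D}) = \emptyset
\end{aligned}
```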
   
   This meta unnesting rule is more powerful than what we have right now (the
decorrelate predicate subquery and scalar subquery unnesting rules), and we can
eventually replace those two rules with it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org