duongcongtoai commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2798943345
I think we can break this story down into multiple steps:

1. Unify the optimizer for correlated queries, regardless of the query type (exists query, scalar query, etc.).
2. Support a flexible decorrelation scheme (simple vs. general approach). We can achieve this by following the algorithm described in the [2nd paper](https://15799.courses.cs.cmu.edu/spring2025/papers/11-unnesting/neumann-btw2025.pdf). As a prerequisite, we need to introduce an index algebra during the rewrite. Building this index requires a pre-traversal of the whole query to detect all non-trivial subqueries and to answer whether simple unnesting is sufficient or whether the framework should continue with the general approach.
3. Implement general-purpose, recursion-aware subquery decorrelation for the major operators (projection, filter, group by) using the top-down algorithm described in the 2nd paper.
4. Gradually support more complex expressions (group by, order, limit, window function).
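
To make the pre-traversal in step 2 more concrete, here is a minimal sketch of how one pass over a subquery could decide between the simple and the general approach. Everything in it is hypothetical: `Plan`, `Strategy`, `walk`, and `classify` are toy names, not DataFusion's `LogicalPlan` API. The rule it applies (a correlated filter sitting below an aggregate or limit forces the general approach) is an illustrative assumption, not the exact criterion from the paper.

```rust
// Toy logical plan for illustration only; NOT DataFusion's `LogicalPlan`.
#[allow(dead_code)]
#[derive(Debug)]
enum Plan {
    Scan(&'static str),
    // `outer_refs` lists the outer-query columns used by this filter's predicate.
    Filter { outer_refs: Vec<&'static str>, input: Box<Plan> },
    Aggregate { input: Box<Plan> },
    Limit { input: Box<Plan> },
}

// Hypothetical outcome of the pre-traversal for one subquery.
#[derive(Debug, PartialEq)]
enum Strategy {
    Uncorrelated, // no outer references at all
    Simple,       // correlated predicates can be pulled up directly
    General,      // assumed to need the general (top-down) algorithm
}

// Walk the subquery from its root. `blocked` is true once we have passed an
// operator that, in this toy model, a correlated predicate cannot be pulled
// above unchanged (aggregate or limit).
fn walk(plan: &Plan, blocked: bool, correlated: &mut bool, general: &mut bool) {
    match plan {
        Plan::Scan(_) => {}
        Plan::Filter { outer_refs, input } => {
            if !outer_refs.is_empty() {
                *correlated = true;
                if blocked {
                    *general = true;
                }
            }
            walk(input, blocked, correlated, general);
        }
        Plan::Aggregate { input } | Plan::Limit { input } => {
            walk(input, true, correlated, general);
        }
    }
}

// Classify one subquery using the flags collected by the traversal.
fn classify(subquery: &Plan) -> Strategy {
    let (mut correlated, mut general) = (false, false);
    walk(subquery, false, &mut correlated, &mut general);
    match (correlated, general) {
        (false, _) => Strategy::Uncorrelated,
        (true, false) => Strategy::Simple,
        (true, true) => Strategy::General,
    }
}
```

In this sketch, a correlated `Filter` directly over a `Scan` is classified as `Simple`, while the same filter wrapped in an `Aggregate` is classified as `General`. A real index would presumably also record which operators access which outer columns, so that the later rewrite does not need to re-scan the plan.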