Avery Qi created SPARK-51884:
--------------------------------
Summary: Subquery Definition Changes For Adding Support For Nested
Correlated Subqueries
Key: SPARK-51884
URL: https://issues.apache.org/jira/browse/SPARK-51884
Project: Spark
Issue Type: Sub-task
Components: Optimizer, SQL
Affects Versions: 4.1.0
Reporter: Avery Qi
* Add OuterScopeAttrs and related getter and setter methods for
SubqueryExpression
* All attributes in OuterScopeAttrs must be contained in the OuterAttrs
AttributeSet of SubqueryExpression
* Update the usage of SubqueryExpression and classes extending
SubqueryExpression
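The containment rule in the list above can be sketched with a small toy model. This is only an illustration in Python, not Spark's Scala implementation: the class name SubqueryExpressionModel, its method names, and the use of plain strings for attributes are all assumptions made for the example.

```python
class SubqueryExpressionModel:
    """Toy stand-in for SubqueryExpression (hypothetical, not Spark's class)."""

    def __init__(self, outer_attrs, outer_scope_attrs=()):
        # Attributes are modeled as plain strings such as "t1.col2".
        self._outer_attrs = frozenset(outer_attrs)
        self._outer_scope_attrs = frozenset()
        self.set_outer_scope_attrs(outer_scope_attrs)

    def get_outer_attrs(self):
        return self._outer_attrs

    def get_outer_scope_attrs(self):
        return self._outer_scope_attrs

    def set_outer_scope_attrs(self, attrs):
        # Enforce the invariant: OuterScopeAttrs must be a subset of OuterAttrs.
        attrs = frozenset(attrs)
        missing = attrs - self._outer_attrs
        if missing:
            raise ValueError(f"not contained in outer_attrs: {sorted(missing)}")
        self._outer_scope_attrs = attrs


subquery = SubqueryExpressionModel(outer_attrs={"t1.col2", "t2.col2"})
subquery.set_outer_scope_attrs({"t1.col2"})
```

Checking the invariant in the setter means an inconsistent expression can never be built, which is one plausible way to honor the constraint; the actual change may enforce it differently.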
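Spark currently supports only one level of correlation: a subquery can
reference attributes of its immediately enclosing query, but not attributes of
a query further out (nested correlation).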
For example,
    SELECT col1
    FROM VALUES (1, 2) t1 (col1, col2)
    WHERE EXISTS (
      SELECT col1
      FROM VALUES (1, 2) t2 (col1, col2)
      WHERE t2.col2 == MAX(t1.col2)
    )
    GROUP BY col1;
is supported and
    SELECT col1
    FROM VALUES (1, 2) t1 (col1, col2)
    WHERE EXISTS (
      SELECT col1
      FROM VALUES (1, 2) t2 (col1, col2)
      WHERE t2.col2 == (
        SELECT MAX(t1.col2)
      )
    )
    GROUP BY col1;
is not supported.
Spark does not support this because the Analyzer and Optimizer resolve and
plan subqueries recursively.
The definition change to SubqueryExpression adds the OuterScopeAttrs metadata,
which supports later rewrites in the Analyzer and Optimizer.
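To make the role of the extra metadata concrete, here is a toy Python model of the unsupported query above. It rests on an assumed reading of the two sets, which the issue text does not spell out: OuterAttrs holds every reference a subquery (including its nested subqueries) cannot resolve itself, and OuterScopeAttrs holds the part of that set that the immediately enclosing query cannot resolve either. QueryBlock, outer_attrs, and outer_scope_attrs are hypothetical names, not Spark APIs.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class QueryBlock:
    """Toy query block: attributes it produces, attributes it references,
    and the subqueries nested directly inside it (all hypothetical)."""
    name: str
    local_attrs: Set[str]
    references: Set[str]
    subqueries: List["QueryBlock"] = field(default_factory=list)


def outer_attrs(block: QueryBlock) -> Set[str]:
    """References in `block` or its nested subqueries that `block` cannot resolve."""
    refs = set(block.references)
    for sub in block.subqueries:
        refs |= outer_attrs(sub)
    return refs - block.local_attrs


def outer_scope_attrs(block: QueryBlock, parent: QueryBlock) -> Set[str]:
    """Part of outer_attrs(block) that the immediate parent cannot resolve either."""
    return outer_attrs(block) - parent.local_attrs


# The unsupported query: the innermost scalar subquery references t1.col2,
# which belongs to the outermost query, two levels up.
scalar = QueryBlock("scalar", local_attrs=set(), references={"t1.col2"})
exists = QueryBlock(
    "exists",
    local_attrs={"t2.col1", "t2.col2"},
    references={"t2.col1", "t2.col2"},
    subqueries=[scalar],
)
outer = QueryBlock(
    "outer",
    local_attrs={"t1.col1", "t1.col2"},
    references={"t1.col1"},
    subqueries=[exists],
)

scalar_outer = outer_attrs(scalar)
scalar_outer_scope = outer_scope_attrs(scalar, exists)
exists_outer = outer_attrs(exists)
exists_outer_scope = outer_scope_attrs(exists, outer)
```

In this model the scalar subquery's reference to t1.col2 lands in both of its sets, because the EXISTS subquery that encloses it cannot supply that attribute. The EXISTS subquery carries t1.col2 only in its outer attributes, since the outermost query resolves it. By construction, each outer-scope set is a subset of the matching outer set.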