[jira] [Created] (SPARK-51884) Subquery Definition Changes For Adding Support For Nested Correlated Subqueries

Avery Qi (Jira) Wed, 23 Apr 2025 11:34:27 -0700

Avery Qi created SPARK-51884:
--------------------------------

             Summary: Subquery Definition Changes For Adding Support For Nested Correlated Subqueries
                 Key: SPARK-51884
                 URL: https://issues.apache.org/jira/browse/SPARK-51884
             Project: Spark
          Issue Type: Sub-task
          Components: Optimizer, SQL
    Affects Versions: 4.1.0
            Reporter: Avery Qi



 * Add OuterScopeAttrs and related getter and setter methods to SubqueryExpression.
 * All attributes in OuterScopeAttrs must be contained in the OuterAttrs AttributeSet of SubqueryExpression.
 * Update usages of SubqueryExpression and of classes extending SubqueryExpression.
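The bullet points above describe a small data-structure change: a new field, accessors for it, and a containment invariant. The following is a minimal Python sketch of how that invariant could be enforced whenever the field is set. It rests on these assumptions:

 * Attributes are modeled as plain strings.
 * The class name SubqueryExpr and the methods get_outer_scope_attrs / with_new_outer_scope_attrs are hypothetical. They are not Spark's Scala API.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SubqueryExpr:
    """Hypothetical, simplified stand-in for Catalyst's SubqueryExpression.

    Attributes are modeled as plain strings such as "t1.col2".
    """
    plan: str  # placeholder for the subquery's logical plan
    # Every outer reference used inside the subquery (the ticket's OuterAttrs).
    outer_attrs: FrozenSet[str] = frozenset()
    # Subset of outer_attrs belonging to an enclosing scope (the ticket's
    # OuterScopeAttrs); its precise meaning here is an assumption.
    outer_scope_attrs: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        # Invariant from the ticket: OuterScopeAttrs must be contained in OuterAttrs.
        extra = self.outer_scope_attrs - self.outer_attrs
        if extra:
            raise ValueError(f"outer_scope_attrs not in outer_attrs: {sorted(extra)}")

    def get_outer_scope_attrs(self) -> FrozenSet[str]:
        """Getter for the outer-scope attributes."""
        return self.outer_scope_attrs

    def with_new_outer_scope_attrs(self, attrs: Iterable[str]) -> "SubqueryExpr":
        """Immutable 'setter': returns a copy instead of mutating self."""
        # dataclasses.replace builds a new instance via __init__, so
        # __post_init__ runs again and validates the new value.
        return replace(self, outer_scope_attrs=frozenset(attrs))


# Usage: the inner scalar subquery from the unsupported example.
inner = SubqueryExpr("SELECT MAX(t1.col2)", outer_attrs=frozenset({"t1.col2"}))
inner = inner.with_new_outer_scope_attrs({"t1.col2"})  # accepted
print(sorted(inner.get_outer_scope_attrs()))  # ['t1.col2']
```

The sketch makes the "setter" return a new object because Catalyst expressions are immutable trees. Validating in the constructor means no copy can ever violate the containment rule.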



Spark currently supports only one level of correlation; it does not support
nested correlation. For example, this query is supported:

    SELECT col1 FROM VALUES (1, 2) t1 (col1, col2)
    WHERE EXISTS (
      SELECT col1 FROM VALUES (1, 2) t2 (col1, col2)
      WHERE t2.col2 == MAX(t1.col2)
    )
    GROUP BY col1;

but this one is not:

    SELECT col1 FROM VALUES (1, 2) t1 (col1, col2)
    WHERE EXISTS (
      SELECT col1 FROM VALUES (1, 2) t2 (col1, col2)
      WHERE t2.col2 == (SELECT MAX(t1.col2))
    )
    GROUP BY col1;

In the second query, t1.col2 is referenced inside a scalar subquery that is
itself nested in the EXISTS subquery. The reference therefore points two
query levels out, not to the immediately enclosing query.

Spark does not support this because the Analyzer and Optimizer resolve and
plan subqueries recursively. Each subquery is handled relative to its
immediately enclosing query.
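The difference between the two example queries can be illustrated with a toy scope model. This is not Spark's Analyzer code. It assumes, as the paragraph above implies, that each recursive step can only see its immediately enclosing query. The names correlation_depth, resolve_one_level, and Scope are hypothetical.

```python
from typing import Dict, List, Optional, Set

# A scope maps table aliases to their column names; scopes[0] is the query
# being resolved, scopes[1] its immediately enclosing query, and so on.
Scope = Dict[str, Set[str]]


def correlation_depth(ref: str, scopes: List[Scope]) -> Optional[int]:
    """Return how many query levels outward `ref` (e.g. "t1.col2") resolves, or None."""
    table, col = ref.split(".", 1)
    for depth, scope in enumerate(scopes):
        if col in scope.get(table, set()):
            return depth
    return None


def resolve_one_level(ref: str, scopes: List[Scope]) -> int:
    """Toy single-layer correlation: accept only local (0) or immediate-outer (1) refs."""
    depth = correlation_depth(ref, scopes)
    if depth is None or depth > 1:
        raise LookupError(f"cannot resolve {ref} within one level of correlation")
    return depth


cols = {"col1", "col2"}
# Supported example: t1.col2 inside the EXISTS subquery is one level out.
print(resolve_one_level("t1.col2", [{"t2": cols}, {"t1": cols}]))  # 1
# Unsupported example: the scalar subquery has no FROM, so t1 is two levels out.
print(correlation_depth("t1.col2", [{}, {"t2": cols}, {"t1": cols}]))  # 2
```

Under this model, resolve_one_level raises LookupError for the second query. Supporting it requires carrying information about references that skip a level.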

This change to the SubqueryExpression definition adds the OuterScopeAttrs
metadata. Later Analyzer and Optimizer rewrites can use it to handle nested
correlation.
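OuterScopeAttrs can be read as recording the outer references that a subquery's immediately enclosing query cannot supply. That is our interpretation of the ticket, not a confirmed definition. Under that reading, a rewrite could compute them as a set difference. The helper name split_outer_refs and its string-based attribute model are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple


def split_outer_refs(
    outer_attrs: Iterable[str], immediate_outer_output: Iterable[str]
) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Hypothetical helper: partition a subquery's outer references.

    Returns (from_immediate_outer, outer_scope_attrs). The second set holds
    references the immediately enclosing query cannot provide, so they must
    come from a scope further out. It is a subset of outer_attrs by
    construction, matching the ticket's containment invariant.
    """
    outer = frozenset(outer_attrs)
    from_immediate = outer & frozenset(immediate_outer_output)
    return from_immediate, outer - from_immediate


# Inner scalar subquery of the unsupported example: its enclosing query reads t2,
# so t1.col2 lands in the outer-scope set.
print(split_outer_refs({"t1.col2"}, {"t2.col1", "t2.col2"}))
```

For the EXISTS subquery in the supported example, the enclosing query produces t1.col1 and t1.col2, so the outer-scope set is empty. That matches the single-level case Spark already handles.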



--
This message was sent by Atlassian Jira
(v8.20.10#820010)