Avery Qi created SPARK-51884: -------------------------------- Summary: Subquery Definition Changes For Adding Support For Nested Correlated Subqueries Key: SPARK-51884 URL: https://issues.apache.org/jira/browse/SPARK-51884 Project: Spark Issue Type: Sub-task Components: Optimizer, SQL Affects Versions: 4.1.0 Reporter: Avery Qi
* Add OuterScopeAttrs and related getter and setter methods for SubqueryExpression * All attributes in OuterScopeAttrs must be contained in the OuterAttrs AttributeSet of SubqueryExpression * Update the usage of SubqueryExpression and classes extending SubqueryExpression Spark only supports one layer of correlation now and does not support nested correlation. For example, SELECT col1 FROM VALUES (1, 2) t1 (col1, col2) WHERE EXISTS ( SELECT col1 FROM VALUES (1, 2) t2 (col1, col2) WHERE t2.col2 == MAX(t1.col2) )GROUP BY col1; is supported and SELECT col1 FROM VALUES (1, 2) t1 (col1, col2) WHERE EXISTS ( SELECT col1 FROM VALUES (1, 2) t2 (col1, col2) WHERE t2.col2 == ( SELECT MAX(t1.col2) ) )GROUP BY col1; is not supported. The reason spark does not support it is because the Analyzer and Optimizer resolves and plans Subquery in a recursive way. The definition change for the SubqueryExpression adds the metadata OuterScopeAttrs which helps later rewrites for the Analyzer and Optimizer. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org