AveryQi115 commented on code in PR #50285: URL: https://github.com/apache/spark/pull/50285#discussion_r2054604668
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala:
##########
@@ -46,6 +46,10 @@ import org.apache.spark.sql.types.DataType
  * relation or as a more complex logical plan in the event of a table subquery.
  * @param outerAttrs outer references of this subquery plan, generally empty since these table
  * arguments do not allow correlated references currently
+ * @param outerScopeAttrs outer references of the subquery plan that cannot be resolved by the

Review Comment:
Two reasons:

1. `SubqueryExpression.references` is defined as `outerAttrs`, and these references are used in many places in the Spark planner and optimizer. We check whether the references can be resolved from the input of the operator that contains the subquery. If not, the operator/subquery becomes unresolved. `outerScopeAttrs` need to be removed from these references because they cannot be resolved from the operator's input. So we need this metadata, and we change the references of `SubqueryExpression` to `AttributeSet(outerAttrs) -- AttributeSet(nestedOuterAttrs)`. That change is made in the part1.b PR.

2. To add nested correlation support to the optimizer safely. This comes from safety concerns and some legacy aspects of the optimizer design. The decorrelation framework in the optimizer currently supports one layer of decorrelation and is not designed for nested correlations. Changing it to support nested correlations would be hard, but completely removing it and replacing it with the nested correlation handling framework might affect current Spark users. To add this feature safely, we want to maintain two decorrelation frameworks for now. They are fairly similar, so the maintenance work should be easy. Whether the subquery contains `outerScopeAttrs` tells the optimizer which decorrelation framework to use. It is very hard to determine whether an outer reference can be resolved in the containing query or is an outer-scope outer reference.
This is because, due to some existing bugs in DeduplicateRelations and InlineCTE, we might have duplicated exprIds across subquery plans in the optimizer, so the optimizer cannot get correct information about where an outer reference comes from. So we need this metadata to be recorded in the analyzer phase.
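
To make the two points concrete, here is a rough, self-contained sketch. It does not use Spark's real classes: `Attr`, `SubqueryModel`, `DecorrelationPath`, and `SubquerySketch` are hypothetical stand-ins, plain Scala `Set`s replace `AttributeSet`, and it assumes that `nestedOuterAttrs` in the expression above refers to the same set as `outerScopeAttrs`.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's classes.
final case class Attr(name: String, exprId: Long)

final case class SubqueryModel(
    outerAttrs: Set[Attr],        // all outer references of the subquery plan
    outerScopeAttrs: Set[Attr]) { // the ones the containing operator's input cannot resolve

  // Models the idea behind AttributeSet(outerAttrs) -- AttributeSet(nestedOuterAttrs):
  // only references the containing operator is expected to resolve are reported.
  def references: Set[Attr] = outerAttrs -- outerScopeAttrs

  def hasOuterScopeAttrs: Boolean = outerScopeAttrs.nonEmpty
}

// Hypothetical labels for the two decorrelation frameworks described above.
sealed trait DecorrelationPath
case object SingleLevel extends DecorrelationPath
case object Nested extends DecorrelationPath

object SubquerySketch {
  // The subquery counts as resolved when every reported reference
  // is available in the containing operator's input.
  def isResolvedBy(sub: SubqueryModel, operatorInput: Set[Attr]): Boolean =
    sub.references.subsetOf(operatorInput)

  // Metadata recorded during analysis picks the framework, so the optimizer
  // does not have to infer scope from possibly duplicated exprIds.
  def choosePath(sub: SubqueryModel): DecorrelationPath =
    if (sub.hasOuterScopeAttrs) Nested else SingleLevel
}
```

For example, with `a = Attr("t1.a", 1L)` coming from the containing operator and `b = Attr("t0.b", 2L)` coming from an outer scope, `SubqueryModel(Set(a, b), Set(b))` reports only `a` as a reference, is resolved by an input of `Set(a)`, and takes the `Nested` path. Without the `outerScopeAttrs` metadata, `b` would stay in the references and the same check would fail.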