mihailoale-db commented on code in PR #50769:
URL: https://github.com/apache/spark/pull/50769#discussion_r2068621838
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/AggregateResolver.scala:
##########
@@ -329,4 +332,17 @@ class AggregateResolver(operatorResolver: Resolver, expressionResolver: Expressi
       .candidates
       .isEmpty
   }
+
+  private def getGroupungAttributeIds(aggregate: Aggregate): HashSet[ExprId] = {
+    val groupingAttributeIds = new HashSet[ExprId](aggregate.groupingExpressions.size)
+    aggregate.groupingExpressions.foreach { rootExpression =>
+      rootExpression.foreach {
+        case attribute: AttributeReference =>
+          groupingAttributeIds.add(attribute.exprId)
+        case _ =>
+      }
+    }
+
+    groupingAttributeIds
+  }

Review Comment:
   Is there a reason why we introduce another pass to collect those attributes? This could easily be done in a single-pass manner (by adding resolved attributes from `ExpressionResolver` to some HashSet if the `resolvingGroupingExpressions` flag is set in the current `ExpressionResolutionContext`).

##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -403,6 +414,22 @@ class NameScope(
  *
  * {{ SELECT col1 + col2 AS a FROM VALUES (1, 2) GROUP BY a; }}}
  *
+ * In case we are resolving names in expression trees from HAVING or ORDER BY on top of
+ * [[Aggregate]], we are able to resolve hidden attributes only if theose are present in

Review Comment:
   either
   ```suggestion
    * In case we are resolving names in expression trees from [[Having]] or [[Sort]] on top of
    * [[Aggregate]], we are able to resolve hidden attributes only if theose are present in
   ```
   or
   ```suggestion
    * In case we are resolving names in expression trees from HAVING or ORDER BY on top of
    * GROUP BY, we are able to resolve hidden attributes only if theose are present in
   ```
   sounds better (for consistency).

##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -712,13 +745,21 @@ class NameScopeStack extends SQLConfHelper {
    * was nullable in either old hidden output or new output, it must stay nullable in new hidden
    * output as well.
    */
-  def overwriteOutputAndExtendHiddenOutput(output: Seq[Attribute]): Unit = {
+  def overwriteOutputAndExtendHiddenOutput(

Review Comment:
   Please update the scaladoc here (instead of `updates nullabilities` it should be `updates properties`, and a bit more than that).
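The first review comment above (on `AggregateResolver.scala`) proposes collecting grouping attribute ids during the existing resolution pass instead of introducing a separate traversal. Below is a minimal, self-contained sketch of that idea; the names `SinglePassGroupingIdCollector`, `onAttributeResolved` and `resolveGroupingExpressions` are hypothetical stand-ins rather than Spark's actual resolver API, and only the flag name `resolvingGroupingExpressions` is taken from the comment itself.

```scala
import java.util.HashSet

// Simplified stand-ins for the Catalyst types; the real resolver classes are not used here.
final case class ExprId(id: Long)
final case class AttributeReference(name: String, exprId: ExprId)

// Stand-in for ExpressionResolutionContext; `resolvingGroupingExpressions` mirrors the
// flag name mentioned in the review comment.
final class ExpressionResolutionContext(var resolvingGroupingExpressions: Boolean = false)

// Sketch of a resolver that records grouping attribute ids as a side effect of resolving
// attributes, so no extra pass over `aggregate.groupingExpressions` is needed afterwards.
final class SinglePassGroupingIdCollector {
  private val groupingAttributeIds = new HashSet[ExprId]()
  private val context = new ExpressionResolutionContext()

  // Called once per attribute during normal expression resolution.
  private def onAttributeResolved(attribute: AttributeReference): Unit = {
    if (context.resolvingGroupingExpressions) {
      groupingAttributeIds.add(attribute.exprId)
    }
  }

  // Resolving the grouping expressions just toggles the flag around the existing traversal.
  def resolveGroupingExpressions(attributes: Seq[AttributeReference]): Unit = {
    context.resolvingGroupingExpressions = true
    try {
      attributes.foreach(onAttributeResolved)
    } finally {
      context.resolvingGroupingExpressions = false
    }
  }

  def collectedGroupingAttributeIds: HashSet[ExprId] = groupingAttributeIds
}

object SinglePassCollectionSketch {
  def main(args: Array[String]): Unit = {
    val collector = new SinglePassGroupingIdCollector
    collector.resolveGroupingExpressions(
      Seq(AttributeReference("col1", ExprId(1)), AttributeReference("col2", ExprId(2))))
    // Prints the collected ids, e.g. [ExprId(1), ExprId(2)] (HashSet order may vary).
    println(collector.collectedGroupingAttributeIds)
  }
}
```

The point of the sketch is only that id collection becomes a side effect of resolving attributes while the flag is set, which is the single-pass shape the reviewer is asking about.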
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -403,6 +414,22 @@ class NameScope(
  *
  * {{ SELECT col1 + col2 AS a FROM VALUES (1, 2) GROUP BY a; }}}
  *
+ * In case we are resolving names in expression trees from HAVING or ORDER BY on top of
+ * [[Aggregate]], we are able to resolve hidden attributes only if theose are present in

Review Comment:
   typo
   ```suggestion
    * In case we are resolving names in expression trees from HAVING or ORDER BY on top of
    * [[Aggregate]], we are able to resolve hidden attributes only if those are present in
   ```

##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -848,12 +889,14 @@ class NameScopeStack extends SQLConfHelper {
       multipartName: Seq[String],
       canLaterallyReferenceColumn: Boolean = true,
       canReferenceAggregateExpressionAliases: Boolean = false,
-      canResolveNameByHiddenOutput: Boolean = false): NameTarget = {
+      canResolveNameByHiddenOutput: Boolean = false,
+      canReferenceAggregatedAccessOnlyAttributes: Boolean = false): NameTarget = {

Review Comment:
   Could you add a small example here that demonstrates the aggregated-access-only behavior?

##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -403,6 +414,22 @@ class NameScope(
  *
  * {{ SELECT col1 + col2 AS a FROM VALUES (1, 2) GROUP BY a; }}}
  *
+ * In case we are resolving names in expression trees from HAVING or ORDER BY on top of
+ * [[Aggregate]], we are able to resolve hidden attributes only if theose are present in
+ * grouping expressions, or if the reference itself is under an [[AggregateExpression]].

Review Comment:
   Please add a query example for the case where the sort attribute is present in the grouping expressions, e.g.
   ```
   --this passes
   SELECT COUNT(col1) FROM t1 GROUP BY col1, col2 ORDER BY col2;
   ```

##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -918,20 +962,45 @@ class NameScopeStack extends SQLConfHelper {
   }

   /**
-   * When the scope gets the new output, we need to refresh nullabilities in its `hiddenOutput`. If
-   * an attribute is nullable in either old hidden output or new output, it must remain nullable in
-   * new hidden output as well.
+   * Update attribute properties when overwriting the current outputs.
+   *
+   * 1. When the scope gets the new output, we need to refresh nullabilities in its `hiddenOutput`.
+   * If an attribute is nullable in either old hidden output or new output, it must remain nullable
+   * in new hidden output as well.
+   *
+   * 2. If we are updating the hidden output on top of an [[Aggregate]], HAVING and ORDER BY clauses

Review Comment:
   Same as above.

##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/FunctionResolver.scala:
##########
@@ -77,9 +77,14 @@ class FunctionResolver(
   private val typeCoercionResolver: TypeCoercionResolver =
     new TypeCoercionResolver(timezoneAwareExpressionResolver)

+  private val expressionResolutionContextStack =
+    expressionResolver.getExpressionResolutionContextStack
+
   /**
    * Main method used to resolve an [[UnresolvedFunction]]. It resolves it in the following steps:
+   *  - Check if the `unresolvedFunction` is an aggregate expression. Set

Review Comment:
   ```suggestion
    *  - Check if the `unresolvedFunction` is an [[AggregateExpression]]. Set
   ```
   Otherwise it's easy to confuse `AggregateExpression` with one of the `Aggregate.aggregateExpressions`.
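Two of the `NameScope.scala` comments above ask for examples of when a HAVING or ORDER BY reference that is missing from the SELECT list can still be resolved. The following is a hedged sketch, not part of the PR: it assumes a local Spark build with `spark-sql` on the classpath, and the temporary view `t1` is made up for illustration. It only demonstrates which queries the analyzer accepts (attribute present in the grouping expressions, or referenced only under an aggregate function), not the resolver internals touched by this change.

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch of the "aggregated access only" and "grouped attribute" cases discussed
// in the review; the view name and data are illustrative only.
object AggregatedAccessOnlySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()

    spark.sql(
      "CREATE OR REPLACE TEMP VIEW t1 AS SELECT * FROM VALUES (1, 2), (1, 3) AS t(col1, col2)")

    // Passes: col2 is not grouped, but HAVING references it only under an aggregate
    // function, i.e. it is an aggregated-access-only attribute.
    spark.sql("SELECT col1 FROM t1 GROUP BY col1 HAVING SUM(col2) > 0").show()

    // Passes: col2 is one of the grouping expressions, so ORDER BY may reference it
    // directly (the reviewer's example).
    spark.sql("SELECT COUNT(col1) FROM t1 GROUP BY col1, col2 ORDER BY col2").show()

    // Fails analysis: col2 is neither grouped nor wrapped in an aggregate function.
    // spark.sql("SELECT col1 FROM t1 GROUP BY col1 HAVING col2 > 0").show()

    spark.stop()
  }
}
```

The commented-out query is the negative case: `col2` is neither grouped nor aggregated, so the analyzer rejects the reference.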