mihailoale-db commented on code in PR #50769:
URL: https://github.com/apache/spark/pull/50769#discussion_r2068621838
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/AggregateResolver.scala:
##########
@@ -329,4 +332,17 @@ class AggregateResolver(operatorResolver: Resolver,
expressionResolver: Expressi
.candidates
.isEmpty
}
+
+ private def getGroupungAttributeIds(aggregate: Aggregate): HashSet[ExprId] =
{
+ val groupingAttributeIds = new
HashSet[ExprId](aggregate.groupingExpressions.size)
+ aggregate.groupingExpressions.foreach { rootExpression =>
+ rootExpression.foreach {
+ case attribute: AttributeReference =>
+ groupingAttributeIds.add(attribute.exprId)
+ case _ =>
+ }
+ }
+
+ groupingAttributeIds
+ }
Review Comment:
Is there a reason why we introduce another pass to collect those attributes?
This could easily be done in single-pass manner (by adding resolved
attributes from `ExpressionResolver` to some HashSet if
`resolvingGroupingExpressions` flag is set in current
`ExpressionResolutionContext`).
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -403,6 +414,22 @@ class NameScope(
*
* {{ SELECT col1 + col2 AS a FROM VALUES (1, 2) GROUP BY a; }}}
*
+ * In case we are resolving names in expression trees from HAVING or ORDER
BY on top of
+ * [[Aggregate]], we are able to resolve hidden attributes only if theose
are present in
Review Comment:
either
```suggestion
* In case we are resolving names in expression trees from [[Having]] or
[[Sort]] on top of
* [[Aggregate]], we are able to resolve hidden attributes only if theose
are present in
```
or
```suggestion
* In case we are resolving names in expression trees from HAVING or ORDER
BY on top of
* GROUP BY, we are able to resolve hidden attributes only if theose are
present in
```
sound better (for consistency).
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -712,13 +745,21 @@ class NameScopeStack extends SQLConfHelper {
* was nullable in either old hidden output or new output, it must stay
nullable in new hidden
* output as well.
*/
- def overwriteOutputAndExtendHiddenOutput(output: Seq[Attribute]): Unit = {
+ def overwriteOutputAndExtendHiddenOutput(
Review Comment:
Please update scala doc here (instead of `updates nullabilities` it should
be `updates properties` and a bit more than that).
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -403,6 +414,22 @@ class NameScope(
*
* {{ SELECT col1 + col2 AS a FROM VALUES (1, 2) GROUP BY a; }}}
*
+ * In case we are resolving names in expression trees from HAVING or ORDER
BY on top of
+ * [[Aggregate]], we are able to resolve hidden attributes only if theose
are present in
Review Comment:
typo
```suggestion
* In case we are resolving names in expression trees from HAVING or ORDER
BY on top of
* [[Aggregate]], we are able to resolve hidden attributes only if those
are present in
```
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -848,12 +889,14 @@ class NameScopeStack extends SQLConfHelper {
multipartName: Seq[String],
canLaterallyReferenceColumn: Boolean = true,
canReferenceAggregateExpressionAliases: Boolean = false,
- canResolveNameByHiddenOutput: Boolean = false): NameTarget = {
+ canResolveNameByHiddenOutput: Boolean = false,
+ canReferenceAggregatedAccessOnlyAttributes: Boolean = false): NameTarget
= {
Review Comment:
Could you add a small example here that demonstrates aggregated access only
behavior?
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -403,6 +414,22 @@ class NameScope(
*
* {{ SELECT col1 + col2 AS a FROM VALUES (1, 2) GROUP BY a; }}}
*
+ * In case we are resolving names in expression trees from HAVING or ORDER
BY on top of
+ * [[Aggregate]], we are able to resolve hidden attributes only if theose
are present in
+ * grouping expressions, or if the reference itself is under an
[[AggregateExpression]].
Review Comment:
Please add a query example for the case where sort attribute is present in
grouping expressions e.g
```
--this passes
SELECT COUNT(col1) FROM t1 GROUP BY col1, col2 ORDER BY col2;
```
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/NameScope.scala:
##########
@@ -918,20 +962,45 @@ class NameScopeStack extends SQLConfHelper {
}
/**
- * When the scope gets the new output, we need to refresh nullabilities in
its `hiddenOutput`. If
- * an attribute is nullable in either old hidden output or new output, it
must remain nullable in
- * new hidden output as well.
+ * Update attribute properties when overwriting the current outputs.
+ *
+ * 1. When the scope gets the new output, we need to refresh nullabilities
in its `hiddenOutput`.
+ * If an attribute is nullable in either old hidden output or new output, it
must remain nullable
+ * in new hidden output as well.
+ *
+ * 2. If we are updating the hidden output on top of an [[Aggregate]],
HAVING and ORDER BY clauses
Review Comment:
Same as above
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/resolver/FunctionResolver.scala:
##########
@@ -77,9 +77,14 @@ class FunctionResolver(
private val typeCoercionResolver: TypeCoercionResolver =
new TypeCoercionResolver(timezoneAwareExpressionResolver)
+ private val expressionResolutionContextStack =
+ expressionResolver.getExpressionResolutionContextStack
/**
* Main method used to resolve an [[UnresolvedFunction]]. It resolves it in
the following steps:
+ * - Check if the `unresolvedFunction` is an aggregate expression. Set
Review Comment:
```suggestion
* - Check if the `unresolvedFunction` is an [[AggregateExpression]]. Set
```
Otherwise it's easy to confuse `AggregateExpression` with one of the
`Aggregate.aggregateExpressions`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]