[ https://issues.apache.org/jira/browse/SPARK-57353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087754#comment-18087754 ]

Anupam Yadav commented on SPARK-57353:
--------------------------------------

I'm working on a fix for this.

> [Analyzer++] GROUPING SETS/CUBE/ROLLUP with HAVING or ORDER BY crashes with 
> SparkUnsupportedOperationException
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57353
>                 URL: https://issues.apache.org/jira/browse/SPARK-57353
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Anupam Yadav
>            Priority: Major
>
> With `spark.sql.analyzer.singlePassResolver.enabled=true`, queries using 
> GROUP BY CUBE/ROLLUP/GROUPING SETS with HAVING or ORDER BY containing 
> aggregate functions crash with:
> {noformat}
> org.apache.spark.SparkUnsupportedOperationException: 
> [UNSUPPORTED_CALL.WITHOUT_SUGGESTION]
> Cannot call the method "dataType$" of the class 
> "org.apache.spark.sql.catalyst.expressions.BaseGroupingSets".
> SQLSTATE: 0A000
> {noformat}
> The single-pass resolver path invokes `assertValidAggregation`, which calls 
> `checkValidGroupingExprs` on sort/filter expressions. This function accesses 
> `.dataType` on `BaseGroupingSets` expressions (Cube/Rollup/GroupingSets), but 
> these expressions throw when `dataType` is called, because they are meant 
> to be expanded before type resolution.
> The legacy analyzer (the default) handles all of these queries correctly.
> *Repro:*
> {code:sql}
> -- All three variants crash with singlePassResolver enabled:
> -- Variant 1: CUBE + ORDER BY
> SELECT a, b, SUM(b) FROM VALUES (1,10),(1,20),(2,30) AS t(a,b)
> GROUP BY CUBE(a, b) ORDER BY SUM(b);
> -- Variant 2: ROLLUP + HAVING
> SELECT a, SUM(b) FROM VALUES (1,10),(1,20),(2,30) AS t(a,b)
> GROUP BY ROLLUP(a, b) HAVING SUM(b) > 25;
> -- Variant 3: GROUPING SETS + ORDER BY
> SELECT a, SUM(b) FROM VALUES (1,10),(1,20),(2,30) AS t(a,b)
> GROUP BY GROUPING SETS ((a, b), (a), ()) ORDER BY SUM(b);
> {code}
> *Root cause:* `ExprUtils.checkValidGroupingExprs` (ExprUtils.scala:211) calls 
> `.dataType` on `BaseGroupingSets` expressions before they have been expanded 
> in the single-pass resolver path.
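Until a fix lands, one possible workaround is to fall back to the legacy analyzer, which the report says handles these queries correctly. This is an untested sketch that assumes `spark.sql.analyzer.singlePassResolver.enabled` can be changed at session level with `SET`, as the repro implies:

```sql
-- Assumption: this conf is session-settable; false reverts to the legacy (default) analyzer.
SET spark.sql.analyzer.singlePassResolver.enabled=false;

-- Variant 2 from the repro; per the report, the legacy analyzer resolves it without error.
SELECT a, SUM(b) FROM VALUES (1,10),(1,20),(2,30) AS t(a,b)
GROUP BY ROLLUP(a, b) HAVING SUM(b) > 25;
```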



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
