stefankandic opened a new pull request, #50267: URL: https://github.com/apache/spark/pull/50267
### What changes were proposed in this pull request? Disabling `ObjectHashAggregate` when grouping on columns with collations. ### Why are the changes needed? https://github.com/apache/spark/pull/45290 added support for sort based aggregation on collated columns and explicitly forbade the use of hash aggregate for collated columns. However, it did not consider the third type of aggregate, the object hash aggregate, which is only used when there are also TypedImperativeAggregate expressions present ([source](https://github.com/apache/spark/blob/f3b081066393e1568c364b6d3bc0bceabd1e7e9f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L1204)). That means that if we group by a collated column and also have a TypedImperativeAggregate we will end up using the object has aggregate which can lead to incorrect results like in the example below: ```code CREATE TABLE tbl(c1 STRING COLLATE UTF8_LCASE, c2 INT) USING PARQUET; INSERT INTO tbl VALUES ('HELLO', 1), ('hello', 2), ('HeLlO', 3); SELECT COLLECT_LIST(c2) as list FROM tbl GROUP BY c1; ``` where the result would have three rows with values [1], [2] and [3] instead of one row with value [1, 2, 3]. For this reason we should do the same thing as we did for the regular hash aggregate, make it so that it doesn't support grouping expressions on collated columns. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New unit tests. ### Was this patch authored or co-authored using generative AI tooling? <!-- If generative AI tooling has been used in the process of authoring this patch, please include the phrase: 'Generated-by: ' followed by the name of the tool and its version. If no, write 'No'. Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details. --> No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org