mihailoale-db opened a new pull request, #50136: URL: https://github.com/apache/spark/pull/50136
### What changes were proposed in this pull request?

In this PR I propose that we use `toPrettySQL` instead of `toString` when building `Alias`es in `ResolveAggregateFunctions`.

### Why are the changes needed?

Right now you can write a DataFrame program that references a column by an implicitly generated alias whose name contains an expression ID. If we switch from `toString` to `toPrettySQL`, the `Alias` name will no longer contain expression IDs, so users won't be able to rely on them. For example:

```
import org.apache.spark.sql.functions._

val df = spark.sql("SELECT col1 FROM VALUES (1, 2) GROUP BY col1 ORDER BY MAX(col2)")
df.queryExecution.analyzed
df.where(col("max(col2#10)") === 0).queryExecution.analyzed
```

The program above can work if `df.queryExecution.analyzed` shows that the name of the `AggregateExpression` alias is `max(col2#10)`. But when it is run again it might fail, because expression IDs can be generated differently, so we want to disallow this. The change is needed to enforce determinism in DataFrame programs.

### Does this PR introduce _any_ user-facing change?

Yes. DataFrame programs that reference an alias by a name containing an expression ID are going to fail. Such programs were already unreliable: as explained above, they could fail on any re-run in which expression IDs were generated differently.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.
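To illustrate why an alias name that embeds an expression ID is not stable across runs, here is a minimal, self-contained sketch. It is a toy model only: `IdGen`, `Attr`, `Max`, `debugString` and `prettySql` are hypothetical stand-ins invented for this example, not Spark's Catalyst classes or methods. `debugString` plays the role of the `toString`-style name and `prettySql` the role of the `toPrettySQL`-style name.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy model: hypothetical stand-ins, NOT Spark's Catalyst classes.
object AliasNameSketch {
  // Stands in for an expression ID generator whose position depends on
  // whatever was analyzed earlier in the session.
  final class IdGen(start: Long) {
    private val next = new AtomicLong(start)
    def fresh(): Long = next.getAndIncrement()
  }

  // An attribute reference carrying a name and a generated expression ID.
  final case class Attr(name: String, exprId: Long) {
    def debugString: String = s"$name#$exprId" // name that embeds the ID
    def prettySql: String = name               // name without the ID
  }

  // An aggregate over a single attribute.
  final case class Max(child: Attr) {
    def debugString: String = s"max(${child.debugString})"
    def prettySql: String = s"max(${child.prettySql})"
  }

  def main(args: Array[String]): Unit = {
    // Two "runs" of the same query whose generators start at different IDs.
    val run1 = Max(Attr("col2", new IdGen(10).fresh()))
    val run2 = Max(Attr("col2", new IdGen(42).fresh()))

    println(run1.debugString) // max(col2#10)
    println(run2.debugString) // max(col2#42)
    println(run1.prettySql)   // max(col2)
    println(run2.prettySql)   // max(col2)
  }
}
```

In this model, a lookup by `max(col2#10)` matches the first run but not the second, while the ID-free name is identical in both. That is the determinism argument made in the description above.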