mihailoale-db opened a new pull request, #50136:
URL: https://github.com/apache/spark/pull/50136

   ### What changes were proposed in this pull request?
   In this PR I propose that we use `toPrettySQL` instead of `toString` when 
building `Alias`es in `ResolveAggregateFunctions`.
   
   ### Why are the changes needed?
   Right now you can write a DataFrame program that references a column whose 
implicit alias has an expression id in its name. If we switch from `toString` 
to `toPrettySQL`, the `Alias` name will no longer contain expression ids, and 
thus users won't be able to rely on this.
   For example:
   ```
   import org.apache.spark.sql.functions._
   
   val df = spark.sql("SELECT col1 FROM VALUES (1, 2) GROUP BY col1 ORDER BY MAX(col2)")
   df.queryExecution.analyzed
   
   df.where(col("max(col2#10)") === 0).queryExecution.analyzed
   ```
   The program above can work (if `df.queryExecution.analyzed` shows that the 
name of the `AggregateExpression` alias is `max(col2#10)`). But when run again 
it might fail, because expression ids can be generated differently, so we want 
to disallow this.
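
To make the difference concrete, here is a minimal, self-contained sketch in plain Scala. It is a toy model only: `IdGen`, `Attr`, `MaxAgg` and `prettySql` are hypothetical stand-ins invented for illustration, not Spark's Catalyst classes or the code touched by this PR. It assumes that an id-bearing rendering looks like `max(col2#10)` and a pretty rendering like `max(col2)`, as described above.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy model only: hypothetical stand-ins, NOT Spark's Catalyst classes.
object AliasNameSketch {
  // Stand-in for an expression-id generator whose starting point can differ per run.
  final class IdGen(start: Long) {
    private val next = new AtomicLong(start)
    def fresh(): Long = next.getAndIncrement()
  }

  final case class Attr(name: String, exprId: Long) {
    // Rendering that embeds the expression id.
    override def toString: String = s"$name#$exprId"
    // Rendering without the expression id.
    def prettySql: String = name
  }

  final case class MaxAgg(child: Attr) {
    override def toString: String = s"max($child)"
    def prettySql: String = s"max(${child.prettySql})"
  }

  def main(args: Array[String]): Unit = {
    // Two "runs" in which the id counter happens to start at different values.
    val run1 = MaxAgg(Attr("col2", new IdGen(10).fresh()))
    val run2 = MaxAgg(Attr("col2", new IdGen(42).fresh()))

    println(run1.toString)  // max(col2#10)
    println(run2.toString)  // max(col2#42)
    println(run1.prettySql) // max(col2)
    println(run2.prettySql) // max(col2)
  }
}
```

In this model, a name built from `toString` changes between runs, so a lookup by `max(col2#10)` only succeeds by luck, while a name built from the pretty rendering is the same every time.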
   
   This is needed to enforce determinism in DataFrame programs.
   
   ### Does this PR introduce _any_ user-facing change?
   Some DataFrame programs that reference such an alias by name are going to 
fail (but, as explained above, they would fail with every DataFrame reset 
anyway).
   
   ### How was this patch tested?
   Existing tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.

