LuciferYang opened a new pull request, #54484: URL: https://github.com/apache/spark/pull/54484
### What changes were proposed in this pull request?

This PR updates a test case in `DataFrameAggregateSuite` for the `max_by` and `min_by` functions. Specifically, it refines the assertion logic for an invalid `k` input (a non-numeric string) to account for the different behaviors under `spark.sql.ansi.enabled`:

- **ANSI Enabled**: Expects a `CAST_INVALID_INPUT` or "cannot be cast" error, as the string `'two'` cannot be cast to an integer.
- **ANSI Disabled**: Expects a `VALUE_OUT_OF_RANGE` error. In legacy mode, the invalid cast returns `0` (default for integer), which then triggers a validation error because `k` must be between 1 and 100000.

### Why are the changes needed?

To restore the daily test run in non-ANSI mode, which currently fails:

- https://github.com/apache/spark/actions/runs/22247813526/job/64365502163

```
[info] - max_by and min_by with k *** FAILED *** (1 second, 431 milliseconds)
[info] "[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve "max_by(x, y, two)" due to data type mismatch: The `k` must be between [1, 100000] (current value = 0). SQLSTATE: 42K09; line 1 pos 7;
[info] 'Aggregate [unresolvedalias(max_by(x#628078, y#628079, cast(two as int), false, 0, 0))]
[info] +- SubqueryAlias tab
[info] +- LocalRelation [x#628078, y#628079]
[info] " did not contain "CAST_INVALID_INPUT", and "[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve "max_by(x, y, two)" due to data type mismatch: The `k` must be between [1, 100000] (current value = 0). SQLSTATE: 42K09; line 1 pos 7;
[info] 'Aggregate [unresolvedalias(max_by(x#628078, y#628079, cast(two as int), false, 0, 0))]
[info] +- SubqueryAlias tab
[info] +- LocalRelation [x#628078, y#628079]
[info] " did not contain "cannot be cast" (DataFrameAggregateSuite.scala:1386)
...
[info] *** 4 TESTS FAILED ***
[error] Failed: Total 4096, Failed 4, Errors 0, Passed 4092, Ignored 13
[error] Failed tests:
[error] org.apache.spark.sql.SingleLevelAggregateHashMapSuite
[error] org.apache.spark.sql.DataFrameAggregateSuite
[error] org.apache.spark.sql.TwoLevelAggregateHashMapSuite
[error] org.apache.spark.sql.TwoLevelAggregateHashMapWithVectorizedMapSuite
[error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually verified by running the command `SPARK_ANSI_SQL_MODE=false build/sbt "sql/testOnly org.apache.spark.sql.SingleLevelAggregateHashMapSuite org.apache.spark.sql.DataFrameAggregateSuite org.apache.spark.sql.TwoLevelAggregateHashMapSuite org.apache.spark.sql.TwoLevelAggregateHashMapWithVectorizedMapSuite"`; all tests pass.

### Was this patch authored or co-authored using generative AI tooling?

No
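
### Illustrative sketch of the mode-dependent expectation

The mode-dependent expectation described under "What changes were proposed" could be modeled roughly as follows. This is a minimal, Spark-free sketch and not the code in `DataFrameAggregateSuite`: the object `ExpectedKError` and its methods are hypothetical names introduced only for illustration, and the sample message is copied from the failure log quoted in this description.

```scala
// Hypothetical helper for illustration only; not taken from the Spark test suite.
object ExpectedKError {

  // Substrings the error message is expected to contain when `k` is the
  // non-numeric string 'two', depending on spark.sql.ansi.enabled.
  def expectedMarkers(ansiEnabled: Boolean): Seq[String] =
    if (ansiEnabled) Seq("CAST_INVALID_INPUT", "cannot be cast")
    else Seq("VALUE_OUT_OF_RANGE")

  // True if the message contains at least one marker expected for the mode.
  def matches(ansiEnabled: Boolean, message: String): Boolean =
    expectedMarkers(ansiEnabled).exists(marker => message.contains(marker))

  def main(args: Array[String]): Unit = {
    // Message fragment taken from the non-ANSI failure log above.
    val legacyMessage =
      "[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve \"max_by(x, y, two)\" " +
        "due to data type mismatch: The `k` must be between [1, 100000] (current value = 0)."

    // The old assertion only accepted the ANSI markers, so it rejects this message.
    assert(!matches(ansiEnabled = true, legacyMessage))
    // Branching on the mode accepts it when ANSI is disabled.
    assert(matches(ansiEnabled = false, legacyMessage))
  }
}
```

In the real suite the message would come from the exception raised by the query rather than from a hard-coded string; the sketch only shows why a single ANSI-only assertion fails when `SPARK_ANSI_SQL_MODE=false`.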
