[PR] [SPARK-55322][SQL][TESTS][FOLLOWUP] Fix `max_by and min_by with k` failure when ANSI mode is disabled [spark]

via GitHub Wed, 25 Feb 2026 07:34:11 -0800


LuciferYang opened a new pull request, #54484:
URL: https://github.com/apache/spark/pull/54484


   ### What changes were proposed in this pull request?
   This PR updates a test case in `DataFrameAggregateSuite` for the `max_by` 
and `min_by` functions. Specifically, it refines the assertion for an invalid 
`k` input (a non-numeric string) so that it accounts for the behavior 
differing with `spark.sql.ansi.enabled`.
   
   - **ANSI Enabled**: Expects `CAST_INVALID_INPUT` or "cannot be cast" error, 
as the string `'two'` cannot be cast to an integer.
   - **ANSI Disabled**: Expects `VALUE_OUT_OF_RANGE` error. In legacy mode, the 
invalid cast returns `0` (default for integer), which then triggers a 
validation error because `k` must be positive.
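   
   The two cases above can be sketched as a mode-dependent expectation. This is 
a minimal, Spark-free illustration and not the code from this PR: the object 
`InvalidKExpectation` and its helpers `expectedFragments` and `matchesExpected` 
are hypothetical names, and in a real suite the `ansiEnabled` flag would 
presumably be read from the session's `spark.sql.ansi.enabled` setting.
   
   ```scala
   // Hypothetical helper, for illustration only; not part of Spark or this PR.
   object InvalidKExpectation {

     /** Message fragments accepted for a non-numeric `k`, per ANSI setting. */
     def expectedFragments(ansiEnabled: Boolean): Seq[String] =
       if (ansiEnabled) {
         // ANSI mode: the string 'two' cannot be cast to an integer.
         Seq("CAST_INVALID_INPUT", "cannot be cast")
       } else {
         // Non-ANSI mode: the cast does not fail, so `k` is rejected by
         // the range validation instead.
         Seq("VALUE_OUT_OF_RANGE")
       }

     /** True if the error message contains at least one accepted fragment. */
     def matchesExpected(message: String, ansiEnabled: Boolean): Boolean =
       expectedFragments(ansiEnabled).exists(fragment => message.contains(fragment))
   }
   ```
   
   A test could then assert `matchesExpected(error.getMessage, ansiEnabled)` 
once, rather than hard-coding the ANSI-only fragments.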
   
   
   ### Why are the changes needed?
   To restore the daily test run in non-ANSI mode, which currently fails:
   
   - https://github.com/apache/spark/actions/runs/22247813526/job/64365502163
   
   ```
   [info] - max_by and min_by with k *** FAILED *** (1 second, 431 milliseconds)
   [info]   "[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve "max_by(x, 
y, two)" due to data type mismatch: The `k` must be between [1, 100000] 
(current value = 0). SQLSTATE: 42K09; line 1 pos 7;
   [info]   'Aggregate [unresolvedalias(max_by(x#628078, y#628079, cast(two as 
int), false, 0, 0))]
   [info]   +- SubqueryAlias tab
   [info]      +- LocalRelation [x#628078, y#628079]
   [info]   " did not contain "CAST_INVALID_INPUT", and 
"[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve "max_by(x, y, two)" due 
to data type mismatch: The `k` must be between [1, 100000] (current value = 0). 
SQLSTATE: 42K09; line 1 pos 7;
   [info]   'Aggregate [unresolvedalias(max_by(x#628078, y#628079, cast(two as 
int), false, 0, 0))]
   [info]   +- SubqueryAlias tab
   [info]      +- LocalRelation [x#628078, y#628079]
   [info]   " did not contain "cannot be cast" 
(DataFrameAggregateSuite.scala:1386)
   ...
   [info] *** 4 TESTS FAILED ***
   [error] Failed: Total 4096, Failed 4, Errors 0, Passed 4092, Ignored 13
   [error] Failed tests:
   [error]      org.apache.spark.sql.SingleLevelAggregateHashMapSuite
   [error]      org.apache.spark.sql.DataFrameAggregateSuite
   [error]      org.apache.spark.sql.TwoLevelAggregateHashMapSuite
   [error]      
org.apache.spark.sql.TwoLevelAggregateHashMapWithVectorizedMapSuite
   [error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Manually verified by running the command `SPARK_ANSI_SQL_MODE=false build/sbt 
"sql/testOnly org.apache.spark.sql.SingleLevelAggregateHashMapSuite 
org.apache.spark.sql.DataFrameAggregateSuite 
org.apache.spark.sql.TwoLevelAggregateHashMapSuite 
org.apache.spark.sql.TwoLevelAggregateHashMapWithVectorizedMapSuite"`; all 
tests pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


