yaooqinn opened a new pull request, #54663: URL: https://github.com/apache/spark/pull/54663
### What changes were proposed in this pull request?

This applies to `in-order-by.sql` the same fix that #54072 (SPARK-55289) made for `in-set-operations.sql`. It adds `--SET spark.sql.autoBroadcastJoinThreshold=-1` to `in-order-by.sql`, which disables broadcast joins and so prevents OOM from BroadcastHashJoin accumulating hash tables on memory-constrained CI runners.

### Why are the changes needed?

`in-order-by.sql` intermittently fails on CI with `SparkOutOfMemoryError`. The root cause is the same as for `in-set-operations.sql`: complex correlated IN-subqueries plan multiple BroadcastHashJoin operations, which together exceed the JVM heap under memory pressure.

### Does this PR introduce _any_ user-facing change?

No. Test-only change.

### How was this patch tested?

Golden files were regenerated. The only differences are minor reorderings of rows with identical sort keys, which is expected when switching from BroadcastHashJoin to SortMergeJoin.

### Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with GitHub Copilot.
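
For readers unfamiliar with the SQL test-file directives, here is a minimal sketch of what the change looks like at the top of a test input file. The `--SET` line is the one quoted in the description. The query below it is a hypothetical illustration of a correlated IN-subquery with an `ORDER BY`, not a statement copied from `in-order-by.sql`, and the table and column names (`t1`, `t2`, `t1a`, `t1b`, `t2a`, `t2b`) are assumptions.

```sql
-- Directive from the PR description: a threshold of -1 disables
-- broadcast joins for the queries in this file.
--SET spark.sql.autoBroadcastJoinThreshold=-1

-- Hypothetical example query (names are illustrative only).
-- Without broadcast joins, the subquery join would be expected to
-- plan as a SortMergeJoin instead of a BroadcastHashJoin.
SELECT t1a, t1b
FROM t1
WHERE t1a IN (SELECT t2a
              FROM t2
              WHERE t1b = t2b)
ORDER BY t1a;
```

Because `ORDER BY t1a` in a query like this does not define an order among rows that share the same `t1a`, a different join strategy may emit such ties in a different order, which is consistent with the golden-file reordering noted under testing.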
