dongjoon-hyun commented on code in PR #51965:
URL: https://github.com/apache/spark/pull/51965#discussion_r2265566733
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1149,7 +1149,7 @@ case class SortArray(base: Expression, ascendingOrder: Expression)
private def sortEval(array: Any, ascending: Boolean): Any = {
val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType)
if (elementType != NullType) {
- java.util.Arrays.sort(data, if (ascending) lt else gt)
+ java.util.Arrays.parallelSort(data, if (ascending) lt else gt)
Review Comment:
Thank you. I had the same questions. :)
So, as I described in the PR description, I tested small data too (item size: 10).
For small data, the difference is negligible because Spark SQL has larger overheads elsewhere, such as SQL parsing, planning, and collecting, @zhengruifeng .
Initially, I developed a separate `parallel_sort_array` expression, but decided to merge the change into `sort_array` because no regression has been observed so far.
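To illustrate why small inputs show no regression: per the `java.util.Arrays.parallelSort` Javadoc, arrays shorter than an internal minimum granularity are sorted sequentially with the corresponding `Arrays.sort` method, so tiny arrays skip the fork/join machinery entirely (in OpenJDK this threshold is an internal constant, `MIN_ARRAY_SORT_GRAN`, 8192 elements). A minimal standalone sketch of the comparator-based call the diff switches to (class name and sample data are illustrative only, not from the PR):

```java
import java.util.Arrays;
import java.util.Comparator;

public class ParallelSortSketch {
    public static void main(String[] args) {
        Integer[] data = {5, 3, 8, 1, 9, 2};

        // Ascending, mirroring the `lt` comparator branch in sortEval.
        // For a 6-element array this runs sequentially under the hood,
        // so there is no parallelism overhead to regress on small data.
        Arrays.parallelSort(data, Comparator.naturalOrder());
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 5, 8, 9]

        // Descending, mirroring the `gt` branch.
        Arrays.parallelSort(data, Comparator.reverseOrder());
        System.out.println(Arrays.toString(data)); // [9, 8, 5, 3, 2, 1]
    }
}
```

Large arrays are split and merged on `ForkJoinPool.commonPool()`, which is where the speedup reported in the PR description would come from.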
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]