dongjoon-hyun commented on code in PR #51965:
URL: https://github.com/apache/spark/pull/51965#discussion_r2265566733
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##########
@@ -1149,7 +1149,7 @@ case class SortArray(base: Expression, ascendingOrder: Expression)
private def sortEval(array: Any, ascending: Boolean): Any = {
val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType)
if (elementType != NullType) {
- java.util.Arrays.sort(data, if (ascending) lt else gt)
+ java.util.Arrays.parallelSort(data, if (ascending) lt else gt)
Review Comment:
Thank you. I had the same questions. :)
So, as I described in the PR description, I tested small data too (item size: 10).
For small data, the difference is negligible because Spark SQL has larger overheads elsewhere, such as SQL parsing, planning, and collecting, @zhengruifeng .
Initially, I developed a separate `parallel_sort_array` expression, but decided to merge the change into `sort_array` because no regression has been observed so far.
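To illustrate why small inputs show no regression: per the `java.util.Arrays.parallelSort` Javadoc, arrays shorter than an internal minimum granularity are sorted sequentially with the corresponding `Arrays.sort` method, so tiny arrays skip the fork/join machinery entirely (in OpenJDK this threshold is an internal constant, `MIN_ARRAY_SORT_GRAN`, 8192 elements). A minimal standalone sketch of the comparator-based call the diff switches to (class name and sample data are illustrative only, not from the PR):

```java
import java.util.Arrays;
import java.util.Comparator;

public class ParallelSortSketch {
    public static void main(String[] args) {
        Integer[] data = {5, 3, 8, 1, 9, 2};

        // Ascending, mirroring the `lt` comparator branch in sortEval.
        // For a 6-element array this runs sequentially under the hood,
        // so there is no parallelism overhead to regress on small data.
        Arrays.parallelSort(data, Comparator.naturalOrder());
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 5, 8, 9]

        // Descending, mirroring the `gt` branch.
        Arrays.parallelSort(data, Comparator.reverseOrder());
        System.out.println(Arrays.toString(data)); // [9, 8, 5, 3, 2, 1]
    }
}
```

Large arrays are split and merged on `ForkJoinPool.commonPool()`, which is where the speedup reported in the PR description would come from.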
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]