erenavsarogullari commented on code in PR #54242:
URL: https://github.com/apache/spark/pull/54242#discussion_r2835588974


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala:
##########
@@ -109,7 +109,8 @@ case class CoalesceShufflePartitions(session: SparkSession) 
extends AQEShuffleRe
         coalesceGroup.shuffleStages.map(_.partitionSpecs),
         advisoryTargetSize = advisoryTargetSize,
         minNumPartitions = minNumPartitions,
-        minPartitionSize = minPartitionSize)
+        minPartitionSize = minPartitionSize,
+        coalesceGroup.shuffleStages.map(_.shuffleStage.id))

Review Comment:
   Is the eventual goal as follow?
   Legacy API Signature:
   ```
   def coalescePartitions(
         mapOutputStatistics: Seq[Option[MapOutputStatistics]],
         inputPartitionSpecs: Seq[Option[Seq[ShufflePartitionSpec]]],
         advisoryTargetSize: Long,
         minNumPartitions: Int,
         minPartitionSize: Long,
         shuffleStageIds: Seq[Int] = Seq.empty)
   ```
   New API Signature:
   ```
   def coalescePartitions(
         shuffleStages: Seq[ShuffleStageInfo],
         advisoryTargetSize: Long,
         minNumPartitions: Int,
         minPartitionSize: Long)
   ```
   If so, i think it makes sense because there are 3 parameters 
(mapOutputStatistics, inputPartitionSpecs and shuffleStageIds) need to be 
passed to ShufflePartitionsUtil.coalescePartitions() which are using the same 
reference `coalesceGroup.shuffleStages`. However, there is one down-side:
   Currently, `ShufflePartitionsUtil.coalescePartitions()` func is also used by 
`ShufflePartitionsUtilSuite` and they will also need to be refactored to pass 
`Seq[ShuffleStageInfo]` instead of `Seq[Option[MapOutputStatistics]]`. New way 
seems to introduce more verbosity when being compared with legacy signature 
change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to