Re: [PR] fix: Making shuffle files generated in native shuffle mode reclaimable [datafusion-comet]

via GitHub Thu, 27 Mar 2025 08:04:17 -0700


mbutrovich commented on code in PR #1568:
URL: https://github.com/apache/datafusion-comet/pull/1568#discussion_r2016848830



##########
spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometShuffleExchangeExec.scala:
##########
@@ -232,14 +222,17 @@ object CometShuffleExchangeExec extends 
ShimCometShuffleExchangeExec {
         (0, _)
       ), // adding fake partitionId that is always 0 because ShuffleDependency 
requires it
       serializer = serializer,
-      shuffleWriterProcessor =
-        new CometShuffleWriteProcessor(outputPartitioning, outputAttributes, 
metrics, numParts),
+      shuffleWriterProcessor = 
ShuffleExchangeExec.createShuffleWriteProcessor(metrics),
       shuffleType = CometNativeShuffle,
       partitioner = new Partitioner {
         override def numPartitions: Int = outputPartitioning.numPartitions
         override def getPartition(key: Any): Int = key.asInstanceOf[Int]
       },
-      decodeTime = metrics("decode_time"))
+      decodeTime = metrics("decode_time"),
+      outputPartitioning = Some(outputPartitioning),
+      outputAttributes = outputAttributes,
+      shuffleWriteMetrics = metrics,
+      numParts = numParts)
     dependency
   }
 

Review Comment:
   line 241 below: while you're in this file could you change `Boson` -> 
`Comet` please?



##########
spark/src/test/scala/org/apache/comet/exec/CometNativeShuffleSuite.scala:
##########
@@ -201,6 +204,17 @@ class CometNativeShuffleSuite extends CometTestBase with 
AdaptiveSparkPlanHelper
     }
   }
 
+  test("fix: Comet native shuffle deletes shuffle files after query") {
+    withParquetTable((0 until 5).map(i => (i, i + 1)), "tbl") {
+      sql("SELECT count(_2), sum(_2) FROM tbl GROUP BY _1").collect()
+      val diskBlockManager = SparkEnv.get.blockManager.diskBlockManager
+      eventually(timeout(30.seconds), interval(1.seconds)) {

Review Comment:
   Can we assert the before state that the files list is non-empty here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: Making shuffle files generated in native shuffle mode reclaimable [datafusion-comet]

Reply via email to