joan38 commented on code in PR #50474:
URL: https://github.com/apache/spark/pull/50474#discussion_r2033537124


##########
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:
##########
@@ -1014,39 +1014,44 @@ final class ShuffleBlockFetcherIterator(
           // a SuccessFetchResult or a FailureFetchResult.
           result = null
 
-          case PushMergedLocalMetaFetchResult(
-            shuffleId, shuffleMergeId, reduceId, bitmaps, localDirs) =>
-            // Fetch push-merged-local shuffle block data as multiple shuffle chunks
-            val shuffleBlockId = ShuffleMergedBlockId(shuffleId, shuffleMergeId, reduceId)
-            try {
-              val bufs: Seq[ManagedBuffer] = blockManager.getLocalMergedBlockData(shuffleBlockId,
-                localDirs)
-              // Since the request for local block meta completed successfully, numBlocksToFetch
-              // is decremented.
-              numBlocksToFetch -= 1
-              // Update total number of blocks to fetch, reflecting the multiple local shuffle
-              // chunks.
-              numBlocksToFetch += bufs.size
-              bufs.zipWithIndex.foreach { case (buf, chunkId) =>
-                buf.retain()
-                val shuffleChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, reduceId,
-                  chunkId)
-                pushBasedFetchHelper.addChunk(shuffleChunkId, bitmaps(chunkId))
-                results.put(SuccessFetchResult(shuffleChunkId, SHUFFLE_PUSH_MAP_ID,
-                  pushBasedFetchHelper.localShuffleMergerBlockMgrId, buf.size(), buf,
-                  isNetworkReqDone = false))
-              }
-            } catch {
-              case e: Exception =>
-                // If we see an exception with reading push-merged-local index file, we fallback
-                // to fetch the original blocks. We do not report block fetch failure
-                // and will continue with the remaining local block read.
-                logWarning("Error occurred while reading push-merged-local index, " +
-                  "prepare to fetch the original blocks", e)
-                pushBasedFetchHelper.initiateFallbackFetchForPushMergedBlock(
-                  shuffleBlockId, pushBasedFetchHelper.localShuffleMergerBlockMgrId)
+        case PushMergedLocalMetaFetchResult(

Review Comment:
   ## Benefits of this PR
   
   This PR makes sure this code base, compiled with Scala 2.13, also aligns with Scala 3. This allows for an easier migration to Scala 3.
   
   It introduces very minor binary incompatibility for APIs that were supposed to be private anyway. The impact of those incompatibilities should be zero, unless people used those private constructors illegally.
   
   If we exclude the future Scala 3 benefit and strictly focus on the immediate benefits of this PR: it removes the wrongly exposed access to private constructors via the synthetic `apply` functions in their companion objects.
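
   To make that concrete, here is a minimal sketch of the leak (with a hypothetical `Request` class, not one from this PR): in Scala 2.13, a case class with a `private` constructor still gets public synthetic `apply` and `copy` methods, while Scala 3 gives those methods the constructor's visibility.

   ```scala
   // Hypothetical example; not code from this PR.
   case class Request private (id: Int)

   object Demo {
     // Compiles in Scala 2.13: the synthetic companion apply stays
     // public even though the constructor is private.
     val r: Request = Request(1)

     // copy is public in Scala 2.13 as well.
     val r2: Request = r.copy(id = 2)

     // Scala 3 rejects both calls above: apply and copy inherit the
     // constructor's private visibility.
   }
   ```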
   
   ## Why Scala 3?
   
   * Scala 3 was [released in May 2021](https://github.com/scala/scala3/releases/tag/3.0.0) and pretty much the entire community supports it now. This puts the Spark project in an embarrassing position, as it regularly shows up in jokes pointing out how late Spark is in upgrading Scala. We can already see reactions on this PR from people hoping Spark will support Scala 3 one day.
   * Spark showing up in a lot of Scala codebases causes a lot of pain for users who also want to use libraries that only work on Scala 3.
   * Scala 2 has become a complex language; Scala 3 is more straightforward. It has been redesigned to be significantly simpler, with the removal of complex features considered warts, some of which were backported to Scala 2.13, as we can see in this PR.
   * Most learning materials now focus on Scala 3, as Scala 2 is largely being phased out. This leaves new Spark users in a bad position, as they need to rely on old learning materials outside of the Spark documentation.
   * [Support for Scala 2 is in maintenance 
mode](https://www.scala-lang.org/blog/2024/12/16/scala-2-maintenance.html). It 
does not feel right for an advanced piece of tech such as Spark to rely on a 
language that is now being phased out.
   * Scala 3 ends the binary incompatibility madness. Scala 2.x broke binary compatibility on every minor version, so we relied on cross-compiling the codebase across those versions, which is painful. Moving to Scala 3 means one last break/cross-compile and no more breakage ever.
   * A potential benefit to Spark could be the more Python-like syntax of Scala 3. Since Spark has a large Python user base, [the optional indentation-based syntax](https://docs.scala-lang.org/scala3/reference/other-new-features/indentation.html) might help Python users feel at home with Scala (see the sketch after this list).
   * Scala 3 adds a lot of features that catch up with other languages, such as [extension methods](https://docs.scala-lang.org/scala3/reference/contextual/extension-methods.html) or [enums](https://docs.scala-lang.org/scala3/reference/enums/enums.html), and more.
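
   As a small illustration of the last two points, here is a sketch of the indentation-based syntax, extension methods, and enums in Scala 3 (hypothetical code, not from this PR):

   ```scala
   // Hypothetical Scala 3 example; none of this is Spark code.

   // Enums replace the sealed-trait-plus-case-object boilerplate of Scala 2.
   enum StorageKind:
     case Memory, Disk, MemoryAndDisk

   // Extension methods add members to existing types without implicit classes.
   extension (s: String)
     def shout: String = s.toUpperCase + "!"

   // Optional braces: blocks are delimited by indentation, as in Python.
   object Demo:
     def describe(kinds: List[StorageKind]): Unit =
       for kind <- kinds do
         println(kind.toString.shout)
   ```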
   
   ## Conclusion
   
   I don't think we should delay or downplay Scala 3 support. This PR only prepares the codebase, and the release of Spark 4 is the occasion to take these really minor binary breaks, which should impact no one anyway, since these constructors were supposed to be private.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

