joan38 commented on code in PR #50474: URL: https://github.com/apache/spark/pull/50474#discussion_r2033537124
########## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ##########
@@ -1014,39 +1014,44 @@ final class ShuffleBlockFetcherIterator(
        // a SuccessFetchResult or a FailureFetchResult.
        result = null
-      case PushMergedLocalMetaFetchResult(
-          shuffleId, shuffleMergeId, reduceId, bitmaps, localDirs) =>
-        // Fetch push-merged-local shuffle block data as multiple shuffle chunks
-        val shuffleBlockId = ShuffleMergedBlockId(shuffleId, shuffleMergeId, reduceId)
-        try {
-          val bufs: Seq[ManagedBuffer] = blockManager.getLocalMergedBlockData(shuffleBlockId,
-            localDirs)
-          // Since the request for local block meta completed successfully, numBlocksToFetch
-          // is decremented.
-          numBlocksToFetch -= 1
-          // Update total number of blocks to fetch, reflecting the multiple local shuffle
-          // chunks.
-          numBlocksToFetch += bufs.size
-          bufs.zipWithIndex.foreach { case (buf, chunkId) =>
-            buf.retain()
-            val shuffleChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, reduceId,
-              chunkId)
-            pushBasedFetchHelper.addChunk(shuffleChunkId, bitmaps(chunkId))
-            results.put(SuccessFetchResult(shuffleChunkId, SHUFFLE_PUSH_MAP_ID,
-              pushBasedFetchHelper.localShuffleMergerBlockMgrId, buf.size(), buf,
-              isNetworkReqDone = false))
-          }
-        } catch {
-          case e: Exception =>
-            // If we see an exception with reading push-merged-local index file, we fallback
-            // to fetch the original blocks. We do not report block fetch failure
-            // and will continue with the remaining local block read.
-            logWarning("Error occurred while reading push-merged-local index, " +
-              "prepare to fetch the original blocks", e)
-            pushBasedFetchHelper.initiateFallbackFetchForPushMergedBlock(
-              shuffleBlockId, pushBasedFetchHelper.localShuffleMergerBlockMgrId)
+      case PushMergedLocalMetaFetchResult(

Review Comment:
## Benefits of this PR

This PR makes sure that this codebase, as compiled with Scala 2.13, aligns with Scala 3, which allows for an easier migration to Scala 3 later. It introduces very minor binary incompatibilities, but only for APIs that were supposed to be private anyway; their impact should be zero, unless people used those private constructors illegally.

If we exclude the future Scala 3 benefit and strictly focus on the immediate benefit of this PR: it removes wrongfully provided access to private constructors via the `apply` functions in their companion objects (see the sketch below).
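For readers unfamiliar with this Scala 2 quirk, here is a minimal sketch (the `FetchRequest` class is hypothetical, not from the Spark codebase): in Scala 2.13, a case class with a private constructor still gets a public synthesized `apply` in its companion object, whereas Scala 3 gives the synthesized `apply` and `copy` the same access as the constructor.

```scala
// Hypothetical case class, for illustration only.
case class FetchRequest private (blockId: String, size: Long)

object Caller {
  // Scala 2.13: compiles, because the synthesized companion-object
  // `apply` stays public even though the constructor is private.
  // Scala 3: rejected, because the synthesized `apply` inherits the
  // constructor's `private` access.
  val leaked: FetchRequest = FetchRequest("shuffle_0_0_0", 1024L)

  // The direct constructor call is rejected by both versions:
  // val direct = new FetchRequest("shuffle_0_0_0", 1024L) // does not compile
}
```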
## Why Scala 3?

* Scala 3 was [released in May 2021](https://github.com/scala/scala3/releases/tag/3.0.0) and pretty much the entire community supports it by now. This puts the Spark project in an embarrassing position, as Spark regularly shows up in jokes about how late it is in upgrading Scala. We already got reactions on this PR from people hoping to have Scala 3 support in Spark one day.
* Spark shows up in a lot of Scala codebases, which causes a lot of pain for users who are also trying to use libraries that only work on Scala 3.
* Scala 2 has become a complex language; Scala 3 is more straightforward. It has been redesigned to be significantly simpler, with the removal of complex features considered warts, some of which have been backported to Scala 2.13, as we can see in this PR.
* Most learning materials now focus on Scala 3, as Scala 2 is largely being phased out. This leaves newcomers to Spark in a bad position, as they need to rely on old learning materials outside of the Spark documentation.
* [Support for Scala 2 is in maintenance mode](https://www.scala-lang.org/blog/2024/12/16/scala-2-maintenance.html). It does not feel right for an advanced piece of tech such as Spark to rely on a language version that is being phased out.
* Scala 3 ends the binary incompatibility madness. Scala 2.x broke binary compatibility on every minor version, so we relied on cross-compiling the codebase between those versions, which is painful. Moving to Scala 3 means one last break/cross-compile and no more breakage ever (a small sbt sketch at the end of this comment shows what cross-building looks like).
* A potential benefit to Spark could be the more Python-like syntax of Scala 3. Since Spark has a large Python user base, [the optional indentation-based syntax](https://docs.scala-lang.org/scala3/reference/other-new-features/indentation.html) might help Python users feel at home with Scala.
* Scala 3 adds a lot of features that catch up with other languages, such as [extension methods](https://docs.scala-lang.org/scala3/reference/contextual/extension-methods.html), [enums](https://docs.scala-lang.org/scala3/reference/enums/enums.html) and more (a second sketch at the end of this comment illustrates a few of these).
* Scala 2 is slow to compile. In our repo, Scala 3 is 3 times faster to compile than Scala 2, for the same code!

## Conclusion

I don't think we should delay or downplay Scala 3 support. This PR only prepares the codebase, and the release of Spark 4 is an occasion to take on those really minor binary breaks, which should impact no one anyway, since these constructors were supposed to be private.
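To make the cross-compiling point concrete, here is a minimal sbt sketch; the version numbers are illustrative, and this is not Spark's actual build definition:

```scala
// build.sbt -- illustrative sketch only, not Spark's build.

// Scala 2 minor versions (2.11 / 2.12 / 2.13) are binary incompatible,
// so a library has to be compiled and published once per version:
crossScalaVersions := Seq("2.12.18", "2.13.14")

// Scala 3 promises backward binary compatibility across 3.x releases,
// so after one last cross-build...
// crossScalaVersions := Seq("2.13.14", "3.3.3")
// ...a single Scala 3 artifact keeps working on later 3.x versions.
```

With a setup like this, `sbt +publishLocal` builds and publishes one artifact per listed Scala version, which is exactly the overhead that goes away once only 3.x remains.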
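And to illustrate a few of the Scala 3 features mentioned in the list above (enums, extension methods, and the optional indentation-based syntax), a short sketch with made-up names:

```scala
// Scala 3 only -- illustrative example, not Spark code.

// An enum replaces the Scala 2 sealed-trait + case-object pattern:
enum FetchState:
  case Pending, InFlight, Done

// An extension method adds behavior to an existing type
// without the Scala 2 implicit-class boilerplate:
extension (s: String)
  def toFetchState: Option[FetchState] =
    FetchState.values.find(_.toString.equalsIgnoreCase(s))

// Optional indentation-based syntax: no braces needed.
@main def demo(): Unit =
  println("pending".toFetchState) // Some(Pending)
```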