joan38 commented on code in PR #50474:
URL: https://github.com/apache/spark/pull/50474#discussion_r2033537124
##########
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:
##########
@@ -1014,39 +1014,44 @@ final class ShuffleBlockFetcherIterator(
// a SuccessFetchResult or a FailureFetchResult.
result = null
-        case PushMergedLocalMetaFetchResult(
-            shuffleId, shuffleMergeId, reduceId, bitmaps, localDirs) =>
-          // Fetch push-merged-local shuffle block data as multiple shuffle chunks
-          val shuffleBlockId = ShuffleMergedBlockId(shuffleId, shuffleMergeId, reduceId)
-          try {
-            val bufs: Seq[ManagedBuffer] = blockManager.getLocalMergedBlockData(shuffleBlockId,
-              localDirs)
-            // Since the request for local block meta completed successfully, numBlocksToFetch
-            // is decremented.
-            numBlocksToFetch -= 1
-            // Update total number of blocks to fetch, reflecting the multiple local shuffle
-            // chunks.
-            numBlocksToFetch += bufs.size
-            bufs.zipWithIndex.foreach { case (buf, chunkId) =>
-              buf.retain()
-              val shuffleChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, reduceId,
-                chunkId)
-              pushBasedFetchHelper.addChunk(shuffleChunkId, bitmaps(chunkId))
-              results.put(SuccessFetchResult(shuffleChunkId, SHUFFLE_PUSH_MAP_ID,
-                pushBasedFetchHelper.localShuffleMergerBlockMgrId, buf.size(), buf,
-                isNetworkReqDone = false))
-            }
-          } catch {
-            case e: Exception =>
-              // If we see an exception with reading push-merged-local index file, we fallback
-              // to fetch the original blocks. We do not report block fetch failure
-              // and will continue with the remaining local block read.
-              logWarning("Error occurred while reading push-merged-local index, " +
-                "prepare to fetch the original blocks", e)
-              pushBasedFetchHelper.initiateFallbackFetchForPushMergedBlock(
-                shuffleBlockId, pushBasedFetchHelper.localShuffleMergerBlockMgrId)
+        case PushMergedLocalMetaFetchResult(
Review Comment:
## Benefits of this PR
This PR makes sure this codebase, compiled with Scala 2.13, aligns with Scala 3, which allows for an easier migration to Scala 3.
It introduces very minor binary incompatibilities, but only for APIs that were supposed to be private anyway. The impact of those incompatibilities should be zero, unless people used those private constructors illegally.
If we exclude the future Scala 3 benefit and strictly focus on the immediate benefits of this PR: it removes wrongfully exposed access to private constructors via the `apply` functions in their companion objects, as illustrated in the sketch below.
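For illustration, here is a minimal sketch of the Scala 2 wart being closed; the class names are made up, not the actual Spark classes:
```scala
// Scala 2.13 without -Xsource:3: a case class with a private constructor
// still gets a *public* synthetic `apply` in its companion object,
// so the access restriction silently leaks.
case class Token private (value: String)

object Elsewhere {
  val t = Token("oops") // compiles, bypassing the `private` constructor
}

// Under Scala 3 (and Scala 2.13 with `-Xsource:3`), the synthetic `apply`
// inherits the constructor's access level, so `Token("oops")` no longer
// compiles outside the companion object, matching the declared intent.
```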
## Why Scala 3?
* Scala 3 was [released in May 2021](https://github.com/scala/scala3/releases/tag/3.0.0) and pretty much the entire community supports it now. This puts the Spark project in an embarrassing position, as it regularly shows up in jokes pointing out how late Spark is in upgrading Scala. We can already see reactions on this PR from people hoping to have Scala 3 support in Spark one day.
* Spark's presence in a lot of Scala codebases causes pain for users who are also trying to use libraries that only work on Scala 3.
* Scala 2 has become a complex language, while Scala 3 is more straightforward: it has been redesigned to be significantly simpler, removing complex features considered warts. Some of these fixes have been backported to Scala 2.13, as we can see in this PR.
* Most learning materials now focus on Scala 3, as Scala 2 is largely being phased out. This leaves new Spark users in a bad position, as they need to rely on old learning materials outside of the Spark documentation.
* [Support for Scala 2 is in maintenance
mode](https://www.scala-lang.org/blog/2024/12/16/scala-2-maintenance.html). It
does not feel right for an advanced piece of tech such as Spark to rely on a
language that is now being phased out.
* Scala 3 ends the binary incompatibility madness. Scala 2.x broke binary compatibility on every minor version, so we relied on cross-compiling the codebase across those versions, which is painful. Moving to Scala 3 means one last break/cross-compile and no more breakage ever (a concrete sketch follows this list).
* A potential benefit to Spark could be the more Python-like syntax of Scala 3. Since Spark has a large Python user base, [the optional indentation-based syntax](https://docs.scala-lang.org/scala3/reference/other-new-features/indentation.html) might help Python users feel at home with Scala (see the indentation sketch after this list).
* Scala 3 adds a lot of features that catch up with other languages, such as [extension methods](https://docs.scala-lang.org/scala3/reference/contextual/extension-methods.html), [enums](https://docs.scala-lang.org/scala3/reference/enums/enums.html) and more (see the last sketch after this list).
* Scala 3's compiler is faster than Scala 2's.
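To make the cross-compilation point concrete, here is a hypothetical sbt sketch (Spark itself builds with Maven profiles, and the version numbers are only illustrative):
```scala
// build.sbt: every Scala 2 minor version is binary-incompatible, so each
// one needs its own artifact (e.g. spark-core_2.12 vs spark-core_2.13).
crossScalaVersions := Seq("2.12.18", "2.13.16")

// Scala 3 artifacts carry a single `_3` suffix and remain binary compatible
// across 3.x minor versions, so adding Scala 3 here would be the last split.
```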
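A small sketch of the optional indentation-based syntax mentioned above (a made-up example, not Spark code):
```scala
// Scala 3 optional braces: indentation delimits blocks, much like Python.
def describe(count: Int): String =
  if count > 1 then s"$count items"
  else "one item"

@main def demo(): Unit =
  for n <- 1 to 3 do
    println(describe(n))
```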
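And a sketch of extension methods and enums (again a made-up example):
```scala
// Scala 3 enums replace the sealed-trait-plus-case-object boilerplate of Scala 2.
enum Mode:
  case Local, Cluster

// Extension methods add members to existing types without implicit classes.
extension (s: String)
  def truncated(max: Int): String =
    if s.length <= max then s else s.take(max) + "..."

@main def run(): Unit =
  println(Mode.values.mkString(", "))                  // Local, Cluster
  println("ShuffleBlockFetcherIterator".truncated(10)) // ShuffleBlo...
```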
## Conclusion
I don't think we should delay or downplay Scala 3 support. This PR is only a preparation of the codebase, and the release of Spark 4 is an occasion to add those really minor binary breaks that should impact no one anyway, since these constructors were supposed to be private.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]