GitHub user aarondav opened a pull request:
https://github.com/apache/spark/pull/3101
[SPARK-4238] [Core] Perform network-level retry of shuffle file fetches
This adds a RetryingBlockFetcher to the NettyBlockTransferService. It wraps
our usual OneForOneBlockFetcher and adds retry logic for fetches that fail
with an IOException.
This sort of retry lets us avoid marking an entire executor as failed because
of transient problems such as garbage collection pauses or high network load.
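A minimal sketch of the retry-on-IOException idea, in Java. `SingleAttemptFetcher`, `FetchListener`, and `RetryingFetchSketch` are hypothetical names, not Spark's API. The sketch assumes the wrapped fetcher reports every block synchronously before returning; a real fetcher is asynchronous and would likely also wait between attempts.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical per-block callback (not Spark's actual listener interface).
interface FetchListener {
  void onSuccess(String blockId, byte[] data);
  void onFailure(String blockId, Throwable cause);
}

// Hypothetical single-attempt fetcher, standing in for OneForOneBlockFetcher.
// Assumption: it calls the listener exactly once per block before returning.
interface SingleAttemptFetcher {
  void fetch(String[] blockIds, FetchListener listener);
}

// Hypothetical wrapper: retries only the blocks that failed with an IOException.
final class RetryingFetchSketch {
  private final SingleAttemptFetcher delegate;
  private final int maxRetries;

  RetryingFetchSketch(SingleAttemptFetcher delegate, int maxRetries) {
    this.delegate = delegate;
    this.maxRetries = maxRetries;
  }

  void fetchBlocks(String[] blockIds, FetchListener listener) {
    String[] toFetch = blockIds;
    for (int attempt = 0; toFetch.length > 0; attempt++) {
      final boolean canRetry = attempt < maxRetries;
      final List<String> retry = new ArrayList<>();
      delegate.fetch(toFetch, new FetchListener() {
        @Override
        public void onSuccess(String blockId, byte[] data) {
          listener.onSuccess(blockId, data);
        }

        @Override
        public void onFailure(String blockId, Throwable cause) {
          if (cause instanceof IOException && canRetry) {
            retry.add(blockId); // possibly transient I/O error: fetch this block again
          } else {
            listener.onFailure(blockId, cause); // non-I/O error or retries exhausted
          }
        }
      });
      toFetch = retry.toArray(new String[0]); // next attempt covers only failed blocks
    }
  }
}
```

Only blocks that failed with an IOException are re-requested, so a single slow or reset connection does not surface as a failure unless it persists past `maxRetries` extra attempts.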
TODO:
- [ ] unit tests
- [ ] put in ExternalShuffleClient too
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aarondav/spark retry
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3101.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3101
----
commit c293a3f5c7249692d2652c4b5fa5e496ee702a9f
Author: Aaron Davidson
Date: 2014-11-05T02:34:37Z
[SPARK-4238] [Core] Perform network-level retry of shuffle file fetches