[ https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812074#comment-15812074 ]
ASF GitHub Bot commented on FLINK-5129:
---------------------------------------

GitHub user NicoK opened a pull request:

    https://github.com/apache/flink/pull/3084

    [FLINK-5129] make the BlobServer use a distributed file system

    Make the BlobCache use the BlobServer's distributed file system in HA mode: previously even in HA mode and if the cache has access to the file system, it would download BLOBs from one central BlobServer. By using the distributed file system beneath we may leverage its scalability and remove a single point of (performance) failure. If the distributed file system is not accessible at the blob caches, the old behaviour is used.

    @uce can you have a look? (this is an updated and fixed version of https://github.com/apache/flink/pull/3076)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink FLINK-5129a

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3084.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3084

----
commit 464f2c834688507c67acb3ad584827132ebe444e
Author: Nico Kruber <n...@data-artisans.com>
Date:   2016-11-22T11:49:03Z

    [hotfix] remove unused package-private BlobUtils#copyFromRecoveryPath

    This was actually the same implementation as FileSystemBlobStore#get(java.lang.String, java.io.File) and either of the two could have been removed but the implementation makes most sense at the concrete file system abstraction layer, i.e. in FileSystemBlobStore.

commit 2ebffd4c2d499b61f164b4d54dc86c9d44b9c0ea
Author: Nico Kruber <n...@data-artisans.com>
Date:   2016-11-23T15:11:35Z

    [hotfix] do not create intermediate strings inside String.format in BlobUtils

commit 36ab6121e336f63138e442ea48a751ede7fb04c3
Author: Nico Kruber <n...@data-artisans.com>
Date:   2016-11-24T16:11:19Z

    [hotfix] properly shut down the BlobServer in BlobServerRangeTest

commit c8c12c67ae875ca5c96db78375bef880cf2a3c59
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-05T17:06:01Z

    [hotfix] use JUnit's TemporaryFolder in BlobRecoveryITCase, too

    This makes cleaning up simpler.

commit a078cb0c26071fe70e3668d23d0c8bef8550892f
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-05T17:27:00Z

    [hotfix] add a missing "'" to the BlobStore class

commit a643f0b989c640a81b112ad14ae27a2a2b1ab257
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-05T17:07:13Z

    [FLINK-5129] BlobServer: include the cluster id in the HA storage path for blobs

    This applies to the ZookeeperHaServices implementation.

commit 7d832919040059961940fc96d0cdb285bc9f77d3
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-05T17:18:10Z

    [FLINK-5129] unify duplicate code between the BlobServer and ZookeeperHaServices

    (this was introduced by c64860677f)

commit 19879a01b99c4772a09627eb5f380f794f6c1e27
Author: Nico Kruber <n...@data-artisans.com>
Date:   2016-11-30T13:52:12Z

    [hotfix] add some more documentation in BlobStore-related classes

commit 80c17ef83104d1186c06d8f5d4cde11e4b05f2b8
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-06T10:55:23Z

    [hotfix] minor code beautifications when checking parameters

    + also check the blobService parameter in BlobLibraryCacheManager

commit ff920e48bd69acef280bdef2a12e5f5f9cca3a88
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-06T13:21:42Z

    [FLINK-5129] let BlobUtils#initStorageDirectory() throw a proper IOException

commit c8e2815787338f52e5ad369bcaedb1798284dd29
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-06T13:59:51Z

    [hotfix] simplify code in BlobCache#deleteGlobal()

    Also, re-order the code so that a local delete is always tried before creating a connection to the BlobServer. If that fails, the local file is deleted at least.

commit 5cd1c20aa604a9556c069ab78d8e471fa058499e
Author: Nico Kruber <n...@data-artisans.com>
Date:   2016-11-29T17:11:06Z

    [hotfix] re-use some code in BlobServerDeleteTest

commit d39948a6baa0cd6f68c4dfd8daffdd65e573fbca
Author: Nico Kruber <n...@data-artisans.com>
Date:   2016-11-30T13:35:38Z

    [hotfix] improve some failure messages in the BlobService's HA unit tests

commit dc87ae36088cc48a4122351ebe5b09a31d7fba41
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-06T14:06:30Z

    [FLINK-5129] make the BlobCache also use a distributed file system in HA mode

    If available (in HA mode), download the jar files from the distributed file system directly instead of querying the BlobServer.
    This way the load is more distributed among the nodes of the file system (depending on its implementation of course) compared to putting all the burden on a single BlobServer.

commit 389eaa9779d4bf22cc3972208d4f35ac7a966f5c
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-06T16:21:05Z

    [FLINK-5129] add unit tests for the BlobCache accessing the distributed FS directly

commit b3bcf944df87f37cccd831e8fb56b95caa620dad
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-09T13:41:59Z

    [FLINK-5129] let FileSystemBlobStore#get() remove the target file on failure

    If the copy fails, an IOException was thrown but the target file remained and was (most likely) not finished. This cleans up the file in that case so that code above, e.g. BlobServer and BlobCache, can rely on a file being complete as long as it exists.

----

> make the BlobServer use a distributed file system
> -------------------------------------------------
>
>                 Key: FLINK-5129
>                 URL: https://issues.apache.org/jira/browse/FLINK-5129
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>
> Currently, the BlobServer uses a local storage and, in addition when the HA
> mode is set, a distributed file system, e.g. hdfs. This, however, is only
> used by the JobManager and all TaskManager instances request blobs from the
> JobManager. By using the distributed file system there as well, we would
> lower the load on the JobManager and increase scalability.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
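
For readers following the commit descriptions above, here is a minimal sketch of two behaviours they mention: removing a partially written target file when a copy fails, and trying the distributed file system before falling back to the central BlobServer. It is not taken from the pull request. The class `BlobFetchSketch`, the interface `BlobSource`, and the methods `copyOrCleanUp` and `fetch` are hypothetical names, and treating "not accessible" as either a missing source or an `IOException` is an assumption made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical illustration only; not Flink's BlobCache or FileSystemBlobStore. */
public class BlobFetchSketch {

    /** Hypothetical stand-in for any place a BLOB can be read from. */
    public interface BlobSource {
        InputStream open(String blobKey) throws IOException;
    }

    /**
     * Copies the BLOB to target. If anything fails, the target file is
     * deleted so that an existing file can be treated as complete.
     */
    public static void copyOrCleanUp(BlobSource source, String blobKey, Path target)
            throws IOException {
        boolean success = false;
        try (InputStream in = source.open(blobKey)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            success = true;
        } finally {
            if (!success) {
                // Remove a possibly half-written file before the exception propagates.
                Files.deleteIfExists(target);
            }
        }
    }

    /**
     * Returns the local file, fetching it first if needed. The distributed
     * file system is tried first (assumption: null means "not configured");
     * the central server is the fallback, mirroring the "old behaviour".
     */
    public static Path fetch(BlobSource distributedFs, BlobSource blobServer,
                             String blobKey, Path localFile) throws IOException {
        if (Files.exists(localFile)) {
            return localFile; // complete by construction, see copyOrCleanUp
        }
        if (distributedFs != null) {
            try {
                copyOrCleanUp(distributedFs, blobKey, localFile);
                return localFile;
            } catch (IOException e) {
                // Assumption: any I/O error counts as "not accessible"; fall back.
            }
        }
        copyOrCleanUp(blobServer, blobKey, localFile);
        return localFile;
    }
}
```

Because a failed copy never leaves a file behind, the `Files.exists` check in `fetch` is enough to decide whether a download is still needed, which is the guarantee the last commit message describes.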