[jira] [Commented] (FLINK-6020) Blob Server cannot handle multiple job submits (with same content) parallelly

ASF GitHub Bot (JIRA) Fri, 12 May 2017 09:50:20 -0700

    [ https://issues.apache.org/jira/browse/FLINK-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008382#comment-16008382 ]


ASF GitHub Bot commented on FLINK-6020:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/3888

    [FLINK-6020] Introduce BlobServer#readWriteLock to synchronize file creation and deletion

    This PR is based on #3873 and #3864.
    
    This commit introduces a BlobServer#readWriteLock to synchronize file
    creation and deletion operations in BlobServerConnection and BlobServer.
    This prevents concurrent put, delete and get operations from interfering
    with each other.
    
    The get operations are synchronized using the read lock, so that multiple
    gets can still run in parallel.
    
    What this PR does not address is the handling of concurrent writes and
    reads to the `BlobStore`. This could be solved via SUCCESS files in order to
    indicate the completion of a file. However, the first read operation should
    now happen strictly after the write operation due to the locking.
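
The locking scheme described above can be illustrated with a minimal sketch. It is not the code from this pull request: the class name `LockedBlobStoreSketch`, its methods, and the in-memory map that stands in for the blob files are all assumptions made for illustration. The only real API used is `java.util.concurrent.locks.ReentrantReadWriteLock`. Put and delete take the write lock and are therefore exclusive; get takes the read lock, so several gets can proceed at the same time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a blob store; this is not Flink's BlobServer. */
class LockedBlobStoreSketch {
    private final ReadWriteLock readWriteLock = new ReentrantReadWriteLock();

    // A plain HashMap is not thread-safe, so every access goes through the lock.
    // In this sketch the map plays the role of the files on disk.
    private final Map<String, byte[]> blobs = new HashMap<>();

    /** Stores a blob; holds the write lock, so it excludes all other operations. */
    void put(String key, byte[] data) {
        readWriteLock.writeLock().lock();
        try {
            blobs.put(key, data.clone());
        } finally {
            readWriteLock.writeLock().unlock();
        }
    }

    /** Returns a copy of the blob, or null if absent; concurrent gets are allowed. */
    byte[] get(String key) {
        readWriteLock.readLock().lock();
        try {
            byte[] data = blobs.get(key);
            return data == null ? null : data.clone();
        } finally {
            readWriteLock.readLock().unlock();
        }
    }

    /** Removes a blob; holds the write lock. Returns true if something was removed. */
    boolean delete(String key) {
        readWriteLock.writeLock().lock();
        try {
            return blobs.remove(key) != null;
        } finally {
            readWriteLock.writeLock().unlock();
        }
    }
}
```

Because a get can only acquire the read lock once no writer holds the write lock, a get that starts while a put is in progress waits for that put to finish, which is the ordering the description relies on.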

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink concurrentBlobUploads

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3888
    
----
commit 28bc0acd5c2d3858e41fd29113fff1c7a40471f5
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2017-05-11T15:36:17Z

    [FLINK-6555] [futures] Generalize ConjunctFuture to return results
    
    The ConjunctFuture now returns the set of future values once it is completed.

commit f221353584c9089552572387eee9e162695311cd
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2017-05-12T09:05:13Z

    Introduce WaitingConjunctFuture; Fix thread safety issue with ResultConjunctFuture
    
    The WaitingConjunctFuture waits for the completion of its futures. The
    future values are discarded, making it more efficient than the
    ResultConjunctFuture, which returns the futures' values. The
    WaitingConjunctFuture is instantiated via
    FutureUtils.waitForAll(Collection<Future>).

commit 79eafa2671f4a4dfdc9ab135443c339ef2e8001a
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2017-05-09T08:26:37Z

    [FLINK-6519] Integrate BlobStore in lifecycle management of HighAvailabilityServices
    
    The HighAvailabilityService creates a single BlobStoreService instance
    which is shared by all BlobServer and BlobCache instances. The
    BlobStoreService's lifecycle is exclusively managed by the
    HighAvailabilityServices. This means that the BlobStore's content is only
    cleaned up if the HighAvailabilityService's HA data is cleaned up. Having
    this single point of control makes it easier to decide when to discard HA
    data (e.g. in case of a successful job execution) and when to retain the
    data (e.g. for recovery).
    
    Close and cleanup all data of BlobStore in HighAvailabilityServices
    
    Use HighAvailabilityServices to create BlobStore
    
    Introduce BlobStoreService interface to hide close and closeAndCleanupAllData methods

commit c6ff2ced58e60f63d0236e53c83192f64479c44a
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2017-05-10T15:38:49Z

    [FLINK-6020] Introduce BlobServer#readWriteLock to synchronize file creation and deletion
    
    This commit introduces a BlobServer#readWriteLock to synchronize file
    creation and deletion operations in BlobServerConnection and BlobServer.
    This prevents multiple put and delete operations from interfering with each
    other and with get operations.
    
    The get operations are synchronized using the read lock, so that multiple
    gets can still run in parallel.

----


> Blob Server cannot handle multiple job submits (with same content) parallelly
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-6020
>                 URL: https://issues.apache.org/jira/browse/FLINK-6020
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Distributed Coordination
>            Reporter: Tao Wang
>            Assignee: Till Rohrmann
>            Priority: Critical
>
> In yarn-cluster mode, if we submit the same job multiple times in parallel, 
> the tasks encounter class loading problems and lease occupation.
> This is because the blob server stores user jars under a name generated from 
> their sha1sum: it first writes a temp file and then moves it into place to 
> finalize it. For recovery it also puts them on HDFS under the same file name.
> At the same time, when multiple clients submit the same job with the same jar, 
> the local jar files in the blob server and the files on HDFS are handled by 
> multiple threads (BlobServerConnection), which interfere with each other.
> It would be better to have a way to handle this. Two ideas come to mind:
> 1. lock the write operation, or
> 2. use some unique identifier as the file name instead of (or in addition to) 
> the sha1sum of the file contents.
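
As an illustration of the second idea in the quoted issue, the sketch below gives each in-progress upload its own uniquely named temp file and only then renames it to the SHA-1 based name. This is one possible reading of that idea, not what the pull request implements (the pull request uses a lock). The class `ContentAddressedWriteSketch` and its methods are hypothetical, and the sketch works on a local directory only, not on HDFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch; names and layout are assumptions, not Flink code. */
class ContentAddressedWriteSketch {

    /** Hex-encoded SHA-1 of the content, used as the final file name. */
    static String sha1Hex(byte[] content) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Writes content to a unique temp file, then moves it to its SHA-1 name. */
    static Path store(Path dir, byte[] content)
            throws IOException, NoSuchAlgorithmException {
        Path target = dir.resolve(sha1Hex(content));
        // createTempFile chooses a fresh name, so two concurrent writers of the
        // same content never write into the same temp file.
        Path temp = Files.createTempFile(dir, "blob-", ".tmp");
        try {
            Files.write(temp, content);
            // Readers see either no file or a complete one. Whether an existing
            // target is replaced or the move fails is implementation specific.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up if the write or move failed.
            Files.deleteIfExists(temp);
        }
        return target;
    }
}
```

Since files with the same SHA-1 name hold the same bytes, it does not matter which of two concurrent writers wins the final rename on platforms where the move replaces an existing target.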



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
