Stephan Ewen created FLINK-3300:
-----------------------------------

             Summary: Concurrency Bug in Yarn JobManager
                 Key: FLINK-3300
                 URL: https://issues.apache.org/jira/browse/FLINK-3300
             Project: Flink
          Issue Type: Bug
          Components: JobManager
    Affects Versions: 1.0.0
            Reporter: Stephan Ewen
            Priority: Blocker
             Fix For: 1.0.0


The switch to the asynchronous ResourceManager client introduced concurrency
problems: the ResourceManager callback threads modify data structures
concurrently with the actor methods, which breaks the actor model's guarantee
that actor state is only touched by one message handler at a time.

For example, a callback may try to start containers while the
ContainerLaunchContext is not yet set, because the actor is still executing
the StartYarnSession method.
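
To make the failure concrete, here is a minimal Java sketch of the race under a
simplified, hypothetical model: `RacyJobManager`, `onContainersAllocated`, and
the string standing in for the ContainerLaunchContext are illustrative names,
not Flink's or Hadoop YARN's real classes. The callback thread reads state that
the StartYarnSession handler has not written yet. In the real system the
interleaving is nondeterministic; here `join()` forces it so the outcome is
reproducible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of the bug: the RM callback mutates JobManager state directly. */
class RacyJobManager {
    // Stands in for the ContainerLaunchContext (illustrative only).
    private volatile String containerLaunchContext;
    private final List<String> launched = Collections.synchronizedList(new ArrayList<>());

    /** Runs on the RM client's callback thread, not on the actor thread. */
    void onContainersAllocated(int count) {
        for (int i = 0; i < count; i++) {
            // Reads actor state from a foreign thread: the launch context may still be null.
            launched.add("container-" + i + ":" + containerLaunchContext);
        }
    }

    /** Stands in for the actor's StartYarnSession handler. */
    void startYarnSession() throws InterruptedException {
        // The async client may call back before this handler finishes;
        // join() forces that interleaving so the result is deterministic.
        Thread rmCallback = new Thread(() -> onContainersAllocated(1));
        rmCallback.start();
        rmCallback.join();
        containerLaunchContext = "ctx"; // too late for the callback above
    }

    static List<String> runScenario() {
        RacyJobManager jm = new RacyJobManager();
        try {
            jm.startYarnSession();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(jm.launched);
    }
}
```

`RacyJobManager.runScenario()` returns `[container-0:null]`: the container is
"launched" without a launch context.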

Bug-introducing commit:
https://github.com/apache/flink/commit/4e52fe4304566e5239996b3d48290e0c1f0772e8

A quick fix would be to revert the commit. A better solution would be to have
the callback methods send actor messages to the JobManager rather than act
directly.
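
A minimal Java sketch of the message-passing approach, assuming a
single-threaded executor as a stand-in for the actor's mailbox (the real
JobManager is an actor in an actor system such as Akka; `MailboxJobManager`,
`tell`, and the message classes here are hypothetical). The callback only
enqueues a message, so even though it fires while the StartYarnSession handler
is still running, the container launch runs afterwards on the mailbox thread
and sees the launch context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: callbacks send messages instead of mutating actor state. */
class MailboxJobManager {
    static final class StartYarnSession { }
    static final class ContainersAllocated {
        final int count;
        ContainersAllocated(int count) { this.count = count; }
    }

    // One thread processes every message, so actor state is never shared across threads.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private final CountDownLatch allocationHandled = new CountDownLatch(1);
    private String containerLaunchContext; // stands in for ContainerLaunchContext
    private final List<String> launched = new ArrayList<>();

    /** Enqueue a message; safe to call from any thread, e.g. an RM callback thread. */
    void tell(Object msg) {
        mailbox.execute(() -> receive(msg));
    }

    /** Simulated async RM client callback: it only sends a message. */
    void onContainersAllocated(int count) {
        tell(new ContainersAllocated(count));
    }

    private void receive(Object msg) {
        if (msg instanceof StartYarnSession) {
            // Fire the callback from another thread while this handler is still running.
            Thread rmCallback = new Thread(() -> onContainersAllocated(2));
            rmCallback.start();
            try {
                rmCallback.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Set after the callback fired; its queued message waits until we return.
            containerLaunchContext = "ctx";
        } else if (msg instanceof ContainersAllocated) {
            int count = ((ContainersAllocated) msg).count;
            for (int i = 0; i < count; i++) {
                launched.add("container-" + i + ":" + containerLaunchContext);
            }
            allocationHandled.countDown();
        }
    }

    static List<String> runScenario() {
        MailboxJobManager jm = new MailboxJobManager();
        jm.tell(new StartYarnSession());
        try {
            jm.allocationHandled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            jm.mailbox.shutdown();
        }
        return new ArrayList<>(jm.launched);
    }
}
```

`MailboxJobManager.runScenario()` returns `[container-0:ctx, container-1:ctx]`.
Because the mailbox processes messages one at a time, the allocation handler
cannot interleave with StartYarnSession, regardless of when the callback fires.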



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
