[ https://issues.apache.org/jira/browse/FLINK-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561093#comment-16561093 ]
Gary Yao edited comment on FLINK-9936 at 7/29/18 12:13 PM:
-----------------------------------------------------------

[~liurenjie1024] What's the state of this ticket? Do you have a solution in mind? I think we need to add a callback to the {{Runnable}} scheduled in {{ResourceManager#grantLeadership}}. What do you think?


was (Author: gjy):
[~liurenjie1024] What's the state of this ticket? Do you have a solution in mind? I think we need to add a callback to the runnable scheduled in {{ResourceManager#grantLeadership}}. What do you think?

> Mesos resource manager unable to connect to master after failover
> -----------------------------------------------------------------
>
>                 Key: FLINK-9936
>                 URL: https://issues.apache.org/jira/browse/FLINK-9936
>             Project: Flink
>          Issue Type: Bug
>          Components: Mesos, Scheduler
>    Affects Versions: 1.5.0, 1.5.1, 1.6.0
>            Reporter: Renjie Liu
>            Assignee: Renjie Liu
>            Priority: Blocker
>             Fix For: 1.5.2, 1.6.0
>
>
> When deployed in Mesos session cluster mode, the connection monitor keeps
> reporting that it is unable to connect to Mesos after a restart. In fact, the
> scheduler driver has already connected to the Mesos master, but the connected
> message is lost. This is because leadership has not been granted yet and the
> fencing id is not set, so the RPC service ignores the connected message. We
> should therefore connect to the Mesos master only after leadership is granted.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
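
The ordering problem described in the issue can be illustrated with a small, self-contained sketch. It is not Flink's code: the class FencedEndpointSketch and its methods grantLeadership, receive and handledMessages are hypothetical names made up for this illustration, and the only behaviour assumed is the one stated in the description, namely that an endpoint whose fencing id is not yet set ignores incoming messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical stand-in for a fenced RPC endpoint; not Flink's actual classes. */
public class FencedEndpointSketch {
    // Stays null until leadership is granted ("fence id is not set").
    private UUID fencingToken;
    private final List<String> handled = new ArrayList<>();

    /** Assumed hook: called once leader election grants leadership. */
    public void grantLeadership(UUID token) {
        this.fencingToken = token;
    }

    /** Handles a message only if a fencing token is set and it matches. */
    public boolean receive(UUID messageToken, String message) {
        if (fencingToken == null || !fencingToken.equals(messageToken)) {
            return false; // message is dropped without being handled
        }
        handled.add(message);
        return true;
    }

    public List<String> handledMessages() {
        return handled;
    }

    public static void main(String[] args) {
        FencedEndpointSketch rm = new FencedEndpointSketch();
        UUID leaderSession = UUID.randomUUID();

        // Connect before leadership: the "Connected" message is lost.
        System.out.println(rm.receive(null, "Connected"));          // false

        // Connect after leadership is granted: the message is handled.
        rm.grantLeadership(leaderSession);
        System.out.println(rm.receive(leaderSession, "Connected")); // true
        System.out.println(rm.handledMessages());                   // [Connected]
    }
}
```

In this sketch the fix amounts to ordering: whatever triggers the "Connected" message has to run after grantLeadership has set the token, which is the direction both the description and the comment above point to.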