[jira] [Comment Edited] (FLINK-9936) Mesos resource manager unable to connect to master after failover

Gary Yao (JIRA) Sun, 29 Jul 2018 05:14:08 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561093#comment-16561093
 ]


Gary Yao edited comment on FLINK-9936 at 7/29/18 12:13 PM:
-----------------------------------------------------------

[~liurenjie1024] What's the state of this ticket? Do you have a solution in mind? 
I think we need to add a callback to the {{Runnable}} scheduled in 
{{ResourceManager#grantLeadership}}. What do you think?
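To make the proposal concrete, here is a minimal sketch of attaching a callback to an asynchronously scheduled runnable. It is not Flink's actual code: the class {{LeadershipCallbackSketch}}, the method {{connectToMaster}}, and the single-thread executor are hypothetical stand-ins, and only {{java.util.concurrent}} is used.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a resource manager; this is not Flink's actual class.
class LeadershipCallbackSketch {
    // Single thread standing in for the endpoint's main thread executor.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    private volatile UUID fencingToken;   // null until leadership has been accepted
    private volatile boolean connected;

    // Schedules the leadership work asynchronously and returns a future that
    // completes only after the fencing token has been set.
    CompletableFuture<Void> grantLeadership(UUID leaderSessionId) {
        return CompletableFuture.runAsync(() -> fencingToken = leaderSessionId, mainThread);
    }

    // Hypothetical hook where the connection to the Mesos master would be started.
    void connectToMaster() {
        if (fencingToken == null) {
            throw new IllegalStateException("leadership not granted yet");
        }
        connected = true;
    }

    // Runs the scenario once and reports whether the connect step succeeded.
    static boolean demo() {
        LeadershipCallbackSketch rm = new LeadershipCallbackSketch();
        try {
            // The callback runs strictly after the scheduled runnable has finished.
            rm.grantLeadership(UUID.randomUUID()).thenRun(rm::connectToMaster).join();
            return rm.connected;
        } finally {
            rm.mainThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("connected after leadership: " + demo());
    }
}
```

The point of returning the future is that the connect step can be chained with {{thenRun}} rather than being started eagerly, so it cannot race with the leadership work.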


was (Author: gjy):
[~liurenjie1024] What's the state of this ticket? Do you have a solution in mind? 
I think we need to add a callback to the runnable scheduled in 
{{ResourceManager#grantLeadership}}. What do you think?

> Mesos resource manager unable to connect to master after failover
> -----------------------------------------------------------------
>
>                 Key: FLINK-9936
>                 URL: https://issues.apache.org/jira/browse/FLINK-9936
>             Project: Flink
>          Issue Type: Bug
>          Components: Mesos, Scheduler
>    Affects Versions: 1.5.0, 1.5.1, 1.6.0
>            Reporter: Renjie Liu
>            Assignee: Renjie Liu
>            Priority: Blocker
>             Fix For: 1.5.2, 1.6.0
>
>
> When deployed in Mesos session cluster mode, the connection monitor keeps 
> reporting that it is unable to connect to Mesos after a restart. In fact, the 
> scheduler driver has already connected to the Mesos master, but the connected 
> message is lost. This is because leadership has not been granted yet and the 
> fence ID is not set, so the RPC service ignores the connected message. We 
> should therefore connect to the Mesos master only after leadership is granted.
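
The failure mode in the description can be illustrated with a small sketch. It assumes a simplified endpoint that drops every message while its fence ID is unset; {{FencedEndpointSketch}} and its methods are hypothetical and do not reproduce Flink's RPC implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical fenced endpoint; a simplification, not Flink's RPC implementation.
class FencedEndpointSketch {
    private UUID fenceId;                       // null until leadership is granted
    private final List<String> handled = new ArrayList<>();

    void grantLeadership(UUID newFenceId) {
        this.fenceId = newFenceId;
    }

    // Returns true if the message was handled, false if it was dropped.
    boolean receive(String message) {
        if (fenceId == null) {
            return false;                       // no fence ID yet: message is ignored
        }
        handled.add(message);
        return true;
    }

    // Replays both orderings and returns the messages that were actually handled.
    static List<String> demo() {
        FencedEndpointSketch endpoint = new FencedEndpointSketch();
        endpoint.receive("Connected (before leadership)");   // dropped
        endpoint.grantLeadership(UUID.randomUUID());
        endpoint.receive("Connected (after leadership)");    // handled
        return endpoint.handled;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch only the message sent after {{grantLeadership}} reaches the handler, which is why the ticket proposes connecting to the master after leadership is granted.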



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
