Hi Abdul,

I've added Till and Gary to cc, who might be able to help you.

Best,

Dawid


On 11/10/18 03:05, Abdul Qadeer wrote:
>
> Hi,
>
>
> We are facing an issue in standalone HA mode in Flink 1.4.0 where a
> TaskManager restarts and is not able to register with the JobManager.
> It times out awaiting an AcknowledgeRegistration/AlreadyRegistered
> message from the JobManager actor and keeps sending the
> RegisterTaskManager message. The JobManager logs don’t show anything
> about a registration failure or request. They don’t print
> log.debug(s"RegisterTaskManager: $msg") (from JobManager.scala)
> either. The network connection between the TaskManager and the
> JobManager seems fine; tcpdump shows the message sent to the JobManager
> and a TCP ACK received from it. Note that the communication is
> happening between Docker containers.
>
>
> Following are the logs from Taskmanager:
>
>
>
> {"timeMillis":1539189572438,"thread":"flink-akka.actor.default-dispatcher-2","level":"INFO","loggerName":"org.apache.flink.runtime.taskmanager.TaskManager","message":"Trying to register at JobManager akka.tcp://flink@192.168.83.51:6123/user/jobmanager (attempt 1400, timeout: 30000 milliseconds)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":48,"threadPriority":5}
>
> {"timeMillis":1539189580229,"thread":"Curator-Framework-0-SendThread(zookeeper.maglev-system.svc.cluster.local:2181)","level":"DEBUG","loggerName":"org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn","message":"Got ping response for sessionid: 0x10000260ea5002d after 0ms","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":101,"threadPriority":5}
>
> {"timeMillis":1539189600247,"thread":"Curator-Framework-0-SendThread(zookeeper.maglev-system.svc.cluster.local:2181)","level":"DEBUG","loggerName":"org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn","message":"Got ping response for sessionid: 0x10000260ea5002d after 0ms","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":101,"threadPriority":5}
>
> {"timeMillis":1539189602458,"thread":"flink-akka.actor.default-dispatcher-2","level":"INFO","loggerName":"org.apache.flink.runtime.taskmanager.TaskManager","message":"Trying to register at JobManager akka.tcp://flink@192.168.83.51:6123/user/jobmanager (attempt 1401, timeout: 30000 milliseconds)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":48,"threadPriority":5}
>
> {"timeMillis":1539189620251,"thread":"Curator-Framework-0-SendThread(zookeeper.maglev-system.svc.cluster.local:2181)","level":"DEBUG","loggerName":"org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn","message":"Got ping response for sessionid: 0x10000260ea5002d after 0ms","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":101,"threadPriority":5}
>
> {"timeMillis":1539189632478,"thread":"flink-akka.actor.default-dispatcher-2","level":"INFO","loggerName":"org.apache.flink.runtime.taskmanager.TaskManager","message":"Trying to register at JobManager akka.tcp://flink@192.168.83.51:6123/user/jobmanager (attempt 1402, timeout: 30000 milliseconds)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":48,"threadPriority":5}
>
>
