[ https://issues.apache.org/jira/browse/FLINK-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann closed FLINK-11708. --------------------------------- Resolution: Duplicate > There is a big chance that a taskExecutor starts another RegistrationTimeout > after successfully registered at RM > ---------------------------------------------------------------------------------------------------------------- > > Key: FLINK-11708 > URL: https://issues.apache.org/jira/browse/FLINK-11708 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination > Affects Versions: 1.6.3, 1.7.2 > Reporter: Kai Chen > Priority: Major > Attachments: taskmanager.log > > > Currently *TaskExecutor.start()* method starts by connecting to the > ResourceManager and ends with a *startRegistrationTimeout()* method. However, > the *startRegistrationTimeout()* is probably called after the connection to > ResourceManager is established. Then in time of > "taskmanager.registration.timeout"*, registrationTimeout()* is called and > throw a RegistrationTimeoutException bellow, which causes the TaskExecutor > shutdown: > 2019-02-19 18:00:32,825 ERROR > org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error > occurred while executing the TaskManager. Shutting it down... > > org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: > Could not register at the ResourceManager within the specified maximum > registration duration 300000 ms. This indicates a problem with this instance. > Terminating now. > at > org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1037) > at > org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$3(TaskExecutor.java:1023) > at > org.apache.flink.runtime.taskexecutor.TaskExecutor$$Lambda$36/321628857.run(Unknown > Source) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142) > at > akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165) > at akka.actor.Actor$class.aroundReceive(Actor.scala:502) > at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) > at akka.actor.ActorCell.invoke(ActorCell.scala:495) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) > at akka.dispatch.Mailbox.run(Mailbox.scala:224) > at akka.dispatch.Mailbox.exec(Mailbox.scala:234) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) -- This message was sent by Atlassian JIRA (v7.6.3#76005)