[jira] [Closed] (FLINK-11708) There is a big chance that a taskExecutor starts another RegistrationTimeout after successfully registered at RM

Till Rohrmann (JIRA) Thu, 21 Feb 2019 07:49:31 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Till Rohrmann closed FLINK-11708.
---------------------------------
    Resolution: Duplicate

> There is a big chance that a taskExecutor starts another RegistrationTimeout 
> after successfully registered at RM
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-11708
>                 URL: https://issues.apache.org/jira/browse/FLINK-11708
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.6.3, 1.7.2
>            Reporter: Kai Chen
>            Priority: Major
>         Attachments: taskmanager.log
>
>
> Currently *TaskExecutor.start()* method starts by connecting to the 
> ResourceManager and ends with a *startRegistrationTimeout()* method. However, 
> the *startRegistrationTimeout()* is probably called after the connection to 
> ResourceManager is established. Then in time of 
> "taskmanager.registration.timeout"*, registrationTimeout()* is called and 
> throw a RegistrationTimeoutException bellow, which causes the TaskExecutor 
> shutdown:
> 2019-02-19 18:00:32,825 ERROR 
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error 
> occurred while executing the TaskManager. Shutting it down...
>  
> org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException:
>  Could not register at the ResourceManager within the specified maximum 
> registration duration 300000 ms. This indicates a problem with this instance. 
> Terminating now.
>  at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1037)
>  at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$3(TaskExecutor.java:1023)
>  at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$$Lambda$36/321628857.run(Unknown
>  Source)
>  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
>  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
>  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>  at 
> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>  at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>  at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>  at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>  at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>  at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>  at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>  at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Closed] (FLINK-11708) There is a big chance that a taskExecutor starts another RegistrationTimeout after successfully registered at RM

Reply via email to