[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382493#comment-15382493 ]
ASF GitHub Bot commented on FLINK-4152:
---------------------------------------

Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2257#discussion_r71174483

    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -405,36 +374,13 @@ class JobManager(
               currentResourceManager match {
                 case Some(rm) =>
    -              val future = (rm ? decorateMessage(new RegisterResource(taskManager, msg)))(timeout)
    -              future.onComplete {
    -                case scala.util.Success(response) =>
    -                  // the resource manager is available and answered
    -                  self ! response
    -                case scala.util.Failure(t) =>
    -                  t match {
    -                    case _: TimeoutException =>
    -                      log.info("Attempt to register resource at ResourceManager timed out. Retrying")
    -                    case _ =>
    -                      log.warn("Failure while asking ResourceManager for RegisterResource. Retrying", t)
    -                  }
    -                  // slow or unreachable resource manager, register anyway and let the rm reconnect
    -                  self ! decorateMessage(new RegisterResourceSuccessful(taskManager, msg))
    -                  self ! decorateMessage(new ReconnectResourceManager(rm))
    -              }(context.dispatcher)
    -
    +              log.info(s"Register task manager $resourceId at the resource manager.")
    +              rm ! decorateMessage(new RegisterResource(msg))
    --- End diff --

    If containers die, then the ResourceManager will always be notified by Yarn and is able to pass this information to the JobManager. The advantage of ensuring that this message gets delivered upon TaskManager registration is that the ResourceManager can actually guarantee resources. On the other hand, if messages can be lost, the ResourceManager is just a tool to say "give me more", "give me less" with no actual guarantees how much you will get.
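
To make the trade-off in the comment above concrete, here is a minimal sketch contrasting a fire-and-forget send with a send that retries until it is acknowledged. It is not Flink or Akka code: `Channel`, `tell`, `sendUntilAcked`, and the flaky channel in `main` are hypothetical names invented for this illustration, and the lossy channel is simulated with a plain function.

```scala
// Hypothetical sketch, not Flink code: every name here is made up for illustration.
object RegistrationDelivery {
  // A lossy channel: returns true only if the message arrived and was acknowledged.
  type Channel = String => Boolean

  // Fire-and-forget: send once and never learn whether the message arrived.
  def tell(channel: Channel, msg: String): Unit = {
    channel(msg)
    ()
  }

  // Acknowledged send: repeat until the receiver confirms or the attempts run out.
  // Returns the number of attempts used, or None if every attempt was lost.
  def sendUntilAcked(channel: Channel, msg: String, maxAttempts: Int): Option[Int] = {
    var attempt = 1
    while (attempt <= maxAttempts) {
      if (channel(msg)) return Some(attempt)
      attempt += 1
    }
    None
  }

  def main(args: Array[String]): Unit = {
    var registered = Set.empty[String]
    var calls = 0
    // Simulated channel that drops the first two messages it sees.
    val flaky: Channel = msg => {
      calls += 1
      if (calls > 2) { registered += msg; true } else false
    }

    tell(flaky, "tm-1") // lost, and the sender cannot tell
    println(s"tm-1 registered: ${registered.contains("tm-1")}")

    val attempts = sendUntilAcked(flaky, "tm-2", maxAttempts = 5)
    println(s"tm-2 registered: ${registered.contains("tm-2")}, attempts: $attempts")
  }
}
```

With the single send, the receiver's view of registered TaskManagers can silently diverge from the sender's; with the acknowledged variant, the sender either knows the registration arrived or knows it gave up.
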

> TaskManager registration exponential backoff doesn't work
> ---------------------------------------------------------
>
>                 Key: FLINK-4152
>                 URL: https://issues.apache.org/jira/browse/FLINK-4152
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, TaskManager, YARN Client
>            Reporter: Robert Metzger
>            Assignee: Till Rohrmann
>         Attachments: logs.tgz
>
>
> While testing Flink 1.1 I've found that the TaskManagers are logging many
> messages when registering at the JobManager.
> This is the log file:
> https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294
> It's logging more than 3000 messages in less than a minute. I don't think that
> this is the expected behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)