[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work

ASF GitHub Bot (JIRA) Tue, 19 Jul 2016 05:09:59 -0700

    [ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384026#comment-15384026 ]


ASF GitHub Bot commented on FLINK-4152:
---------------------------------------

Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2257#discussion_r71325133
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -405,36 +374,13 @@ class JobManager(
     
           currentResourceManager match {
             case Some(rm) =>
    -          val future = (rm ? decorateMessage(new RegisterResource(taskManager, msg)))(timeout)
    -          future.onComplete {
    -            case scala.util.Success(response) =>
    -              // the resource manager is available and answered
    -              self ! response
    -            case scala.util.Failure(t) =>
    -              t match {
    -                case _: TimeoutException =>
    -                  log.info("Attempt to register resource at ResourceManager timed out. Retrying")
    -                case _ =>
    -                  log.warn("Failure while asking ResourceManager for RegisterResource. Retrying", t)
    -              }
    -              // slow or unreachable resource manager, register anyway and let the rm reconnect
    -              self ! decorateMessage(new RegisterResourceSuccessful(taskManager, msg))
    -              self ! decorateMessage(new ReconnectResourceManager(rm))
    -          }(context.dispatcher)
    -
    +          log.info(s"Register task manager $resourceId at the resource manager.")
    +          rm ! decorateMessage(new RegisterResource(msg))
    --- End diff --
    
    I just don't understand why you removed this functionality. It was not
broken in any way. Of course, we can always add features later (that is true
for any component), but removing it changes the original RM design. If we want
to add monitoring of the pool size later on, we will have to re-add the proper
registration at the RM.
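The fallback being defended here (the removed `onComplete` block) can be sketched as a pure function over the outcome of the ask, which makes its three branches easy to compare. This is a simplified sketch, not Flink code: `Msg`, `RmResponse`, `RegistrationFallback.onRmReply` and the string identifiers are hypothetical stand-ins, and the Akka `ask`, `decorateMessage` and `self !` plumbing is left out.

```scala
import java.util.concurrent.TimeoutException
import scala.util.{Failure, Success, Try}

// Hypothetical, simplified stand-ins for the messages in the diff above;
// these are NOT the real Flink message classes.
sealed trait Msg
final case class RmResponse(text: String) extends Msg
final case class RegisterResourceSuccessful(taskManager: String) extends Msg
final case class ReconnectResourceManager(rm: String) extends Msg

object RegistrationFallback {
  /** Maps the outcome of asking the resource manager to the messages the
    * JobManager would send to itself, mirroring the removed onComplete block. */
  def onRmReply(outcome: Try[Msg], taskManager: String, rm: String): List[Msg] =
    outcome match {
      case Success(response) =>
        // The resource manager is available and answered: forward its reply.
        List(response)
      case Failure(t) =>
        // println stands in for log.info / log.warn in this sketch.
        t match {
          case _: TimeoutException =>
            println("Attempt to register resource at ResourceManager timed out. Retrying")
          case _ =>
            println(s"Failure while asking ResourceManager for RegisterResource. Retrying: $t")
        }
        // Slow or unreachable RM: register anyway and let the RM reconnect later.
        List(RegisterResourceSuccessful(taskManager), ReconnectResourceManager(rm))
    }
}
```

In the actor, the returned list would correspond to what the `onComplete` callback sends to `self`; keeping the decision apart from the messaging lets each branch be checked without a running ResourceManager.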


> TaskManager registration exponential backoff doesn't work
> ---------------------------------------------------------
>
>                 Key: FLINK-4152
>                 URL: https://issues.apache.org/jira/browse/FLINK-4152
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, TaskManager, YARN Client
>            Reporter: Robert Metzger
>            Assignee: Till Rohrmann
>         Attachments: logs.tgz
>
>
> While testing Flink 1.1 I've found that the TaskManagers are logging many 
> messages when registering at the JobManager.
> This is the log file: 
> https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294
> It's logging more than 3000 messages in less than a minute. I don't think that 
> this is the expected behavior.
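For comparison with the log described above, a capped exponential backoff schedule for registration attempts can be sketched as follows, assuming the pause doubles after each failed attempt and is capped at a maximum. `RegistrationBackoff`, `schedule`, `initialPause` and `maxPause` are hypothetical names, not Flink's API or configuration keys.

```scala
import scala.concurrent.duration._

// Hypothetical helper, not part of Flink: pauses between registration attempts.
object RegistrationBackoff {
  /** Returns the pause before each of the first `attempts` registration
    * attempts, starting at `initialPause`, doubling each time, capped at `maxPause`. */
  def schedule(attempts: Int, initialPause: FiniteDuration, maxPause: FiniteDuration): List[FiniteDuration] =
    Iterator
      .iterate(initialPause min maxPause)(d => (d * 2L) min maxPause) // double, then clamp to the cap
      .take(attempts)
      .toList
}
```

For example, `schedule(6, 500.millis, 4.seconds)` yields pauses of 500 ms, 1 s, 2 s, 4 s, 4 s, 4 s. Once the cap is reached, the attempt rate stays bounded at one per `maxPause`, so a working backoff should not produce thousands of registration messages per minute.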



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
