[ https://issues.apache.org/jira/browse/FLINK-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549139#comment-16549139 ]

ASF GitHub Bot commented on FLINK-9838:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/6373

    [FLINK-9838][logging] Don't log slot request failures on the ResourceManager

    ## What is the purpose of the change
    
    Reduce log clutter by not logging slot request failures on the `ResourceManager`.
    
    ## Verifying this change
    
    - Verified manually
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixSlotAllocationFailureLogging

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6373.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6373
    
----
commit f02f0231ba1fe047086fdf90227bf4dac9697d87
Author: Till Rohrmann <trohrmann@...>
Date:   2018-07-19T11:07:44Z

    [FLINK-9838][logging] Don't log slot request failures on the ResourceManager

----


> Slot request failed Exceptions after completing a job
> -----------------------------------------------------
>
>                 Key: FLINK-9838
>                 URL: https://issues.apache.org/jira/browse/FLINK-9838
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.1, 1.6.0
>            Reporter: Nico Kruber
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, after a job has finished (e.g. the following one), several
> exceptions about failed slot requests are logged at INFO level, although
> the job ran successfully.
> {code}
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromElements(1, 2, 3, 4).print();
> env.execute();
> {code}
> {code}
> 16:28:16,106 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Closing the SlotManager.
> 16:28:16,106 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Suspending the SlotManager.
> 16:28:16,106 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Unregister TaskManager aa20e76adb9aee0cdadc50dbc06ea208 from the SlotManager.
> 16:28:16,107 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Slot request with allocation id f99ff6d66f7bc618a9ee6e9470e0cdb1 for job 1bdaafd1072e210790790b99e7741b6a failed.
> org.apache.flink.util.FlinkException: The assigned slot b21f8807-5d0a-4e53-9e55-b6522b4a41c0_0 was removed.
>       at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:786)
>       at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:756)
>       at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:948)
>       at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:372)
>       at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.suspend(SlotManager.java:234)
>       at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.close(SlotManager.java:251)
>       at org.apache.flink.runtime.resourcemanager.ResourceManager.postStop(ResourceManager.java:224)
>       at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.postStop(AkkaRpcActor.java:105)
>       at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.postStop(FencedAkkaRpcActor.java:40)
>       at akka.actor.Actor$class.aroundPostStop(Actor.scala:515)
>       at akka.actor.UntypedActor.aroundPostStop(UntypedActor.scala:95)
>       at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>       at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
>       at akka.actor.ActorCell.terminate(ActorCell.scala:374)
>       at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:467)
>       at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
>       at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:260)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>       at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 16:28:16,109 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Stopping TaskExecutor akka://flink/user/taskmanager_0.
> 16:28:16,110 INFO  org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager  - Shutting down TaskExecutorLocalStateStoresManager.
> 16:28:16,109 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Stopping dispatcher akka://flink/user/dispatcher421f3c27-5248-40d4-b219-f0c23480bd6f.
> 16:28:16,111 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Stopping all currently running jobs of dispatcher akka://flink/user/dispatcher421f3c27-5248-40d4-b219-f0c23480bd6f.
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
