[ https://issues.apache.org/jira/browse/FLINK-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735176#comment-15735176 ]
ASF GitHub Bot commented on FLINK-5114: --------------------------------------- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/2975#discussion_r91706223 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala --- @@ -503,15 +504,37 @@ class TaskManager( ) } - case PartitionState(taskExecutionId, taskResultId, partitionId, state) => - Option(runningTasks.get(taskExecutionId)) match { + // Updates the partition producer state + case PartitionProducerState(receiverExecutionId, result) => + Option(runningTasks.get(receiverExecutionId)) match { case Some(task) => - task.onPartitionStateUpdate(taskResultId, partitionId, state) + try { + result match { + case Left((intermediateDataSetId, resultPartitionId, producerState)) => + // Forward the state update to the task + task.onPartitionStateUpdate( + intermediateDataSetId, + resultPartitionId.getPartitionId, + producerState) + + case Right(failure) => + // Cancel or fail the execution + if (failure.isInstanceOf[PartitionProducerDisposedException]) { + log.debug("Partition producer disposed. Cancelling execution.", failure) --- End diff -- I think this log statement should be on `info` level. Otherwise, going through the logs in standard settings (only info level logging) leaves you wondering why the task is cancelled all of a sudden. > PartitionState update with finished execution fails > --------------------------------------------------- > > Key: FLINK-5114 > URL: https://issues.apache.org/jira/browse/FLINK-5114 > Project: Flink > Issue Type: Bug > Components: Network > Reporter: Ufuk Celebi > Assignee: Ufuk Celebi > > If a partition state request is triggered for a producer that finishes before > the request arrives, the execution is unregistered and the producer cannot be > found. In this case the PartitionState returns null and the job fails. > We need to check the producer location via the intermediate result partition > in this case. > See here: https://api.travis-ci.org/jobs/177668505/log.txt?deansi=true -- This message was sent by Atlassian JIRA (v6.3.4#6332)