[ 
https://issues.apache.org/jira/browse/FLINK-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735176#comment-15735176
 ] 

ASF GitHub Bot commented on FLINK-5114:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2975#discussion_r91706223
  
    --- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
 ---
    @@ -503,15 +504,37 @@ class TaskManager(
                 )
               }
     
    -        case PartitionState(taskExecutionId, taskResultId, partitionId, 
state) =>
    -          Option(runningTasks.get(taskExecutionId)) match {
    +        // Updates the partition producer state
    +        case PartitionProducerState(receiverExecutionId, result) =>
    +          Option(runningTasks.get(receiverExecutionId)) match {
                 case Some(task) =>
    -              task.onPartitionStateUpdate(taskResultId, partitionId, state)
    +              try {
    +                result match {
    +                  case Left((intermediateDataSetId, resultPartitionId, 
producerState)) =>
    +                    // Forward the state update to the task
    +                    task.onPartitionStateUpdate(
    +                      intermediateDataSetId,
    +                      resultPartitionId.getPartitionId,
    +                      producerState)
    +
    +                  case Right(failure) =>
    +                    // Cancel or fail the execution
    +                    if 
(failure.isInstanceOf[PartitionProducerDisposedException]) {
    +                      log.debug("Partition producer disposed. Cancelling 
execution.", failure)
    --- End diff --
    
    I think this log statement should be on `info` level. Otherwise, going 
through the logs in standard settings (only info level logging) leaves you 
wondering why the task is cancelled all of a sudden.


> PartitionState update with finished execution fails
> ---------------------------------------------------
>
>                 Key: FLINK-5114
>                 URL: https://issues.apache.org/jira/browse/FLINK-5114
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>
> If a partition state request is triggered for a producer that finishes before 
> the request arrives, the execution is unregistered and the producer cannot be 
> found. In this case the PartitionState returns null and the job fails.
> We need to check the producer location via the intermediate result partition 
> in this case.
> See here: https://api.travis-ci.org/jobs/177668505/log.txt?deansi=true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to