[ https://issues.apache.org/jira/browse/FLINK-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380048#comment-16380048 ]

ASF GitHub Bot commented on FLINK-8750:
---------------------------------------

Github user NicoK commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5588#discussion_r171183475
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGate.java ---
    @@ -553,6 +553,12 @@ public void requestPartitions() throws IOException, InterruptedException {
                                channelsWithEndOfPartitionEvents.set(currentChannel.getChannelIndex());
     
                                if (channelsWithEndOfPartitionEvents.cardinality() == numberOfInputChannels) {
    +                                   // Because of race condition between:
    +                                   // 1. releasing inputChannelsWithData lock in this method and reaching this place
    +                                   // 2. empty data notification that re-enqueues a channel
    +                                   // we can end up with moreAvailable flag set to true, while we expect no more data.
    +                                   checkState(!moreAvailable || !pollNextBufferOrEvent().isPresent());
    +                                   moreAvailable = false;
    --- End diff --
    
    While this certainly fixes the failing
    `checkState(!bufferOrEvent.moreAvailable());` in the `UnionInputGate`, it
    does little to improve the detection of additional data arriving after the
    `EndOfPartitionEvent`. How about also adding
    `checkState(!pollNextBufferOrEvent().isPresent());` here:
    ```
        private Optional<BufferOrEvent> getNextBufferOrEvent(boolean blocking) throws IOException, InterruptedException {
                if (hasReceivedAllEndOfPartitionEvents) {
                        checkState(!pollNextBufferOrEvent().isPresent());
                        return Optional.empty();
                }
    ```
    In that case, if we ever try to get more data (due to a data notification),
    there should be no actual data left, only empty buffers.
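
The suggested guard can be illustrated with a small self-contained toy model. This is only a sketch and not Flink's `SingleInputGate`: the class `ToyInputGate`, its `Item` type, the `END_OF_PARTITION` marker, and the methods `enqueue`, `pollNext`, and `getNext` are all hypothetical names, and the toy treats anything left in its queue as real data (it does not model empty data notifications).

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Optional;
import java.util.Queue;

/** Toy model only; NOT Flink's SingleInputGate. All names are hypothetical. */
class ToyInputGate {
    /** Marker payload standing in for an EndOfPartitionEvent. */
    static final String END_OF_PARTITION = "<EOP>";

    /** One queued item: the channel it came from and its payload. */
    private static final class Item {
        final int channel;
        final String payload;

        Item(int channel, String payload) {
            this.channel = channel;
            this.payload = payload;
        }
    }

    private final Queue<Item> queue = new ArrayDeque<>();
    private final BitSet channelsWithEndOfPartition;
    private final int numberOfChannels;
    private boolean receivedAllEndOfPartition;

    ToyInputGate(int numberOfChannels) {
        this.numberOfChannels = numberOfChannels;
        this.channelsWithEndOfPartition = new BitSet(numberOfChannels);
    }

    /** Simulates a channel handing an item to the gate. */
    void enqueue(int channel, String payload) {
        queue.add(new Item(channel, payload));
    }

    /** Non-blocking poll; empty if nothing is queued. */
    private Optional<Item> pollNext() {
        return Optional.ofNullable(queue.poll());
    }

    Optional<String> getNext() {
        if (receivedAllEndOfPartition) {
            // The guard discussed above: once every channel has ended,
            // a further poll must not yield anything.
            if (pollNext().isPresent()) {
                throw new IllegalStateException("data after the last end-of-partition marker");
            }
            return Optional.empty();
        }
        Optional<Item> next = pollNext();
        if (!next.isPresent()) {
            return Optional.empty();
        }
        Item item = next.get();
        if (END_OF_PARTITION.equals(item.payload)) {
            channelsWithEndOfPartition.set(item.channel);
            if (channelsWithEndOfPartition.cardinality() == numberOfChannels) {
                receivedAllEndOfPartition = true;
            }
        }
        return Optional.of(item.payload);
    }
}
```

In this toy, calling `getNext()` after all channels have delivered their end marker returns `Optional.empty()` as long as the queue is empty, and throws `IllegalStateException` if an item was enqueued late, which is the kind of misbehavior the extra check is meant to surface early.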


> InputGate may contain data after an EndOfPartitionEvent
> -------------------------------------------------------
>
>                 Key: FLINK-8750
>                 URL: https://issues.apache.org/jira/browse/FLINK-8750
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>            Reporter: Nico Kruber
>            Assignee: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> The travis run at https://travis-ci.org/apache/flink/jobs/344425772 indicates 
> that there was still some data after an {{EndOfPartitionEvent}} or that 
> {{BufferOrEvent#moreAvailable}} contained the wrong value:
> {code}
> testOutputWithoutPk(org.apache.flink.table.runtime.stream.table.JoinITCase)  Time elapsed: 4.611 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>       at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:891)
>       at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
>       at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
>       at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>       at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>       at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalStateException: null
>       at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
>       at org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.getNextBufferOrEvent(UnionInputGate.java:173)
>       at org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:94)
>       at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:292)
>       at org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:115)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:308)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>       at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
