[ https://issues.apache.org/jira/browse/FLINK-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380048#comment-16380048 ]
ASF GitHub Bot commented on FLINK-8750:
---------------------------------------

Github user NicoK commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5588#discussion_r171183475

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGate.java ---
    @@ -553,6 +553,12 @@ public void requestPartitions() throws IOException, InterruptedException {
     				channelsWithEndOfPartitionEvents.set(currentChannel.getChannelIndex());
     				if (channelsWithEndOfPartitionEvents.cardinality() == numberOfInputChannels) {
    +					// Because of race condition between:
    +					// 1. releasing inputChannelsWithData lock in this method and reaching this place
    +					// 2. empty data notification that re-enqueues a channel
    +					// we can end up with moreAvailable flag set to true, while we expect no more data.
    +					checkState(!moreAvailable || !pollNextBufferOrEvent().isPresent());
    +					moreAvailable = false;
    --- End diff --

    While this certainly fixes the `checkState(!bufferOrEvent.moreAvailable());` in the `UnionInputGate`, it does not do much to improve the detection of additional data after the `EndOfPartitionEvent`. How about also adding `checkState(!pollNextBufferOrEvent().isPresent());` here:
    ```
    	private Optional<BufferOrEvent> getNextBufferOrEvent(boolean blocking) throws IOException, InterruptedException {
    		if (hasReceivedAllEndOfPartitionEvents) {
    			checkState(!pollNextBufferOrEvent().isPresent());
    			return Optional.empty();
    		}
    ```
    In that case, if we ever try to get more data (due to a data notification), there should be no actual data left, only empty buffers.

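    To illustrate the intent of the check suggested above, here is a minimal, self-contained sketch. It is a toy model and not Flink's `SingleInputGate`: the class `ToyInputGate`, its `Item` type, the `enqueue`/`pollNext`/`getNext` methods, and the string marker `END_OF_PARTITION` are all hypothetical names made up for this example. Only the idea is taken from the comment: once every channel has delivered its end-of-partition event, any later read must find nothing left, otherwise the gate fails loudly.

    ```java
    import java.util.ArrayDeque;
    import java.util.BitSet;
    import java.util.Optional;
    import java.util.Queue;

    /** Toy stand-in for an input gate (hypothetical; not Flink's SingleInputGate). */
    class ToyInputGate {
        /** Hypothetical marker that plays the role of an EndOfPartitionEvent. */
        static final String END_OF_PARTITION = "END_OF_PARTITION";

        /** One queued element together with the channel it arrived on. */
        static final class Item {
            final int channelIndex;
            final String payload;

            Item(int channelIndex, String payload) {
                this.channelIndex = channelIndex;
                this.payload = payload;
            }
        }

        private final int numberOfInputChannels;
        private final BitSet channelsWithEndOfPartitionEvents = new BitSet();
        private final Queue<Item> queue = new ArrayDeque<>();
        private boolean hasReceivedAllEndOfPartitionEvents;

        ToyInputGate(int numberOfInputChannels) {
            this.numberOfInputChannels = numberOfInputChannels;
        }

        /** Simulates data (or an end-of-partition marker) arriving on a channel. */
        void enqueue(int channelIndex, String payload) {
            queue.add(new Item(channelIndex, payload));
        }

        /** Non-blocking poll; empty if nothing is queued. */
        private Optional<Item> pollNext() {
            return Optional.ofNullable(queue.poll());
        }

        /** Returns the next payload, or empty if there is none. */
        Optional<String> getNext() {
            if (hasReceivedAllEndOfPartitionEvents) {
                // Mirrors the suggested check: after the last end-of-partition
                // marker, a further read must not find any real data.
                if (pollNext().isPresent()) {
                    throw new IllegalStateException("data found after the last END_OF_PARTITION");
                }
                return Optional.empty();
            }

            Optional<Item> next = pollNext();
            if (!next.isPresent()) {
                return Optional.empty();
            }

            Item item = next.get();
            if (END_OF_PARTITION.equals(item.payload)) {
                channelsWithEndOfPartitionEvents.set(item.channelIndex);
                if (channelsWithEndOfPartitionEvents.cardinality() == numberOfInputChannels) {
                    hasReceivedAllEndOfPartitionEvents = true;
                }
            }
            return Optional.of(item.payload);
        }

        public static void main(String[] args) {
            ToyInputGate gate = new ToyInputGate(1);
            gate.enqueue(0, "record");
            gate.enqueue(0, END_OF_PARTITION);
            gate.getNext(); // "record"
            gate.getNext(); // END_OF_PARTITION, the gate is now finished
            gate.enqueue(0, "late record");
            try {
                gate.getNext();
            } catch (IllegalStateException e) {
                System.out.println("caught: " + e.getMessage());
            }
        }
    }
    ```

    The toy model does not reproduce the race described in the diff (a notification that re-enqueues an already drained channel); it only shows why polling once more after the final end-of-partition marker turns silently dropped data into an immediate failure.
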
> InputGate may contain data after an EndOfPartitionEvent
> -------------------------------------------------------
>
>                 Key: FLINK-8750
>                 URL: https://issues.apache.org/jira/browse/FLINK-8750
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>            Reporter: Nico Kruber
>            Assignee: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> The travis run at https://travis-ci.org/apache/flink/jobs/344425772 indicates that there was still some data after an {{EndOfPartitionEvent}} or that {{BufferOrEvent#moreAvailable}} contained the wrong value:
> {code}
> testOutputWithoutPk(org.apache.flink.table.runtime.stream.table.JoinITCase)  Time elapsed: 4.611 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:891)
> 	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
> 	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
> 	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> 	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> 	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
> 	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalStateException: null
> 	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
> 	at org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.getNextBufferOrEvent(UnionInputGate.java:173)
> 	at org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:94)
> 	at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:292)
> 	at org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:115)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:308)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)