Ufuk Celebi created FLINK-1636:
----------------------------------

Summary: Misleading exception during concurrent partition release and remote request
Key: FLINK-1636
URL: https://issues.apache.org/jira/browse/FLINK-1636
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime
Reporter: Ufuk Celebi
Priority: Minor
When a result partition is released concurrently with a remote partition request, the request may arrive after the partition has already been released. The receiving task then fails with the following exception:

{code}
16:04:22,499 INFO  org.apache.flink.runtime.taskmanager.Task - CHAIN Partition -> Map (Map at testRestartMultipleTimes(SimpleRecoveryITCase.java:200)) (1/4) switched to FAILED : java.io.IOException: org.apache.flink.runtime.io.network.partition.queue.IllegalQueueIteratorRequestException at remote input channel: Intermediate result partition has already been released.].
	at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.checkIoError(RemoteInputChannel.java:223)
	at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.getNextBuffer(RemoteInputChannel.java:103)
	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:310)
	at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:75)
	at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
	at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
	at org.apache.flink.runtime.operators.NoOpDriver.run(NoOpDriver.java:91)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:205)
	at java.lang.Thread.run(Thread.java:745)
{code}

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
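To make the problematic interleaving concrete, here is a minimal Java sketch that replays it deterministically (release first, remote request second) instead of using real threads. `PartitionReleaseRace`, `PartitionRegistry` and `PartitionAlreadyReleasedException` are hypothetical names, not Flink classes. The sketch only assumes that the producer side drops a partition on release and rejects any later request for it. Reporting the late request with a dedicated exception type, rather than a generic `IOException`, is one possible way to make the cause obvious to the receiver; it is not necessarily how this issue was resolved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the race; these are NOT Flink's runtime classes.
public class PartitionReleaseRace {

    /** Hypothetical exception that names the "released before requested" case explicitly. */
    static class PartitionAlreadyReleasedException extends Exception {
        PartitionAlreadyReleasedException(String partitionId) {
            super("Partition " + partitionId + " was released before the remote request arrived");
        }
    }

    /** Hypothetical producer-side registry of result partitions. */
    static class PartitionRegistry {
        private final Map<String, String> partitions = new ConcurrentHashMap<>();

        void register(String partitionId, String data) {
            partitions.put(partitionId, data);
        }

        // Releasing removes the partition; any later request will no longer find it.
        void release(String partitionId) {
            partitions.remove(partitionId);
        }

        // Serves a remote request, failing with a descriptive exception if the partition is gone.
        String request(String partitionId) throws PartitionAlreadyReleasedException {
            String data = partitions.get(partitionId);
            if (data == null) {
                throw new PartitionAlreadyReleasedException(partitionId);
            }
            return data;
        }
    }

    public static void main(String[] args) {
        PartitionRegistry registry = new PartitionRegistry();
        registry.register("p-1", "records");

        // Replay the interleaving from the report: the release wins the race
        // and the remote request arrives late.
        registry.release("p-1");
        try {
            registry.request("p-1");
            System.out.println("request served");
        } catch (PartitionAlreadyReleasedException e) {
            System.out.println("late request: " + e.getMessage());
        }
    }
}
```

Running `main` prints the "late request" message, because the request is handled only after the partition has been removed from the registry.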