[ https://issues.apache.org/jira/browse/FLINK-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Metzger updated FLINK-5553:
----------------------------------
    Attachment: application-1484132267957-0076

> Job fails during deployment with IllegalStateException from subpartition request
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-5553
>                 URL: https://issues.apache.org/jira/browse/FLINK-5553
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.3.0
>            Reporter: Robert Metzger
>         Attachments: application-1484132267957-0076
>
>
> While running a test job with Flink 1.3-SNAPSHOT (6fb6967b9f9a31f034bd09fcf76aaf147bc8e9a0) the job failed with this exception:
> {code}
> 2017-01-18 14:56:27,043 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (9/10) (befc06d0e792c2ce39dde74b365dd3cf) switched from DEPLOYING to RUNNING.
> 2017-01-18 14:56:27,059 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map (9/10) (e94a01ec283e5dce7f79b02cf51654c4) switched from DEPLOYING to RUNNING.
> 2017-01-18 14:56:27,817 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map (10/10) (cbb61c9a2f72c282877eb383e111f7cd) switched from RUNNING to FAILED.
> java.lang.IllegalStateException: There has been an error in the channel.
>     at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
>     at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
>     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
>     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
>     at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
>     at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
>     at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-01-18 14:56:27,819 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Misbehaved Job (b1d985d11984df57400fdff2bb656c59) switched from state RUNNING to FAILING.
> java.lang.IllegalStateException: There has been an error in the channel.
>     at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.addInputChannel(PartitionRequestClientHandler.java:77)
>     at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.requestSubpartition(PartitionRequestClient.java:104)
>     at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:115)
>     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:419)
>     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:441)
>     at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:153)
>     at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:192)
>     at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:270)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:666)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> This is the first exception that is reported to the jobmanager.
> I think this is related to missing network buffers. You see that from the next deployment after the restart, where the deployment fails with the insufficient number of buffers exception.
> I'll add logs to the JIRA.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)