Hey, DId You try to run any other job on your setup? Also, could You please tell what are the sources you are trying to use, do all messages come from Kafka?? >From the first look, it seems that the JobManager can't connect to one of the TaskManagers.
Best Regards, Dom. pon., 12 lis 2018 o 17:12 zavalit <[email protected]> napisał(a): > Hi, > may be i just missing smth, but i just have no more ideas where to look. > > here is an screen of the failed state > < > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1383/Bildschirmfoto_2018-11-12_um_16.png> > > > i read messages from 2 sources, make a join based on a common key and sink > it all in a kafka. > > val env = StreamExecutionEnvironment.getExecutionEnvironment > env.setParallelism(3) > ... > source1 > .keyBy(_.searchId) > .connect(source2.keyBy(_.searchId)) > .process(new SearchResultsJoinFunction) > .addSink(KafkaSink.sink) > > so it perfectly works when launch it locally. when i deploy it to 1 job > manager and 3 taskmanagers and get every Task in "RUNNING" state, after 2 > minutes (when nothing is comming to sink) one of the taskmanagers gets > following in log: > > Flat Map (1/3) (9598c11996f4b52a2e2f9f532f91ff66) switched from RUNNING to > FAILED. > java.io.IOException: Connecting the channel failed: Connecting to remote > task manager + 'flink-taskmanager-11-dn9cj/10.81.27.84:37708' has failed. > This might indicate that the remote task manager has been lost. > at > org.apache.flink.runtime.io > .network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:196) > at > org.apache.flink.runtime.io > .network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:133) > at > org.apache.flink.runtime.io > .network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:85) > at > org.apache.flink.runtime.io > .network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:60) > at > org.apache.flink.runtime.io > .network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:166) > at > org.apache.flink.runtime.io > .network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:494) > at > org.apache.flink.runtime.io > .network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:525) > at > org.apache.flink.runtime.io > .network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:508) > at > > org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:94) > at > org.apache.flink.streaming.runtime.io > .StreamInputProcessor.processInput(StreamInputProcessor.java:209) > at > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105) > at > > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > Caused by: > org.apache.flink.runtime.io > .network.netty.exception.RemoteTransportException: > Connecting to remote task manager + > 'flink-taskmanager-11-dn9cj/10.81.27.84:37708' has failed. This might > indicate that the remote task manager has been lost. > at > org.apache.flink.runtime.io > .network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:219) > at > org.apache.flink.runtime.io > .network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:133) > at > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) > at > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504) > at > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483) > at > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) > at > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121) > at > > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:269) > at > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) > at > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:125) > at > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > at > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) > at > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) > at > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) > ... 1 more > Caused by: > org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: > connection timed out: flink-taskmanager-11-dn9cj/10.81.27.84:37708 > at > > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) > ... 7 more > 2018-11-12 15:47:57,198 INFO org.apache.flink.runtime.taskmanager.Task > > - Flat Map (1/3) (171e84d98f94a83e1f3e7cd598c7dbbc) switched from RUNNING > to > FAILED. > > > i would appreciate any hint. > > thx a lot. > > > > > > > > > -- > Sent from: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ >
