[ https://issues.apache.org/jira/browse/FLINK-19791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231456#comment-17231456 ]
Robert Metzger commented on FLINK-19791:
----------------------------------------

I'm not sure if this problem has been really fixed. While testing the RC 1 of Flink 1.12.0, I saw the following exception:

{code}
2020-11-13 14:39:15,566 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Co-Flat Map (1/4) (0602ab4f0306596872a928c6375bd153) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@4102bd05.
org.apache.flink.runtime.io.network.partition.consumer.PartitionConnectionException: Connection for partition be51d31b9b1185e636f8b0e964615117#1@96cf744116e8d64d20ca53ccedac43c3 not reachable.
    at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:163) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.internalRequestPartitions(SingleInputGate.java:314) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:286) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.requestPartitions(InputGateWithMetrics.java:94) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:283) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:184) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:577) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:541) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_222]
Caused by: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/192.168.1.25:57359' has failed. This might indicate that the remote task manager has been lost.
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:95) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:67) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:160) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    ... 12 more
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/192.168.1.25:57359' has failed. This might indicate that the remote task manager has been lost.
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_222]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_222]
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:88) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:67) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:160) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    ... 12 more
Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/192.168.1.25:57359' has failed. This might indicate that the remote task manager has been lost.
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:134) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connectWithRetries(PartitionRequestClientFactory.java:111) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:77) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:67) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:160) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    ... 12 more
Caused by: java.lang.NullPointerException
    at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:61) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient.<init>(NettyPartitionRequestClient.java:73) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:126) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connectWithRetries(PartitionRequestClientFactory.java:111) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:77) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:67) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:160) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    ... 12 more
{code}

> PartitionRequestClientFactoryTest.testInterruptsNotCached fails with NullPointerException
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-19791
>                 URL: https://issues.apache.org/jira/browse/FLINK-19791
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.12.0
>            Reporter: Robert Metzger
>            Assignee: Roman Khachatryan
>            Priority: Major
>              Labels: pull-request-available, test-stability
>             Fix For: 1.12.0
>
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8517&view=logs&j=6e58d712-c5cc-52fb-0895-6ff7bd56c46b&t=f30a8e80-b2cf-535c-9952-7f521a4ae374
> {code}
> 2020-10-23T13:25:12.0774554Z [ERROR] testInterruptsNotCached(org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactoryTest) Time elapsed: 0.762 s <<< ERROR!
> 2020-10-23T13:25:12.0775695Z java.io.IOException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '934dfa03c743/172.18.0.2:8080' has failed. This might indicate that the remote task manager has been lost.
> 2020-10-23T13:25:12.0776455Z     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:95)
> 2020-10-23T13:25:12.0777038Z     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactoryTest.testInterruptsNotCached(PartitionRequestClientFactoryTest.java:72)
> 2020-10-23T13:25:12.0777465Z     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-10-23T13:25:12.0777815Z     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-10-23T13:25:12.0778221Z     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-10-23T13:25:12.0778581Z     at java.lang.reflect.Method.invoke(Method.java:498)
> 2020-10-23T13:25:12.0778921Z     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-10-23T13:25:12.0779331Z     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-10-23T13:25:12.0779733Z     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-10-23T13:25:12.0780117Z     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-10-23T13:25:12.0780484Z     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-10-23T13:25:12.0780851Z     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-10-23T13:25:12.0781236Z     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-10-23T13:25:12.0781600Z     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-10-23T13:25:12.0781937Z     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-10-23T13:25:12.0782431Z     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-10-23T13:25:12.0782877Z     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-10-23T13:25:12.0783223Z     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-10-23T13:25:12.0783541Z     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-10-23T13:25:12.0783905Z     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-10-23T13:25:12.0784315Z     at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-10-23T13:25:12.0784718Z     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-10-23T13:25:12.0785125Z     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-10-23T13:25:12.0785552Z     at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-10-23T13:25:12.0785980Z     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 2020-10-23T13:25:12.0786379Z     at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 2020-10-23T13:25:12.0786763Z     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> 2020-10-23T13:25:12.0787922Z Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '934dfa03c743/172.18.0.2:8080' has failed. This might indicate that the remote task manager has been lost.
> 2020-10-23T13:25:12.0788575Z     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-10-23T13:25:12.0788954Z     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-10-23T13:25:12.0789431Z     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:88)
> 2020-10-23T13:25:12.0789808Z     ... 26 more
> 2020-10-23T13:25:12.0790546Z Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '934dfa03c743/172.18.0.2:8080' has failed. This might indicate that the remote task manager has been lost.
> 2020-10-23T13:25:12.0791396Z     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:134)
> 2020-10-23T13:25:12.0791959Z     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connectWithRetries(PartitionRequestClientFactory.java:111)
> 2020-10-23T13:25:12.0792732Z     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:77)
> 2020-10-23T13:25:12.0793118Z     ... 26 more
> 2020-10-23T13:25:12.0793342Z Caused by: java.lang.NullPointerException
> 2020-10-23T13:25:12.0793681Z     at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:61)
> 2020-10-23T13:25:12.0794319Z     at org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient.<init>(NettyPartitionRequestClient.java:73)
> 2020-10-23T13:25:12.0794854Z     at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:126)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)