[ https://issues.apache.org/jira/browse/FLINK-19791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231456#comment-17231456 ]

Robert Metzger commented on FLINK-19791:
----------------------------------------

I'm not sure this problem has really been fixed. While testing RC 1 of 
Flink 1.12.0, I saw the following exception:

{code}
2020-11-13 14:39:15,566 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat Map (1/4) (0602ab4f0306596872a928c6375bd153) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@4102bd05.
org.apache.flink.runtime.io.network.partition.consumer.PartitionConnectionException: Connection for partition be51d31b9b1185e636f8b0e964615117#1@96cf744116e8d64d20ca53ccedac43c3 not reachable.
        at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:163) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.internalRequestPartitions(SingleInputGate.java:314) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:286) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.requestPartitions(InputGateWithMetrics.java:94) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:283) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:184) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:577) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:541) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_222]
Caused by: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/192.168.1.25:57359' has failed. This might indicate that the remote task manager has been lost.
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:95) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:67) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:160) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        ... 12 more
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/192.168.1.25:57359' has failed. This might indicate that the remote task manager has been lost.
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_222]
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_222]
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:88) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:67) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:160) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        ... 12 more
Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '/192.168.1.25:57359' has failed. This might indicate that the remote task manager has been lost.
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:134) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connectWithRetries(PartitionRequestClientFactory.java:111) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:77) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:67) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:160) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        ... 12 more
Caused by: java.lang.NullPointerException
        at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:61) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient.<init>(NettyPartitionRequestClient.java:73) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:126) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connectWithRetries(PartitionRequestClientFactory.java:111) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:77) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:67) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:160) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
        ... 12 more
{code}
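
The nesting in the trace above (an IOException wrapping an ExecutionException wrapping a RemoteTransportException, with the NullPointerException from a null check at the bottom) can be reproduced with a small standalone sketch. This is not Flink's code and makes no claim about the actual root cause: CauseChainSketch, TransportException, connect and createClient are hypothetical stand-ins, and Objects.requireNonNull plays the role of Preconditions.checkNotNull. It only shows how a failed null check inside a connect step can surface several layers up as a "connection failed" error.

```java
import java.io.IOException;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

class CauseChainSketch {

    /** Hypothetical stand-in for a transport-level exception type. */
    static class TransportException extends IOException {
        TransportException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical connect step: a null handler trips the null check. */
    static Object connect(Object handler) throws TransportException {
        try {
            // Plays the role of a checkNotNull-style precondition.
            return Objects.requireNonNull(handler);
        } catch (Exception e) {
            // Any failure, including the NPE, is reported as a transport failure.
            throw new TransportException("Connecting to remote task manager has failed.", e);
        }
    }

    /** Hypothetical factory method: completes a future, then blocks on it. */
    static Object createClient(Object handler) throws IOException {
        CompletableFuture<Object> future = new CompletableFuture<>();
        try {
            future.complete(connect(handler));
        } catch (Exception e) {
            future.completeExceptionally(e);
        }
        try {
            return future.get(); // wraps the failure in an ExecutionException
        } catch (ExecutionException e) {
            throw new IOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
    }

    /** Walks getCause() down to the innermost exception. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        try {
            createClient(null);
        } catch (IOException e) {
            System.out.println(rootCause(e).getClass().getName());
        }
    }
}
```

Running main prints java.lang.NullPointerException, which mirrors the last "Caused by" entry in the log: the message about a lost task manager is only the outer wrapper.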

> PartitionRequestClientFactoryTest.testInterruptsNotCached fails with 
> NullPointerException
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-19791
>                 URL: https://issues.apache.org/jira/browse/FLINK-19791
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.12.0
>            Reporter: Robert Metzger
>            Assignee: Roman Khachatryan
>            Priority: Major
>              Labels: pull-request-available, test-stability
>             Fix For: 1.12.0
>
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8517&view=logs&j=6e58d712-c5cc-52fb-0895-6ff7bd56c46b&t=f30a8e80-b2cf-535c-9952-7f521a4ae374
> {code}
> 2020-10-23T13:25:12.0774554Z [ERROR] testInterruptsNotCached(org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactoryTest)  Time elapsed: 0.762 s  <<< ERROR!
> 2020-10-23T13:25:12.0775695Z java.io.IOException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '934dfa03c743/172.18.0.2:8080' has failed. This might indicate that the remote task manager has been lost.
> 2020-10-23T13:25:12.0776455Z  at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:95)
> 2020-10-23T13:25:12.0777038Z  at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactoryTest.testInterruptsNotCached(PartitionRequestClientFactoryTest.java:72)
> 2020-10-23T13:25:12.0777465Z  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-10-23T13:25:12.0777815Z  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-10-23T13:25:12.0778221Z  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-10-23T13:25:12.0778581Z  at java.lang.reflect.Method.invoke(Method.java:498)
> 2020-10-23T13:25:12.0778921Z  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-10-23T13:25:12.0779331Z  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-10-23T13:25:12.0779733Z  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-10-23T13:25:12.0780117Z  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-10-23T13:25:12.0780484Z  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2020-10-23T13:25:12.0780851Z  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2020-10-23T13:25:12.0781236Z  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2020-10-23T13:25:12.0781600Z  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2020-10-23T13:25:12.0781937Z  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2020-10-23T13:25:12.0782431Z  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2020-10-23T13:25:12.0782877Z  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2020-10-23T13:25:12.0783223Z  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2020-10-23T13:25:12.0783541Z  at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-10-23T13:25:12.0783905Z  at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-10-23T13:25:12.0784315Z  at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-10-23T13:25:12.0784718Z  at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-10-23T13:25:12.0785125Z  at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-10-23T13:25:12.0785552Z  at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-10-23T13:25:12.0785980Z  at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 2020-10-23T13:25:12.0786379Z  at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 2020-10-23T13:25:12.0786763Z  at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> 2020-10-23T13:25:12.0787922Z Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '934dfa03c743/172.18.0.2:8080' has failed. This might indicate that the remote task manager has been lost.
> 2020-10-23T13:25:12.0788575Z  at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> 2020-10-23T13:25:12.0788954Z  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2020-10-23T13:25:12.0789431Z  at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:88)
> 2020-10-23T13:25:12.0789808Z  ... 26 more
> 2020-10-23T13:25:12.0790546Z Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager '934dfa03c743/172.18.0.2:8080' has failed. This might indicate that the remote task manager has been lost.
> 2020-10-23T13:25:12.0791396Z  at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:134)
> 2020-10-23T13:25:12.0791959Z  at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connectWithRetries(PartitionRequestClientFactory.java:111)
> 2020-10-23T13:25:12.0792732Z  at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:77)
> 2020-10-23T13:25:12.0793118Z  ... 26 more
> 2020-10-23T13:25:12.0793342Z Caused by: java.lang.NullPointerException
> 2020-10-23T13:25:12.0793681Z  at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:61)
> 2020-10-23T13:25:12.0794319Z  at org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient.<init>(NettyPartitionRequestClient.java:73)
> 2020-10-23T13:25:12.0794854Z  at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:126)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
