Hi all,

The question is being handled on the dev mailing list [1].

Best,
Gary

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Cancel-flink-job-occur-exception-td24056.html

On Tue, Sep 4, 2018 at 2:21 PM, rileyli(李瑞亮) <rile...@tencent.com> wrote:

> Hi all,
>
> I submit a Flink job in yarn-cluster mode and cancel it with the
> savepoint option immediately after the job status changes to deployed.
> Sometimes I get this error:
>
> org.apache.flink.util.FlinkException: Could not cancel job xxxx.
>     at org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:585)
>     at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
>     at org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:577)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1034)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
>     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>     at org.apache.flink.client.program.rest.RestClusterClient.cancelWithSavepoint(RestClusterClient.java:398)
>     at org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:583)
>     ... 6 more
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>     ... 1 more
> Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>     at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>     at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>     ... 16 more
> Caused by: java.net.ConnectException: Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
>     ... 7 more
>
> I checked the JobManager log and found no errors. The savepoint is
> saved correctly in HDFS. The YARN application status changed to FINISHED
> and the FinalStatus changed to KILLED.
>
> I think this issue occurs because RestClusterClient cannot find the
> JobManager address after the JobManager (AM) has shut down.
>
> My Flink version is 1.5.3.
>
> Could anyone help me resolve this issue? Thanks!
>
> Best regards!