Hi all,

The question is being handled on the dev mailing list [1].

Best,
Gary

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Cancel-flink-job-occur-exception-td24056.html

On Tue, Sep 4, 2018 at 2:21 PM, rileyli(李瑞亮) <rile...@tencent.com> wrote:

> Hi all,
>       I submit a Flink job in yarn-cluster mode and cancel the job with
> the savepoint option immediately after the job status changes to deployed.
> Sometimes I get this error:
>
> org.apache.flink.util.FlinkException: Could not cancel job xxxx.
>         at org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:585)
>         at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
>         at org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:577)
>         at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1034)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException:
> org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not
> complete the operation. Number of retries has been exhausted.
>         at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>         at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>         at org.apache.flink.client.program.rest.RestClusterClient.cancelWithSavepoint(RestClusterClient.java:398)
>         at org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:583)
>         ... 6 more
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException:
> Could not complete the operation. Number of retries has been exhausted.
>         at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
>         at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>         at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>         ... 1 more
> Caused by: java.util.concurrent.CompletionException:
> java.net.ConnectException: Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx
>         at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>         at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>         at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>         at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>         ... 16 more
> Caused by: java.net.ConnectException: Connect refuse:
> xxx/xxx.xxx.xxx.xxx:xxx
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
>         at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
>         ... 7 more
>
>     I checked the JobManager log and found no errors. The savepoint was
> correctly saved in HDFS. The YARN application status changed to FINISHED
> and the final status changed to KILLED.
>     I think this issue occurs because the RestClusterClient cannot find
> the JobManager address after the JobManager (AM) has shut down.
>     My Flink version is 1.5.3.
>     Could anyone help me resolve this issue? Thanks!
>
> Best regards!
>
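For anyone trying to reproduce the sequence described in the quoted message, a minimal sketch using the standard Flink CLI follows; the jar path, job ID, and savepoint directory are placeholders, not values from the original report:

```shell
# Submit the job to YARN in yarn-cluster mode, detached.
./bin/flink run -m yarn-cluster -d ./my-streaming-job.jar

# List running jobs to find the job ID.
./bin/flink list

# Cancel with a savepoint; the savepoint is written under the given target directory.
./bin/flink cancel -s hdfs:///flink/savepoints <jobId>
```

In yarn-cluster mode the ApplicationMaster (and with it the JobManager REST endpoint) shuts down once the job is cancelled, so a client retrying REST calls after that point can hit a connection error like the one quoted above.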