Re: Cancel flink job occur exception

Till Rohrmann Fri, 07 Sep 2018 08:45:35 -0700

Hi Devin,

could you send the logs of the cluster entrypoint and the client once you
see this exception? This will help to debug the problem.


Cheers,
Till

On Tue, Sep 4, 2018 at 2:12 PM devinduan(段丁瑞) <[email protected]> wrote:

> Hi all,
>       I submit a flink job through yarn-cluster mode and cancel job with
> savepoint option immediately after job status change to deployed. Sometimes
> i met this error:
>
> org.apache.flink.util.FlinkException: Could not cancel job xxxx.
>         at
> org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:585)
>         at
> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
>         at
> org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:577)
>         at
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1034)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException:
> org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not
> complete the operation. Number of retries has been exhausted.
>         at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>         at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>         at
> org.apache.flink.client.program.rest.RestClusterClient.cancelWithSavepoint(RestClusterClient.java:398)
>         at
> org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:583)
>         ... 6 more
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException:
> Could not complete the operation. Number of retries has been exhausted.
>         at
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
>         at
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>         at
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>         ... 1 more
> Caused by: java.util.concurrent.CompletionException:
> java.net.ConnectException: Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx
>         at
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>         at
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>         at
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>         at
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>         ... 16 more
> Caused by: java.net.ConnectException: Connect refuse:
> xxx/xxx.xxx.xxx.xxx:xxx
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
>         at
> org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
>         ... 7 more
>
>     I check the jobmanager log, no error found. Savepoint is correct saved
> in hdfs. Yarn appliction status changed to FINISHED and FinalStatus change
> to KILLED.
>     I think this issue occur because RestClusterClient cannot find
> jobmanager addresss after Jobmanager(AM) has shutdown.
>     My flink version is 1.5.3.
>     Anyone could help me to resolve this issue, thanks!
>
> devin.
>
>

Re: Cancel flink job occur exception

Reply via email to