I have upgraded from Flink 1.4.0 to Flink 1.5.3 on a three-node cluster
configured with HA. Now the flink command-line operations time out. The
exception is not very helpful: it only reports a timeout, not the reason for
it or the operation that was being attempted:
>./flink list -f
Waiting for response...
------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.util.FlinkException: Failed to retrieve job list.
    at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:433)
    at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:416)
    at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
    at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:413)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1028)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:793)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
    ... 10 more
Caused by: java.util.concurrent.TimeoutException

The web interface shows the 2 job managers and 3 task managers, and they are
communicating with one another.
I have looked at the ZooKeeper data and it is all present.
I have tried running the command on multiple nodes and they all give the same
error.
I looked for a verbose or debug option for the commands but found nothing.
Any suggestions on how to diagnose this?
Thanks,
Jason