Please enable DEBUG logging for the client and TRACE logging for the
cluster.
For the client, look for log messages starting with "Sending request
of"; these contain the host and port to which the client sends its
requests. Verify that these are correct and match the host/port that
you use when accessing the web UI.
For the server, look for log messages starting with "Received request",
so that we can determine whether the request arrives at all.
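A minimal sketch of that search, assuming the client and cluster logs have already been written with the log levels described above. The class name `RestRequestLogFilter` is hypothetical, and the log file paths are passed on the command line rather than assumed. It keeps any line containing either message prefix, since each log line also carries a timestamp, level and logger name in front of the message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: prints only the log lines relevant to the CLI timeout.
public class RestRequestLogFilter {

    // Client side (DEBUG): shows the host and port the request is sent to.
    static final String CLIENT_MARKER = "Sending request of";
    // Server side (TRACE): shows that the request actually arrived.
    static final String SERVER_MARKER = "Received request";

    // Matches anywhere in the line, because the message follows the
    // timestamp, level and logger name.
    static List<String> relevantLines(Stream<String> lines) {
        return lines
                .filter(l -> l.contains(CLIENT_MARKER) || l.contains(SERVER_MARKER))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path log = Paths.get(arg);
            // Files.lines streams the file lazily; try-with-resources closes it.
            try (Stream<String> lines = Files.lines(log)) {
                System.out.println("== " + log);
                relevantLines(lines).forEach(System.out::println);
            }
        }
    }
}
```

After compiling it with `javac`, run it as `java RestRequestLogFilter <client log> <jobmanager log>`. If the client logs a "Sending request of" line but no JobManager log has a matching "Received request" line, the request never reached the cluster.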
On 05.09.2018 00:53, Jason Kania wrote:
I have upgraded from Flink 1.4.0 to Flink 1.5.3 with a three-node
cluster configured with HA. Now I am encountering an issue where the
flink command-line operations time out. The exception is not very
informative because it only indicates a timeout, not the reason or what
it was trying to do:
>./flink list -f
Waiting for response...
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.util.FlinkException: Failed to retrieve job list.
    at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:433)
    at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:416)
    at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
    at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:413)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1028)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:793)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
    ... 10 more
Caused by: java.util.concurrent.TimeoutException
The web interface shows the two JobManagers and three TaskManagers
communicating with one another.
I have looked at the ZooKeeper data and it is all present.
I have tried running the command on multiple nodes and they all give
the same error.
I looked for a verbose or debug option for the commands but found nothing.
Suggestions on this?
Thanks,
Jason