Hi Jason,

From the stacktrace it seems that you are using the 1.4.0 client to list jobs on a 1.5.x cluster. This will not work. You have to use the 1.5.x client.
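To make the rule above concrete, here is a minimal Python sketch of a pre-flight check that compares the client and cluster versions before any CLI command is run. It is not part of Flink: the function names `major_minor` and `client_matches_cluster` are hypothetical, and treating "same major.minor" as the compatibility criterion is an assumption drawn from the 1.4.0 versus 1.5.x case in this thread. How the two version strings are obtained is left open.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '1.5.3'.

    Assumes the first two dot-separated fields are plain integers.
    """
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    return int(parts[0]), int(parts[1])


def client_matches_cluster(client_version: str, cluster_version: str) -> bool:
    """Hypothetical check: True when client and cluster share major.minor.

    Assumption: a matching major.minor is what "use the 1.5.x client"
    amounts to; this is an illustration, not a rule taken from Flink itself.
    """
    return major_minor(client_version) == major_minor(cluster_version)


if __name__ == "__main__":
    # The combination from this thread: 1.4.0 client against a 1.5.3 cluster.
    print(client_matches_cluster("1.4.0", "1.5.3"))  # False
    # A client taken from the same distribution as the cluster.
    print(client_matches_cluster("1.5.3", "1.5.3"))  # True
```

Running such a check first would turn the bare timeout into an explicit version-mismatch message.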
Best,
Gary

On Wed, Sep 5, 2018 at 5:35 PM, Jason Kania <jason.ka...@ymail.com> wrote:

> Hello,
>
> Thanks for the response. I had already tried setting the log level to
> debug in log4j-cli.properties, logback-console.xml, and
> log4j-console.properties but no additional relevant information comes
> out. On the server, all that comes out are zookeeper ping responses:
>
> 2018-09-05 15:16:56,786 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x3659b60bcb50076 after 1ms
>
> The client log indicates only the following (but we are not using hadoop):
>
> 2018-09-05 15:19:53,339 WARN org.apache.flink.client.cli.CliFrontend - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
> java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:264)
>     at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1208)
>     at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1164)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1090)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 5 more
>
> and
>
> 2018-09-05 15:19:53,881 ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed
>
> despite the zookeeper being configured as 'open' and latest logs showing
> data being read from zookeeper.
>
> 2018-09-05 15:19:54,274 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x265a12437df0074, packet:: clientPath:null serverPath:null finished:false header:: 1,3 replyHeader:: 1,47244656277,0 request:: '/flink,F response:: s{47244656196,47244656196,1536110417531,1536110417531,0,1,0,0,0,1,47244656197}
>
> Much like the basic log output, the detailed trace shows no additional
> information, just a gap after waiting for the response:
>
> 2018-09-05 15:19:54,313 INFO org.apache.flink.client.cli.CliFrontend - Waiting for response...
> 2018-09-05 15:20:07,635 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x265a12437df0074 after 1ms
> 2018-09-05 15:20:20,976 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x265a12437df0074 after 1ms
> 2018-09-05 15:20:24,311 INFO org.apache.flink.runtime.rest.RestClient - Shutting down rest endpoint.
> 2018-09-05 15:20:24,317 INFO org.apache.flink.runtime.rest.RestClient - Rest endpoint shutdown complete.
> 2018-09-05 15:20:24,318 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
> 2018-09-05 15:20:24,320 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
> 2018-09-05 15:20:24,320 DEBUG org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl - Closing
> 2018-09-05 15:20:24,321 INFO org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
> 2018-09-05 15:20:24,322 DEBUG org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient - Closing
> 2018-09-05 15:20:24,322 DEBUG org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Closing
> 2018-09-05 15:20:24,323 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Closing session: 0x265a12437df0074
> 2018-09-05 15:20:24,323 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Closing client for session: 0x265a12437df0074
> 2018-09-05 15:20:24,329 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x265a12437df0074, packet:: clientPath:null serverPath:null finished:false header:: 11,-11 replyHeader:: 11,47244656278,0 request:: null response:: null
> 2018-09-05 15:20:24,329 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Disconnecting client for session: 0x265a12437df0074
> 2018-09-05 15:20:24,330 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Session: 0x265a12437df0074 closed
> 2018-09-05 15:20:24,330 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x265a12437df0074
> 2018-09-05 15:20:24,330 ERROR org.apache.flink.client.cli.CliFrontend - Error while running the command.
>
> On Wednesday, September 5, 2018, 3:41:29 a.m. EDT, Chesnay Schepler <ches...@apache.org> wrote:
>
> Please enable DEBUG logging for the client and TRACE logging for the
> cluster.
>
> For the client, look for log messages starting with "Sending request of",
> this will contain the host and port that requests are sent to by the
> client. Verify that these are correct and match the host/port that you use
> when accessing the web UI.
>
> For the server, look for log messages starting with "Received request",
> so we can figure out whether the request at least arrives.
>
> On 05.09.2018 00:53, Jason Kania wrote:
>
> I have upgraded from Flink 1.4.0 to Flink 1.5.3 with a three node cluster
> configured with HA. Now I am encountering an issue where the flink command
> line operations timeout. The exception generated is very poor because it
> only indicates a timeout and not the reason or what it was trying to do:
>
> >./flink list -f
> Waiting for response...
>
> ------------------------------------------------------------
> The program finished with the following exception:
> org.apache.flink.util.FlinkException: Failed to retrieve job list.
>     at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:433)
>     at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:416)
>     at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
>     at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:413)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1028)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
>     at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:793)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>     ... 10 more
> Caused by: java.util.concurrent.TimeoutException
>
> The web interface shows the 2 job managers and 3 task managers that are
> talking with one another.
>
> I have looked at the zookeeper data and it is all present.
>
> I have tried running the command on multiple nodes and they all give the
> same error.
>
> I looked for a verbose or debug option for the commands but found nothing.
>
> Suggestions on this?
>
> Thanks,
>
> Jason