Hello,
Thanks for the response. I had already tried setting the log level to DEBUG in 
log4j-cli.properties, logback-console.xml, and log4j-console.properties, but no 
additional relevant information appears. On the server, the only output is 
ZooKeeper ping responses:
2018-09-05 15:16:56,786 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Got ping response for sessionid: 0x3659b60bcb50076 after 1ms

The client log indicates only the following (but we are not using hadoop):
2018-09-05 15:19:53,339 WARN  org.apache.flink.client.cli.CliFrontend                       - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1208)
        at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1164)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1090)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 5 more

and 
2018-09-05 15:19:53,881 ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  - Authentication failed

despite ZooKeeper being configured as 'open' and the latest logs showing data 
being read from ZooKeeper:
2018-09-05 15:19:54,274 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Reading reply sessionid:0x265a12437df0074, packet:: clientPath:null serverPath:null finished:false header:: 1,3  replyHeader:: 1,47244656277,0  request:: '/flink,F  response:: s{47244656196,47244656196,1536110417531,1536110417531,0,1,0,0,0,1,47244656197}

Much like the basic log output, the detailed trace shows no additional 
information, just a gap after waiting for the response:
2018-09-05 15:19:54,313 INFO  org.apache.flink.client.cli.CliFrontend                       - Waiting for response...
2018-09-05 15:20:07,635 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Got ping response for sessionid: 0x265a12437df0074 after 1ms
2018-09-05 15:20:20,976 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Got ping response for sessionid: 0x265a12437df0074 after 1ms
2018-09-05 15:20:24,311 INFO  org.apache.flink.runtime.rest.RestClient                      - Shutting down rest endpoint.
2018-09-05 15:20:24,317 INFO  org.apache.flink.runtime.rest.RestClient                      - Rest endpoint shutdown complete.
2018-09-05 15:20:24,318 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Stopping ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
2018-09-05 15:20:24,320 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Stopping ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
2018-09-05 15:20:24,320 DEBUG org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl  - Closing
2018-09-05 15:20:24,321 INFO  org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl  - backgroundOperationsLoop exiting
2018-09-05 15:20:24,322 DEBUG org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient  - Closing
2018-09-05 15:20:24,322 DEBUG org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  - Closing
2018-09-05 15:20:24,323 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Closing session: 0x265a12437df0074
2018-09-05 15:20:24,323 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Closing client for session: 0x265a12437df0074
2018-09-05 15:20:24,329 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Reading reply sessionid:0x265a12437df0074, packet:: clientPath:null serverPath:null finished:false header:: 11,-11  replyHeader:: 11,47244656278,0  request:: null response:: null
2018-09-05 15:20:24,329 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Disconnecting client for session: 0x265a12437df0074
2018-09-05 15:20:24,330 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Session: 0x265a12437df0074 closed
2018-09-05 15:20:24,330 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - EventThread shut down for session: 0x265a12437df0074
2018-09-05 15:20:24,330 ERROR org.apache.flink.client.cli.CliFrontend                       - Error while running the command.



   On Wednesday, September 5, 2018, 3:41:29 a.m. EDT, Chesnay Schepler 
<ches...@apache.org> wrote:  
 
  Please enable DEBUG logging for the client and TRACE logging for the cluster.
 
 For the client, look for log messages starting with "Sending request of", this 
will contain the host and port that requests are sent to by the client. Verify 
that these are correct and match the host/port that you use when accessing the 
web UI.
 
 For the server, look for log messages starting with "Received request", so we 
can figure out whether the request at least arrives.
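
 The two searches described above can be sketched as a small filter over the 
client and server log files. This is only an illustration: the function name 
and constants are made up for this example, and the split on " - " assumes the 
"<timestamp> <LEVEL> <logger> - <message>" layout seen in the log excerpts in 
this thread.

```python
# Message prefixes quoted in the advice above.
CLIENT_MARKER = "Sending request of"
SERVER_MARKER = "Received request"


def matching_records(lines, marker):
    """Return the log records whose message text starts with `marker`.

    Assumes each record looks like "<timestamp> <LEVEL> <logger> - <message>";
    a different log layout would need a different split.
    """
    hits = []
    for line in lines:
        # Split once on the " - " separator between logger name and message.
        _, sep, message = line.partition(" - ")
        if sep and message.lstrip().startswith(marker):
            hits.append(line.rstrip("\n"))
    return hits
```

 Passing an open log file and CLIENT_MARKER (or SERVER_MARKER for the cluster 
logs) returns the matching records, which can then be compared against the 
host and port used for the web UI.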
 
 On 05.09.2018 00:53, Jason Kania wrote:
  
  I have upgraded from Flink 1.4.0 to Flink 1.5.3 with a three-node cluster 
configured with HA. Now I am encountering an issue where the flink command line 
operations time out. The exception generated is not very helpful because it only 
indicates a timeout, not the reason or what it was trying to do: 
  >./flink list -f
  Waiting for response...
  ------------------------------------------------------------
  The program finished with the following exception:

  org.apache.flink.util.FlinkException: Failed to retrieve job list.
          at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:433)
          at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:416)
          at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
          at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:413)
          at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1028)
          at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
          at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
          at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
  Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
          at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
          at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
          at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
          at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
          at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
          at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:793)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
  Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
          at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
          at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
          at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
          at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
          ... 10 more
  Caused by: java.util.concurrent.TimeoutException
  The web interface shows the 2 job managers and 3 task managers that are 
talking with one another. 
  I have looked at the zookeeper data and it is all present. 
  I have tried running the command on multiple nodes and they all give the same 
error. 
  I looked for a verbose or debug option for the commands but found nothing. 
  Suggestions on this? 
  Thanks, 
  Jason   
 

 
   
