Hi all,

We run Flink on a five-node cluster: three task managers and two job managers. One of the job managers, running on the flink2-0 node, is down and refuses to come back up, so the cluster is currently running with a single job manager. When I restart the service, I see the following in the logs. Any idea what the issue might be?

2018-10-22 06:43:50,458 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor
2018-10-22 06:43:50,462 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-73e8dbe2-8fdb-4310-84d4-c9f3445723f3
2018-10-22 06:43:50,466 INFO org.apache.flink.runtime.blob.BlobServer - Enabling ssl for the blob server
2018-10-22 06:43:50,482 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:36880 - max concurrent requests: 50 - max backlog: 1000
2018-10-22 06:43:50,501 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist - Started memory archivist akka://flink/user/archive
2018-10-22 06:43:50,525 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
2018-10-22 06:43:50,525 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager at akka.ssl.tcp://fl...@flink2-0.flink2.us-east-1.prod.xxxxxxx.io:22902/user/jobmanager.
2018-10-22 06:43:50,526 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService@2805f48f.
2018-10-22 06:43:50,532 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService.
2018-10-22 06:43:50,557 INFO org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager - Received leader address but not running in leader ActorSystem. Cancelling registration.

Thanks,
Harshith