I forgot to add line numbers to the first link in my previous email:
https://github.com/apache/flink/blob/c6878aca6c5aeee46581b4d6744b31049db9de95/flink-dist/src/main/flink-bin/bin/jobmanager.sh#L21-L25

On Fri, Mar 15, 2019 at 8:08 AM Gary Yao <g...@ververica.com> wrote:

> Hi Harshith,
>
> In the jobmanager.sh script, the 2nd argument is assigned to the HOST
> variable [1]. How are you invoking jobmanager.sh? Prior to 1.5, the script
> expected an execution mode (local or cluster) but this is no longer the
> case [2].
>
> Best,
> Gary
>
> [1]
> https://github.com/apache/flink/blob/c6878aca6c5aeee46581b4d6744b31049db9de95/flink-dist/src/main/flink-bin/bin/jobmanager.sh
> [2]
> https://github.com/apache/flink/commit/d61664ca64bcb82c4e8ddf03a2ed38fe8edafa98
>
> On Fri, Mar 15, 2019 at 3:36 AM Kumar Bolar, Harshith <hk...@arity.com>
> wrote:
>
>> Hi Gary,
>>
>> An update. I noticed the line “--host cluster” in the program arguments
>> section of the job manager logs. So, I commented out the following section
>> in jobmanager.sh, and the task manager is now able to connect to the job
>> manager without issues.
>>
>>     if [ ! -z $HOST ]; then
>>         args+=("--host")
>>         args+=("${HOST}")
>>     fi
>>
>> Task manager logs after commenting out those lines:
>>
>> 2019-03-14 22:31:02,863 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .
>> 2019-03-14 22:31:02,875 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2019-03-14 22:31:02,876 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Start job leader service.
>> 2019-03-14 22:31:02,877 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-12d5905f-d694-46f6-9359-3a636188b008
>> 2019-03-14 22:31:02,884 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to ResourceManager akka.tcp://fl...@flink0-1.flink1.us-east-1.high.ue1.non.aws.cloud.arity.com:28945/user/resourcemanager(8583b335fd08a30a89585b7af07e4213).
>> 2019-03-14 22:31:03,109 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Resolved ResourceManager address, beginning registration
>> 2019-03-14 22:31:03,110 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Registration at ResourceManager attempt 1 (timeout=100ms)
>> 2019-03-14 22:31:03,228 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Registration at ResourceManager attempt 2 (timeout=200ms)
>> 2019-03-14 22:31:03,266 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Successful registration at resource manager akka.tcp://fl...@flink0-1.flink1.us-east-1.abc.com:28945/user/resourcemanager under registration id 170ee6a00f80ee02ead0e88710093d77.
>>
>> Thanks,
>> Harshith
>>
>> *From: *Harshith Kumar Bolar <hk...@arity.com>
>> *Date: *Friday, 15 March 2019 at 7:38 AM
>> *To: *Gary Yao <g...@ververica.com>
>> *Cc: *user <user@flink.apache.org>
>> *Subject: *Re: [External] Re: Re: Flink 1.7.2: Task Manager not able to
>> connect to Job Manager
>>
>> Hi Gary,
>>
>> Here are the full job manager and task manager logs. In the job manager
>> logs, I see it says “starting StandaloneSessionClusterEntrypoint”, whereas
>> in Flink 1.4.2, it used to say “starting JobManager”. Is this correct?
>>
>> Job manager logs: https://paste.ubuntu.com/p/DCVzsQdpHq/
>> Task manager logs: https://paste.ubuntu.com/p/wbvYFZxdT8/
>>
>> Thanks,
>> Harshith
>>
>> *From: *Gary Yao <g...@ververica.com>
>> *Date: *Thursday, 14 March 2019 at 10:11 PM
>> *To: *Harshith Kumar Bolar <hk...@arity.com>
>> *Cc: *user <user@flink.apache.org>
>> *Subject: *[External] Re: Re: Flink 1.7.2: Task Manager not able to
>> connect to Job Manager
>>
>> Hi Harshith,
>>
>> The truncated log is not enough. Can you share the complete logs? If
>> that's not possible, I'd like to see the beginning of the log files where
>> the cluster configuration is logged.
>>
>> The TaskManager tries to connect to the leader that is advertised in
>> ZooKeeper. In your case the "cluster" hostname is advertised, which hints
>> at a problem in your Flink configuration.
>>
>> Best,
>> Gary
>>
>> On Thu, Mar 14, 2019 at 4:54 PM Kumar Bolar, Harshith <hk...@arity.com>
>> wrote:
>>
>> Hi Gary,
>>
>> I’ve attached the relevant portions of the JM and TM logs.
>>
>> *Job Manager Logs:*
>>
>> 2019-03-14 11:38:28,257 INFO org.apache.flink.shaded.curator.org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
>> 2019-03-14 11:38:28,309 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component log file: /opt/flink-1.7.2/log/flink-root-standalonesession-4-flink0-1.flink1.us-east-1.log
>> 2019-03-14 11:38:28,309 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component stdout file: /opt/flink-1.7.2/log/flink-root-standalonesession-4-flink0-1.flink1.us-east-1.out
>> 2019-03-14 11:38:28,527 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at cluster:8080
>> 2019-03-14 11:38:28,527 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/rest_server_lock'}.
>> 2019-03-14 11:38:28,574 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://cluster:8080.
>> 2019-03-14 11:38:28,613 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
>> 2019-03-14 11:38:28,674 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
>> 2019-03-14 11:38:28,691 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
>> 2019-03-14 11:38:28,694 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2019-03-14 11:38:28,698 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/dispatcher_lock'}.
>> 2019-03-14 11:38:28,700 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
>> 2019-03-14 11:38:28,818 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@cluster:22671] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@cluster:22671]] Caused by: [cluster]
>> 2019-03-14 11:39:09,010 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://cluster:8080 was granted leadership with leaderSessionID=bbe408fc-ef93-4328-abeb-85323db7aef7
>> 2019-03-14 11:39:09,010 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - ResourceManager akka.tcp://flink@cluster:31794/user/resourcemanager was granted leadership with fencing token ae4c0d30d0d65a0c41565360667e48fb
>> 2019-03-14 11:39:09,011 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting the SlotManager.
>> 2019-03-14 11:39:09,012 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink@cluster:31794/user/dispatcher was granted leadership with fencing token c852ada2-5fd4-4ff8-80ab-c2cdd85a75d9
>> 2019-03-14 11:39:09,017 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
>>
>> *Task Manager Logs:*
>>
>> 2019-03-14 11:42:35,790 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-a7bc246d-bae4-489f-9c9c-f6a25d3c4b8f for spill files.
>> 2019-03-14 11:42:35,820 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 10000 ms
>> 2019-03-14 11:42:35,839 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .
>> 2019-03-14 11:42:35,853 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2019-03-14 11:42:35,854 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Start job leader service.
>> 2019-03-14 11:42:35,855 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-a7f67948-ab57-4cd9-b2a6-0361b53ecd26
>> 2019-03-14 11:42:35,871 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to ResourceManager akka.tcp://flink@cluster:31794/user/resourcemanager(ae4c0d30d0d65a0c41565360667e48fb).
>> 2019-03-14 11:42:35,963 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@cluster:31794] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@cluster:31794]] Caused by: [cluster: Name or service not known]
>> 2019-03-14 11:42:35,964 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@cluster:31794/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@cluster:31794/user/resourcemanager..
>> 2019-03-14 11:47:35,895 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Fatal error occurred in TaskExecutor akka.tcp://fl...@flink1-1.flink1.us-east-1.com:24623/user/taskmanager_0.
>> org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
>>     at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1037)
>>     at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$3(TaskExecutor.java:1023)
>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>>     at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 2019-03-14 11:47:35,897 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
>> org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
>>     at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1037)
>>     at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$3(TaskExecutor.java:1023)
>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>>     at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 2019-03-14 11:47:35,904 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopping TaskExecutor akka.tcp://fl...@flink1-1.flink1.us-east-1.com:24623/user/taskmanager_0.
>> 2019-03-14 11:47:35,904 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2019-03-14 11:47:35,904 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager - Shutting down TaskExecutorLocalStateStoresManager.
>> 2019-03-14 11:47:35,908 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /tmp/flink-io-a7bc246d-bae4-489f-9c9c-f6a25d3c4b8f
>> 2019-03-14 11:47:35,908 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Shutting down the network environment and its components.
>> 2019-03-14 11:47:35,914 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful shutdown (took 5 ms).
>> 2019-03-14 11:47:35,917 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful shutdown (took 2 ms).
>> 2019-03-14 11:47:35,925 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
>> 2019-03-14 11:47:35,931 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopped TaskExecutor akka.tcp://fl...@flink1-1.flink1.us-east-1.com:24623/user/taskmanager_0.
>> 2019-03-14 11:47:35,931 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Shutting down BLOB cache
>> 2019-03-14 11:47:35,933 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
>> 2019-03-14 11:47:35,943 INFO org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
>> 2019-03-14 11:47:35,950 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Session: 0x26977a24c4e0018 closed
>> 2019-03-14 11:47:35,950 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x26977a24c4e0018
>> 2019-03-14 11:47:35,950 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
>> 2019-03-14 11:47:35,952 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
>> 2019-03-14 11:47:35,952 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
>> 2019-03-14 11:47:35,959 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
>> 2019-03-14 11:47:35,966 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
>> 2019-03-14 11:47:35,983 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
>> 2019-03-14 11:47:35,984 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
>> 2019-03-14 11:47:35,992 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
>>
>> *From: *Gary Yao <g...@ververica.com>
>> *Date: *Thursday, 14 March 2019 at 9:06 PM
>> *To: *Harshith Kumar Bolar <hk...@arity.com>
>> *Cc: *user <user@flink.apache.org>
>> *Subject: *[External] Re: Flink 1.7.2: Task Manager not able to connect
>> to Job Manager
>>
>> Hi Harshith,
>>
>> Can you share JM and TM logs?
>>
>> Best,
>> Gary
>>
>> On Thu, Mar 14, 2019 at 3:42 PM Kumar Bolar, Harshith <hk...@arity.com>
>> wrote:
>>
>> Hi all,
>>
>> I'm trying to upgrade our Flink cluster from 1.4.2 to 1.7.2.
>>
>> When I bring up the cluster, the task managers fail to connect to the
>> job managers with the following error.
>>
>> 2019-03-14 10:34:41,551 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@cluster:22671] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@cluster:22671]] Caused by: [cluster: Name or service not known]
>>
>> Now, this works correctly if I add the following line to the /etc/hosts
>> file.
>>
>> x.x.x.x job-manager-address.com cluster
>>
>> Why is Flink 1.7.2 connecting to the JM using "cluster" in the address?
>> Flink 1.4.2 used to have the job manager's address instead of the word
>> "cluster".
>>
>> Thanks,
>> Harshith
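
The behaviour discussed in this thread can be illustrated with a minimal
sketch. It is not the real jobmanager.sh: build_args is a hypothetical helper,
and the only parts taken from the thread are that the 2nd argument is assigned
to HOST and that a non-empty HOST adds "--host <value>" to the program
arguments. Under those assumptions, a pre-1.5 style invocation that still
passes the execution mode "cluster" would end up advertising "cluster" as the
host name.

```shell
#!/usr/bin/env bash
# Simplified sketch only; NOT the actual jobmanager.sh from Flink.
# build_args is a hypothetical helper used for illustration.

build_args() {
    local STARTSTOP=$1   # first positional argument, e.g. "start"
    local HOST=$2        # second positional argument, as described in the thread

    local args=()
    # Same check as the snippet quoted in the thread: only add --host
    # when HOST is non-empty.
    if [ ! -z "$HOST" ]; then
        args+=("--host")
        args+=("${HOST}")
    fi
    echo "${args[*]}"
}

# Pre-1.5 style invocation: the old execution mode lands in HOST.
build_args start cluster    # prints: --host cluster

# Invocation without the execution mode: no --host argument is produced.
build_args start            # prints an empty line
```

If this matches how the cluster is started, dropping the stale "cluster"
argument from the invocation would likely be a less invasive fix than
commenting out the block in the script.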