I forgot to add line numbers to the first link in my previous email:

https://github.com/apache/flink/blob/c6878aca6c5aeee46581b4d6744b31049db9de95/flink-dist/src/main/flink-bin/bin/jobmanager.sh#L21-L25
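
For illustration only, here is a minimal sketch of how the positional-argument handling described below interacts with the --host block quoted further down. It is not the actual jobmanager.sh: build_args is a hypothetical helper, jm-host.example.com is a placeholder, and the real script collects its options in a bash array rather than a string. It also assumes a 1.4-style invocation such as "jobmanager.sh start cluster", which this thread does not confirm.

```shell
#!/bin/sh
# Simplified sketch, NOT the real jobmanager.sh.
# build_args is a hypothetical helper used only for illustration.

build_args() (
    startstop=$1   # start | start-foreground | stop | stop-all (unused here)
    host=$2        # optional second argument; treated as a hostname
    args=""

    # Same idea as the block quoted in this thread:
    # a non-empty second argument is passed on as "--host <value>".
    if [ -n "$host" ]; then
        args="--host $host"
    fi

    printf '%s\n' "$args"
)

# Pre-1.5 style call: the old execution mode lands in the host slot.
build_args start cluster               # prints: --host cluster

# Passing a real hostname instead (placeholder value):
build_args start jm-host.example.com   # prints: --host jm-host.example.com

# No second argument: no --host option is added.
build_args start                       # prints an empty line
```

If the second argument is omitted, no --host option is passed, which has the same effect as commenting out the block in the script.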

On Fri, Mar 15, 2019 at 8:08 AM Gary Yao <g...@ververica.com> wrote:

> Hi Harshith,
>
> In the jobmanager.sh script, the 2nd argument is assigned to the HOST
> variable [1]. How are you invoking jobmanager.sh? Prior to 1.5, the script
> expected an execution mode (local or cluster), but this is no longer the
> case [2].
>
> Best,
> Gary
>
> [1]
> https://github.com/apache/flink/blob/c6878aca6c5aeee46581b4d6744b31049db9de95/flink-dist/src/main/flink-bin/bin/jobmanager.sh
> [2]
> https://github.com/apache/flink/commit/d61664ca64bcb82c4e8ddf03a2ed38fe8edafa98
>
> On Fri, Mar 15, 2019 at 3:36 AM Kumar Bolar, Harshith <hk...@arity.com>
> wrote:
>
>> Hi Gary,
>>
>>
>>
>> An update: I noticed “--host cluster” in the program arguments section of
>> the job manager logs. After I commented out the following section in
>> jobmanager.sh, the task manager was able to connect to the job manager
>> without issues.
>>
>>
>>
>>   if [ ! -z $HOST ]; then
>>       args+=("--host")
>>       args+=("${HOST}")
>>   fi
>>
>>
>>
>>
>>
>> Task manager logs after commenting those lines:
>>
>>
>>
>>
>> 2019-03-14 22:31:02,863 INFO
>> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>> RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at
>> akka://flink/user/taskmanager_0 .
>> 2019-03-14 22:31:02,875 INFO
>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
>> Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2019-03-14 22:31:02,876 INFO
>> org.apache.flink.runtime.taskexecutor.JobLeaderService        - Start job
>> leader service.
>> 2019-03-14 22:31:02,877 INFO
>> org.apache.flink.runtime.filecache.FileCache                  - User file
>> cache uses directory
>> /tmp/flink-dist-cache-12d5905f-d694-46f6-9359-3a636188b008
>> 2019-03-14 22:31:02,884 INFO
>> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting
>> to ResourceManager
>> akka.tcp://fl...@flink0-1.flink1.us-east-1.high.ue1.non.aws.cloud.arity.com:28945/user/resourcemanager(8583b335fd08a30a89585b7af07e4213).
>> 2019-03-14 22:31:03,109 INFO
>> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Resolved
>> ResourceManager address, beginning registration
>> 2019-03-14 22:31:03,110 INFO
>> org.apache.flink.runtime.taskexecutor.TaskExecutor            -
>> Registration at ResourceManager attempt 1 (timeout=100ms)
>> 2019-03-14 22:31:03,228 INFO
>> org.apache.flink.runtime.taskexecutor.TaskExecutor            -
>> Registration at ResourceManager attempt 2 (timeout=200ms)
>> 2019-03-14 22:31:03,266 INFO
>> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Successful
>> registration at resource manager
>> akka.tcp://fl...@flink0-1.flink1.us-east-1.abc.com:28945/user/resourcemanager
>> under registration id 170ee6a00f80ee02ead0e88710093d77.
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Harshith
>>
>>
>>
>> *From: *Harshith Kumar Bolar <hk...@arity.com>
>> *Date: *Friday, 15 March 2019 at 7:38 AM
>> *To: *Gary Yao <g...@ververica.com>
>> *Cc: *user <user@flink.apache.org>
>> *Subject: *Re: [External] Re: Re: Flink 1.7.2: Task Manager not able to
>> connect to Job Manager
>>
>>
>>
>> Hi Gary,
>>
>>
>>
>> Here are the full job manager and task manager logs. The job manager log
>> says “starting StandaloneSessionClusterEntrypoint”, whereas in Flink 1.4.2
>> it used to say “starting JobManager”. Is this correct?
>>
>>
>>
>> Job manager logs: https://paste.ubuntu.com/p/DCVzsQdpHq/
>>
>> Task manager logs: https://paste.ubuntu.com/p/wbvYFZxdT8/
>>
>>
>>
>> Thanks,
>>
>> Harshith
>>
>>
>>
>> *From: *Gary Yao <g...@ververica.com>
>> *Date: *Thursday, 14 March 2019 at 10:11 PM
>> *To: *Harshith Kumar Bolar <hk...@arity.com>
>> *Cc: *user <user@flink.apache.org>
>> *Subject: *[External] Re: Re: Flink 1.7.2: Task Manager not able to
>> connect to Job Manager
>>
>>
>>
>> Hi Harshith,
>>
>> The truncated log is not enough. Can you share the complete logs? If that's
>> not possible, I'd like to see the beginning of the log files, where the
>> cluster configuration is logged.
>>
>> The TaskManager tries to connect to the leader that is advertised in
>> ZooKeeper. In your case the "cluster" hostname is advertised, which hints at
>> a problem in your Flink configuration.
>>
>> Best,
>> Gary
>>
>>
>>
>> On Thu, Mar 14, 2019 at 4:54 PM Kumar Bolar, Harshith <hk...@arity.com>
>> wrote:
>>
>> Hi Gary,
>>
>>
>>
>> I’ve attached the relevant portions of the JM and TM logs.
>>
>>
>>
>> *Job Manager Logs:*
>>
>> 2019-03-14 11:38:28,257 INFO
>> org.apache.flink.shaded.curator.org.apache.curator.framework.state.ConnectionStateManager
>> - State change: CONNECTED
>> 2019-03-14 11:38:28,309 INFO
>> org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined
>> location of main cluster component log file:
>> /opt/flink-1.7.2/log/flink-root-standalonesession-4-flink0-1.flink1.us-east-1.log
>> 2019-03-14 11:38:28,309 INFO
>> org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined
>> location of main cluster component stdout file:
>> /opt/flink-1.7.2/log/flink-root-standalonesession-4-flink0-1.flink1.us-east-1.out
>> 2019-03-14 11:38:28,527 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest
>> endpoint listening at cluster:8080
>> 2019-03-14 11:38:28,527 INFO
>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
>> Starting ZooKeeperLeaderElectionService
>> ZooKeeperLeaderElectionService{leaderPath='/leader/rest_server_lock'}.
>> 2019-03-14 11:38:28,574 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web
>> frontend listening at http://cluster:8080.
>> 2019-03-14 11:38:28,613 INFO
>> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>> RPC endpoint for
>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at
>> akka://flink/user/resourcemanager .
>> 2019-03-14 11:38:28,674 INFO
>> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>> RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher
>> at akka://flink/user/dispatcher .
>> 2019-03-14 11:38:28,691 INFO
>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
>> Starting ZooKeeperLeaderElectionService
>> ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
>> 2019-03-14 11:38:28,694 INFO
>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
>> Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2019-03-14 11:38:28,698 INFO
>> org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  -
>> Starting ZooKeeperLeaderElectionService
>> ZooKeeperLeaderElectionService{leaderPath='/leader/dispatcher_lock'}.
>> 2019-03-14 11:38:28,700 INFO
>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
>> Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
>> 2019-03-14 11:38:28,818 WARN
>> akka.remote.ReliableDeliverySupervisor                        - Association
>> with remote system [akka.tcp://flink@cluster:22671] has failed, address
>> is now gated for [50] ms. Reason: [Association failed with
>> [akka.tcp://flink@cluster:22671]] Caused by: [cluster]
>> 2019-03-14 11:39:09,010 INFO
>> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    -
>> http://cluster:8080 was granted leadership with
>> leaderSessionID=bbe408fc-ef93-4328-abeb-85323db7aef7
>> 2019-03-14 11:39:09,010 INFO
>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  -
>> ResourceManager akka.tcp://flink@cluster:31794/user/resourcemanager was
>> granted leadership with fencing token ae4c0d30d0d65a0c41565360667e48fb
>> 2019-03-14 11:39:09,011 INFO
>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  -
>> Starting the SlotManager.
>> 2019-03-14 11:39:09,012 INFO
>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher
>> akka.tcp://flink@cluster:31794/user/dispatcher was granted leadership
>> with fencing token c852ada2-5fd4-4ff8-80ab-c2cdd85a75d9
>> 2019-03-14 11:39:09,017 INFO
>> org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Recovering
>> all persisted jobs.
>>
>> *Task Manager Logs:*
>>
>> 2019-03-14 11:42:35,790 INFO
>> org.apache.flink.runtime.io.disk.iomanager.IOManager          - I/O manager
>> uses directory /tmp/flink-io-a7bc246d-bae4-489f-9c9c-f6a25d3c4b8f for spill
>> files.
>> 2019-03-14 11:42:35,820 INFO
>> org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration  - Messages
>> have a max timeout of 10000 ms
>> 2019-03-14 11:42:35,839 INFO
>> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>> RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at
>> akka://flink/user/taskmanager_0 .
>> 2019-03-14 11:42:35,853 INFO
>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
>> Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2019-03-14 11:42:35,854 INFO
>> org.apache.flink.runtime.taskexecutor.JobLeaderService        - Start job
>> leader service.
>> 2019-03-14 11:42:35,855 INFO
>> org.apache.flink.runtime.filecache.FileCache                  - User file
>> cache uses directory
>> /tmp/flink-dist-cache-a7f67948-ab57-4cd9-b2a6-0361b53ecd26
>> 2019-03-14 11:42:35,871 INFO
>> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting
>> to ResourceManager
>> akka.tcp://flink@cluster:31794/user/resourcemanager(ae4c0d30d0d65a0c41565360667e48fb).
>> 2019-03-14 11:42:35,963 WARN
>> akka.remote.ReliableDeliverySupervisor                        - Association
>> with remote system [akka.tcp://flink@cluster:31794] has failed, address
>> is now gated for [50] ms. Reason: [Association failed with
>> [akka.tcp://flink@cluster:31794]] Caused by: [cluster: Name or service
>> not known]
>> 2019-03-14 11:42:35,964 INFO
>> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
>> resolve ResourceManager address 
>> akka.tcp://flink@cluster:31794/user/resourcemanager,
>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>> akka.tcp://flink@cluster:31794/user/resourcemanager..
>> 2019-03-14 11:47:35,895 ERROR
>> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Fatal error
>> occurred in TaskExecutor
>> akka.tcp://fl...@flink1-1.flink1.us-east-1.com:24623/user/taskmanager_0.
>> org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException:
>> Could not register at the ResourceManager within the specified maximum
>> registration duration 300000 ms. This indicates a problem with this
>> instance. Terminating now.
>>    at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1037)
>>    at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$3(TaskExecutor.java:1023)
>>    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
>>    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
>>    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>>    at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>    at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>    at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 2019-03-14 11:47:35,897 ERROR
>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Fatal error
>> occurred while executing the TaskManager. Shutting it down...
>> org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException:
>> Could not register at the ResourceManager within the specified maximum
>> registration duration 300000 ms. This indicates a problem with this
>> instance. Terminating now.
>>    at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1037)
>>    at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$3(TaskExecutor.java:1023)
>>    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
>>    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
>>    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>>    at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>>    at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>>    at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>>    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>>    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>>    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>>    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>>    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>>    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 2019-03-14 11:47:35,904 INFO
>> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Stopping
>> TaskExecutor
>> akka.tcp://fl...@flink1-1.flink1.us-east-1.com:24623/user/taskmanager_0.
>> 2019-03-14 11:47:35,904 INFO
>> org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  -
>> Stopping ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
>> 2019-03-14 11:47:35,904 INFO
>> org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager  -
>> Shutting down TaskExecutorLocalStateStoresManager.
>> 2019-03-14 11:47:35,908 INFO
>> org.apache.flink.runtime.io.disk.iomanager.IOManager          - I/O manager
>> removed spill file directory
>> /tmp/flink-io-a7bc246d-bae4-489f-9c9c-f6a25d3c4b8f
>> 2019-03-14 11:47:35,908 INFO
>> org.apache.flink.runtime.io.network.NetworkEnvironment        - Shutting
>> down the network environment and its components.
>> 2019-03-14 11:47:35,914 INFO
>> org.apache.flink.runtime.io.network.netty.NettyClient         - Successful
>> shutdown (took 5 ms).
>> 2019-03-14 11:47:35,917 INFO
>> org.apache.flink.runtime.io.network.netty.NettyServer         - Successful
>> shutdown (took 2 ms).
>> 2019-03-14 11:47:35,925 INFO
>> org.apache.flink.runtime.taskexecutor.JobLeaderService        - Stop job
>> leader service.
>> 2019-03-14 11:47:35,931 INFO
>> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Stopped
>> TaskExecutor
>> akka.tcp://fl...@flink1-1.flink1.us-east-1.com:24623/user/taskmanager_0.
>> 2019-03-14 11:47:35,931 INFO
>> org.apache.flink.runtime.blob.PermanentBlobCache              - Shutting
>> down BLOB cache
>> 2019-03-14 11:47:35,933 INFO
>> org.apache.flink.runtime.blob.TransientBlobCache              - Shutting
>> down BLOB cache
>> 2019-03-14 11:47:35,943 INFO
>> org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl
>> - backgroundOperationsLoop exiting
>> 2019-03-14 11:47:35,950 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  -
>> Session: 0x26977a24c4e0018 closed
>> 2019-03-14 11:47:35,950 INFO
>> org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  -
>> EventThread shut down for session: 0x26977a24c4e0018
>> 2019-03-14 11:47:35,950 INFO
>> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopping
>> Akka RPC service.
>> 2019-03-14 11:47:35,952 INFO
>> akka.remote.RemoteActorRefProvider$RemotingTerminator         - Shutting
>> down remote daemon.
>> 2019-03-14 11:47:35,952 INFO
>> akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remote
>> daemon shut down; proceeding with flushing remote transports.
>> 2019-03-14 11:47:35,959 INFO
>> akka.remote.RemoteActorRefProvider$RemotingTerminator         - Shutting
>> down remote daemon.
>> 2019-03-14 11:47:35,966 INFO
>> akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remote
>> daemon shut down; proceeding with flushing remote transports.
>> 2019-03-14 11:47:35,983 INFO
>> akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remoting
>> shut down.
>> 2019-03-14 11:47:35,984 INFO
>> akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remoting
>> shut down.
>> 2019-03-14 11:47:35,992 INFO
>> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopped
>> Akka RPC service.
>>
>>
>>
>>
>>
>> *From: *Gary Yao <g...@ververica.com>
>> *Date: *Thursday, 14 March 2019 at 9:06 PM
>> *To: *Harshith Kumar Bolar <hk...@arity.com>
>> *Cc: *user <user@flink.apache.org>
>> *Subject: *[External] Re: Flink 1.7.2: Task Manager not able to connect
>> to Job Manager
>>
>>
>>
>> Hi Harshith,
>>
>>
>>
>> Can you share JM and TM logs?
>>
>>
>>
>> Best,
>>
>> Gary
>>
>>
>>
>> On Thu, Mar 14, 2019 at 3:42 PM Kumar Bolar, Harshith <hk...@arity.com>
>> wrote:
>>
>> Hi all,
>>
>>
>>
>> I'm trying to upgrade our Flink cluster from 1.4.2 to 1.7.2.
>>
>>
>>
>> When I bring up the cluster, the task managers refuse to connect to the
>> job managers with the following error.
>>
>>
>>
>>         2019-03-14 10:34:41,551 WARN akka.remote.ReliableDeliverySupervisor
>>         - Association with remote system [akka.tcp://flink@cluster:22671]
>>         has failed, address is now gated for [50] ms. Reason: [Association
>>         failed with [akka.tcp://flink@cluster:22671]] Caused by: [cluster:
>>         Name or service not known]
>>
>>
>>
>> Now, this works correctly if I add the following line into
>> the /etc/hosts file.
>>
>>
>>
>>         x.x.x.x job-manager-address.com cluster
>>
>>
>>
>> Why is Flink 1.7.2 connecting to the JM using "cluster" as the address?
>> Flink 1.4.2 used the job manager's address instead of the word "cluster".
>>
>>
>>
>> Thanks,
>>
>> Harshith
>>
>>
>>
>>
