Re: TaskManager not connecting to ResourceManager in HA mode

Zili Chen Wed, 21 Aug 2019 19:18:44 -0700

Besides, would you like to participant our survey thread[1] on
user list about "How do you use high-availability services in Flink?"


It would help Flink improve its high-availability serving.

Best,
tison.

[1]
https://lists.apache.org/x/thread.html/c0cc07197e6ba30b45d7709cc9e17d8497e5e3f33de504d58dfcafad@%3Cuser.flink.apache.org%3E


Zili Chen <wander4...@gmail.com> 于2019年8月22日周四 上午10:16写道：

> Hi Aleksandar,
>
> base on your log:
>
> taskmanager_1   | 2019-08-22 00:05:03,713 INFO
>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting
> to ResourceManager
> akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000)
> .
> taskmanager_1   | 2019-08-22 00:05:04,137 INFO
>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 10000 ms:
> Could not connect to rpc endpoint under address
> akka.tcp://flink@jobmanager:6123/user/jobmanager..
>
> it looks like you return a jobmanager address on retrieval service of
> resource manager. Please check the implementation carefully or share it on
> mailing list that others can help for investigation.
>
> Best,
> tison.
>
>
> Zhu Zhu <reed...@gmail.com> 于2019年8月22日周四 上午10:11写道：
>
>> Hi Aleksandar,
>>
>> The resource manager address is retrieved from the HA services.
>> Would you check whether your customized HA services is returning the
>> right  LeaderRetrievalService and whether the LeaderRetrievalService is
>> really retrieving the right leader's address?
>> Or is it possible that the stored resource manager address in HA is
>> replaced by jobmanager address in any case?
>>
>> Thanks,
>> Zhu Zhu
>>
>> Aleksandar Mastilovic <amastilo...@sightmachine.com> 于2019年8月22日周四
>> 上午8:16写道：
>>
>>> Hi all,
>>>
>>> I’m experimenting with using my own implementation of HA services
>>> instead of ZooKeeper that would persist JobManager information on a
>>> Kubernetes volume instead of in ZooKeeper.
>>>
>>> I’ve set the high-availability option in flink-conf.yaml to the FQN of
>>> my factory class, and started the docker ensemble as I usually do (i.e.
>>> with no special “cluster” arguments or scripts.)
>>>
>>> What’s happening now is that TaskManager is unable to connect to
>>> ResourceManager, because it seems it’s trying to use the /user/jobmanager
>>> path instead of /user/resourcemanager.
>>>
>>> Here’s what I found in the logs:
>>>
>>>
>>> jobmanager_1    | 2019-08-22 00:05:00,963 INFO  akka.remote.Remoting
>>>                                      - Remoting started; listening on
>>> addresses :[akka.tcp://flink@jobmanager:6123]
>>> jobmanager_1    | 2019-08-22 00:05:00,975 INFO
>>>  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor
>>> system started at akka.tcp://flink@jobmanager:6123
>>>
>>> jobmanager_1    | 2019-08-22 00:05:02,380 INFO
>>>  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>>> RPC endpoint for
>>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at
>>> akka://flink/user/resourcemanager .
>>>
>>> jobmanager_1    | 2019-08-22 00:05:03,138 INFO
>>>  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>>> RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher
>>> at akka://flink/user/dispatcher .
>>>
>>> jobmanager_1    | 2019-08-22 00:05:03,211 INFO
>>>  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  -
>>> ResourceManager akka.tcp://flink@jobmanager:6123/user/resourcemanager
>>> was granted leadership with fencing token 00000000000000000000000000000000
>>>
>>> jobmanager_1    | 2019-08-22 00:05:03,292 INFO
>>>  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher
>>> akka.tcp://flink@jobmanager:6123/user/dispatcher was granted leadership
>>> with fencing token 00000000-0000-0000-0000-000000000000
>>>
>>> taskmanager_1   | 2019-08-22 00:05:03,713 INFO
>>>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting
>>> to ResourceManager
>>> akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000)
>>> .
>>> taskmanager_1   | 2019-08-22 00:05:04,137 INFO
>>>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
>>> resolve ResourceManager address
>>> akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 10000 ms:
>>> Could not connect to rpc endpoint under address
>>> akka.tcp://flink@jobmanager:6123/user/jobmanager..
>>>
>>> Is this a known bug? I’d appreciate any help I can get.
>>>
>>> Thanks,
>>> Aleksandar Mastilovic
>>>
>>

Re: TaskManager not connecting to ResourceManager in HA mode

Reply via email to